util/aio: Defer disabling poll mode as long as possible

Message ID

20220710120849.63086-1-chao.gao@intel.com (mailing list archive)

State

New, archived

Headers

From: Chao Gao <chao.gao@intel.com>
To: stefanha@redhat.com,
	fam@euphon.net,
	pbonzini@redhat.com
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org,
 Chao Gao <chao.gao@intel.com>
Subject: [PATCH] util/aio: Defer disabling poll mode as long as possible
Date: Sun, 10 Jul 2022 20:08:49 +0800
Message-Id: <20220710120849.63086-1-chao.gao@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=134.134.136.65; envelope-from=chao.gao@intel.com;
 helo=mga03.intel.com
X-Spam_score_int: -44
X-Spam_score: -4.5
X-Spam_bar: ----
X-Spam_report: (-4.5 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.082,
 DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001,
 T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Sender: "Qemu-devel"
 <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org>

Series

util/aio: Defer disabling poll mode as long as possible

Commit Message

Chao Gao July 10, 2022, 12:08 p.m. UTC

When measuring FIO read performance (cache=writethrough, bs=4k,
iodepth=64) in VMs, we observe ~80K notifications per second (e.g.,
EPT_MISCONFIG) from the guest to QEMU.

It turns out that these frequent notifications are caused by interference
from worker threads. Worker threads queue bottom halves after completing
IO requests. Pending bottom halves lead to one of two outcomes: either
aio_compute_timeout() zeroes the timeout, which is then passed to
try_poll_mode(), or run_poll_handlers() reports no progress after noticing
pending aio_notify() events. In both cases try_poll_mode() ends up calling
poll_set_started(false) to disable poll mode. However, because the timeout
is already zero in both cases, the event loop (i.e., aio_poll()) just
processes the bottom halves and then starts the next iteration. Disabling
poll mode therefore has no value and only leads to unnecessary
notifications from the guest.

To minimize unnecessary notifications from the guest, defer disabling poll
mode until the event loop is about to block.

With this patch applied, FIO seq-read performance (bs=4k, iodepth=64,
cache=writethrough) in VMs increases from 330K to 413K IOPS.

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 util/aio-posix.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)
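
To make the reasoning above concrete, here is a minimal toy model of the
event loop, not QEMU code. Every name in it (ToyStep, ToyCtx,
toy_poll_set_started, toy_run) is hypothetical, need_wait() is ignored, and
the detail that the real code zeroes the timeout after a state change is
left out. It only counts how often poll mode is switched off under the old
policy (disable whenever try_poll_mode() falls through) and under the
deferred policy (disable only right before blocking).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; none of these names exist in QEMU. */
typedef struct {
    bool bh_pending;     /* a bottom half is queued, so the timeout is 0 */
    bool poll_progress;  /* polling would find work in this iteration */
} ToyStep;

typedef struct {
    bool poll_started;   /* stands in for the AioContext poll-mode state */
    unsigned disables;   /* on -> off transitions of poll mode */
} ToyCtx;

/* Like poll_set_started() in the patch: returns true only on a state change. */
static bool toy_poll_set_started(ToyCtx *c, bool started)
{
    if (c->poll_started == started) {
        return false;
    }
    c->poll_started = started;
    if (!started) {
        c->disables++;
    }
    return true;
}

/* Run the steps under the old (defer=false) or deferred (defer=true) policy. */
static unsigned toy_run(const ToyStep *steps, size_t n, bool defer)
{
    ToyCtx c = { false, 0 };

    for (size_t i = 0; i < n; i++) {
        int64_t timeout = steps[i].bh_pending ? 0 : -1; /* -1: may block */
        bool progress = false;

        if (timeout != 0) {                 /* nonzero budget: poll */
            toy_poll_set_started(&c, true);
            if (steps[i].poll_progress) {
                timeout = 0;                /* work found, no need to block */
                progress = true;
            }
        }
        if (!defer && !progress) {          /* old: disable on every fall-through */
            toy_poll_set_started(&c, false);
        }
        if (timeout != 0) {                 /* about to block */
            if (defer) {                    /* deferred: disable only here */
                toy_poll_set_started(&c, false);
            }
            assert(!c.poll_started);        /* never block with polling on */
            /* a real loop would call ctx->fdmon_ops->wait() here */
        }
    }
    return c.disables;
}
```

For a workload that alternates between a productive poll and a pending
bottom half three times and then goes idle, the old policy switches poll
mode off four times in this model, while the deferred policy does so once,
just before blocking. Each avoided switch-off corresponds to a window in
which the guest would otherwise have had to notify QEMU again.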

Comments

Stefan Hajnoczi July 11, 2022, 3:05 p.m. UTC | #1

On Sun, Jul 10, 2022 at 08:08:49PM +0800, Chao Gao wrote:
> When we measure FIO read performance (cache=writethrough, bs=4k,
> iodepth=64) in VMs, ~80K/s notifications (e.g., EPT_MISCONFIG) are observed
> from guest to qemu.
> 
> It turns out those frequent notifications are caused by interference from
> worker threads. Worker threads queue bottom halves after completing IO
> requests.  Pending bottom halves may lead to either aio_compute_timeout()
> zeros timeout and pass it to try_poll_mode() or run_poll_handlers() returns
> no progress after noticing pending aio_notify() events. Both cause
> run_poll_handlers() to call poll_set_started(false) to disable poll mode.
> However, for both cases, as timeout is already zeroed, the event loop
> (i.e., aio_poll()) just processes bottom halves and then starts the next
> event loop iteration. So, disabling poll mode has no value but leads to
> unnecessary notifications from guest.
> 
> To minimize unnecessary notifications from guest, defer disabling poll
> mode to when the event loop is about to be blocked.
> 
> With this patch applied, FIO seq-read performance (bs=4k, iodepth=64,
> cache=writethrough) in VMs increases from 330K/s to 413K/s IOPS.
> 
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
>  util/aio-posix.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan

Stefan Hajnoczi Aug. 4, 2022, 6:50 p.m. UTC | #2

On Sun, Jul 10, 2022 at 08:08:49PM +0800, Chao Gao wrote:
> When we measure FIO read performance (cache=writethrough, bs=4k,
> iodepth=64) in VMs, ~80K/s notifications (e.g., EPT_MISCONFIG) are observed
> from guest to qemu.
> 
> It turns out those frequent notifications are caused by interference from
> worker threads. Worker threads queue bottom halves after completing IO
> requests.  Pending bottom halves may lead to either aio_compute_timeout()
> zeros timeout and pass it to try_poll_mode() or run_poll_handlers() returns
> no progress after noticing pending aio_notify() events. Both cause
> run_poll_handlers() to call poll_set_started(false) to disable poll mode.
> However, for both cases, as timeout is already zeroed, the event loop
> (i.e., aio_poll()) just processes bottom halves and then starts the next
> event loop iteration. So, disabling poll mode has no value but leads to
> unnecessary notifications from guest.
> 
> To minimize unnecessary notifications from guest, defer disabling poll
> mode to when the event loop is about to be blocked.
> 
> With this patch applied, FIO seq-read performance (bs=4k, iodepth=64,
> cache=writethrough) in VMs increases from 330K/s to 413K/s IOPS.
> 
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
>  util/aio-posix.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)

I just noticed that I forgot to send a pull request with this for QEMU
7.1. It's my fault that this missed QEMU 7.1, sorry. It will be merged
once the 7.2 merge window opens.

Stefan

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 731f3826c0..6cc6256d53 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -585,18 +585,16 @@  static bool try_poll_mode(AioContext *ctx, AioHandlerList *ready_list,
 
     max_ns = qemu_soonest_timeout(*timeout, ctx->poll_ns);
     if (max_ns && !ctx->fdmon_ops->need_wait(ctx)) {
+        /*
+         * Enable poll mode. It pairs with the poll_set_started() in
+         * aio_poll() which disables poll mode.
+         */
         poll_set_started(ctx, ready_list, true);
 
         if (run_poll_handlers(ctx, ready_list, max_ns, timeout)) {
             return true;
         }
     }
-
-    if (poll_set_started(ctx, ready_list, false)) {
-        *timeout = 0;
-        return true;
-    }
-
     return false;
 }
 
@@ -657,6 +655,17 @@  bool aio_poll(AioContext *ctx, bool blocking)
      * system call---a single round of run_poll_handlers_once suffices.
      */
     if (timeout || ctx->fdmon_ops->need_wait(ctx)) {
+        /*
+         * Disable poll mode. poll mode should be disabled before the call
+         * of ctx->fdmon_ops->wait() so that guest's notification can wake
+         * up IO threads when some work becomes pending. It is essential to
+         * avoid hangs or unnecessary latency.
+         */
+        if (poll_set_started(ctx, &ready_list, false)) {
+            timeout = 0;
+            progress = true;
+        }
+
         ctx->fdmon_ops->wait(ctx, &ready_list, timeout);
     }
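
The first hunk computes max_ns with qemu_soonest_timeout() and only polls
when max_ns is nonzero, which is why a timeout zeroed by a pending bottom
half skips polling altogether. The sketch below illustrates that selection
under the assumption that -1 means "no timeout" and that the helper picks
the smaller value by comparing as unsigned, so -1 counts as the largest.
toy_soonest_timeout and toy_should_poll are hypothetical stand-ins, not the
QEMU functions, and the need_wait() check is reduced to a plain flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for qemu_soonest_timeout(): return the sooner of
 * two timeouts in nanoseconds. Casting to unsigned turns -1 (assumed to
 * mean "infinite") into the maximum value, so any finite timeout wins.
 */
static int64_t toy_soonest_timeout(int64_t t1, int64_t t2)
{
    return ((uint64_t)t1 < (uint64_t)t2) ? t1 : t2;
}

/*
 * Mirrors the condition in try_poll_mode(): poll only when there is a
 * nonzero polling budget and the fd monitor does not need a real wait.
 */
static bool toy_should_poll(int64_t timeout, int64_t poll_ns, bool need_wait)
{
    int64_t max_ns = toy_soonest_timeout(timeout, poll_ns);

    return max_ns != 0 && !need_wait;
}
```

With a polling budget of 32000 ns, for example, an infinite timeout yields
max_ns = 32000 and polling proceeds, whereas a zero timeout yields
max_ns = 0 and polling is skipped. After the patch, that skipped iteration
no longer switches poll mode off.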
