From patchwork Wed Mar 11 12:40:37 2020
X-Patchwork-Id: 11431591
From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Fam Zheng, Peter Maydell, qemu-block@nongnu.org, Max Reitz,
    Stefan Hajnoczi, Paolo Bonzini, Kevin Wolf
Subject: [PULL 1/9] qemu/queue.h: clear linked list pointers on remove
Date: Wed, 11 Mar 2020 12:40:37 +0000
Message-Id: <20200311124045.277969-2-stefanha@redhat.com>
In-Reply-To: <20200311124045.277969-1-stefanha@redhat.com>
References: <20200311124045.277969-1-stefanha@redhat.com>

Do not leave stale linked list pointers around after removal.  It's
safer to set them to NULL so that use-after-removal results in an
immediate segfault.

The RCU queue removal macros are unchanged since nodes may still be
traversed after removal.

Suggested-by: Paolo Bonzini
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200224103406.1894923-2-stefanha@redhat.com
Message-Id: <20200224103406.1894923-2-stefanha@redhat.com>
---
 include/qemu/queue.h | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 294db54eb1..456a5b01ee 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -142,6 +142,8 @@ struct { \
         (elm)->field.le_next->field.le_prev = \
             (elm)->field.le_prev; \
     *(elm)->field.le_prev = (elm)->field.le_next; \
+    (elm)->field.le_next = NULL; \
+    (elm)->field.le_prev = NULL; \
 } while (/*CONSTCOND*/0)
 
 /*
@@ -225,12 +227,15 @@ struct { \
 } while (/*CONSTCOND*/0)
 
 #define QSLIST_REMOVE_HEAD(head, field) do { \
-    (head)->slh_first = (head)->slh_first->field.sle_next; \
+    typeof((head)->slh_first) elm = (head)->slh_first; \
+    (head)->slh_first = elm->field.sle_next; \
+    elm->field.sle_next = NULL; \
 } while (/*CONSTCOND*/0)
 
 #define QSLIST_REMOVE_AFTER(slistelm, field) do { \
-    (slistelm)->field.sle_next = \
-        QSLIST_NEXT(QSLIST_NEXT((slistelm), field), field); \
+    typeof(slistelm) next = (slistelm)->field.sle_next; \
+    (slistelm)->field.sle_next = next->field.sle_next; \
+    next->field.sle_next = NULL; \
 } while (/*CONSTCOND*/0)
 
 #define QSLIST_REMOVE(head, elm, type, field) do { \
@@ -241,6 +246,7 @@ struct { \
         while (curelm->field.sle_next != (elm)) \
             curelm = curelm->field.sle_next; \
         curelm->field.sle_next = curelm->field.sle_next->field.sle_next; \
+        (elm)->field.sle_next = NULL; \
     } \
 } while (/*CONSTCOND*/0)
 
@@ -304,8 +310,10 @@ struct { \
 } while (/*CONSTCOND*/0)
 
 #define QSIMPLEQ_REMOVE_HEAD(head, field) do { \
-    if (((head)->sqh_first = (head)->sqh_first->field.sqe_next) == NULL)\
+    typeof((head)->sqh_first) elm = (head)->sqh_first; \
+    if (((head)->sqh_first = elm->field.sqe_next) == NULL) \
         (head)->sqh_last = &(head)->sqh_first; \
+    elm->field.sqe_next = NULL; \
 } while (/*CONSTCOND*/0)
 
 #define QSIMPLEQ_SPLIT_AFTER(head, elm, field, removed) do { \
@@ -329,6 +337,7 @@ struct { \
         if ((curelm->field.sqe_next = \
             curelm->field.sqe_next->field.sqe_next) == NULL) \
             (head)->sqh_last = &(curelm)->field.sqe_next; \
+        (elm)->field.sqe_next = NULL; \
     } \
 } while (/*CONSTCOND*/0)
 
@@ -446,6 +455,8 @@ union { \
     (head)->tqh_circ.tql_prev = (elm)->field.tqe_circ.tql_prev; \
     (elm)->field.tqe_circ.tql_prev->tql_next = (elm)->field.tqe_next; \
     (elm)->field.tqe_circ.tql_prev = NULL; \
+    (elm)->field.tqe_circ.tql_next = NULL; \
+    (elm)->field.tqe_next = NULL; \
 } while (/*CONSTCOND*/0)
 
 /* remove @left, @right and all elements in between from @head */
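
A quick illustration of the failure mode this change targets (a
hypothetical caller, not code from the patch): with le_next/le_prev
cleared on removal, misuse such as removing a node twice now faults
immediately instead of silently corrupting a neighbour's pointers.

    typedef struct Item {
        int value;
        QLIST_ENTRY(Item) node;
    } Item;

    static QLIST_HEAD(, Item) items = QLIST_HEAD_INITIALIZER(items);

    static void double_remove(Item *item)
    {
        QLIST_INSERT_HEAD(&items, item, node);
        QLIST_REMOVE(item, node);   /* le_next and le_prev are now NULL */

        /*
         * Bug: the node is no longer on the list.  Before this patch its
         * stale le_prev still pointed into the list and this call would
         * quietly rewrite another node's pointer; now it dereferences
         * NULL and segfaults right here, where the bug is.
         */
        QLIST_REMOVE(item, node);
    }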

From patchwork Wed Mar 11 12:40:38 2020
X-Patchwork-Id: 11431597
From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Fam Zheng, Peter Maydell, qemu-block@nongnu.org, Max Reitz,
    Stefan Hajnoczi, Paolo Bonzini, Kevin Wolf
Subject: [PULL 2/9] aio-posix: remove confusing QLIST_SAFE_REMOVE()
Date: Wed, 11 Mar 2020 12:40:38 +0000
Message-Id: <20200311124045.277969-3-stefanha@redhat.com>
In-Reply-To: <20200311124045.277969-1-stefanha@redhat.com>
References: <20200311124045.277969-1-stefanha@redhat.com>

QLIST_SAFE_REMOVE() is confusing here because the node must be on the
list.  We actually just wanted to clear the linked list pointers when
removing it from the list.  QLIST_REMOVE() now does this, so switch to
it.

Suggested-by: Paolo Bonzini
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200224103406.1894923-3-stefanha@redhat.com
Message-Id: <20200224103406.1894923-3-stefanha@redhat.com>
---
 util/aio-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 9e1befc0c0..b339aab12c 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -493,7 +493,7 @@ static bool aio_dispatch_ready_handlers(AioContext *ctx,
     AioHandler *node;
 
     while ((node = QLIST_FIRST(ready_list))) {
-        QLIST_SAFE_REMOVE(node, node_ready);
+        QLIST_REMOVE(node, node_ready);
         progress = aio_dispatch_handler(ctx, node) || progress;
     }
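
For reference, the distinction between the two macros, paraphrased from
include/qemu/queue.h (a sketch, not a verbatim quote): QLIST_SAFE_REMOVE()
tolerates nodes that are not currently on any list, which is exactly the
property that is misleading in a loop that only ever removes the list head.

    /* Roughly: a removal that is a no-op when the node is not linked. */
    #define QLIST_SAFE_REMOVE(elm, field) do {                          \
        if ((elm)->field.le_prev != NULL) {                             \
            /* same unlinking as QLIST_REMOVE(), then clear pointers */ \
        }                                                               \
    } while (/*CONSTCOND*/0)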

From patchwork Wed Mar 11 12:40:39 2020
X-Patchwork-Id: 11431593
From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Fam Zheng, Peter Maydell, qemu-block@nongnu.org, Max Reitz,
    Stefan Hajnoczi, Paolo Bonzini, Kevin Wolf
Subject: [PULL 3/9] aio-posix: completely stop polling when disabled
Date: Wed, 11 Mar 2020 12:40:39 +0000
Message-Id: <20200311124045.277969-4-stefanha@redhat.com>
In-Reply-To: <20200311124045.277969-1-stefanha@redhat.com>
References: <20200311124045.277969-1-stefanha@redhat.com>

One iteration of polling is always performed even when polling is
disabled.  This is done because:

1. Userspace polling is cheaper than making a syscall.  We might get
   lucky.
2. We must poll once more after polling has stopped in case an event
   occurred while stopping polling.

However, there are downsides:

1. Polling becomes a bottleneck when the number of event sources is
   very high.  It's more efficient to monitor fds in that case.
2. A high-frequency polling event source can starve non-polling event
   sources because ppoll(2)/epoll(7) is never invoked.

This patch removes the forced polling iteration so that poll_ns=0
really means no polling.

IOPS increases from 10k to 60k when the guest has 100
virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1
device because the large number of event sources being polled slows
down the event loop.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-2-stefanha@redhat.com
Message-Id: <20200305170806.1313245-2-stefanha@redhat.com>
---
 util/aio-posix.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index b339aab12c..65964a2597 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -361,12 +361,13 @@ void aio_set_event_notifier_poll(AioContext *ctx,
                 (IOHandler *)io_poll_end);
 }
 
-static void poll_set_started(AioContext *ctx, bool started)
+static bool poll_set_started(AioContext *ctx, bool started)
 {
     AioHandler *node;
+    bool progress = false;
 
     if (started == ctx->poll_started) {
-        return;
+        return false;
     }
 
     ctx->poll_started = started;
@@ -388,8 +389,15 @@ static void poll_set_started(AioContext *ctx, bool started)
         if (fn) {
             fn(node->opaque);
         }
+
+        /* Poll one last time in case ->io_poll_end() raced with the event */
+        if (!started) {
+            progress = node->io_poll(node->opaque) || progress;
+        }
     }
     qemu_lockcnt_dec(&ctx->list_lock);
+
+    return progress;
 }
 
 
@@ -670,12 +678,12 @@ static bool try_poll_mode(AioContext *ctx, int64_t *timeout)
         }
     }
 
-    poll_set_started(ctx, false);
+    if (poll_set_started(ctx, false)) {
+        *timeout = 0;
+        return true;
+    }
 
-    /* Even if we don't run busy polling, try polling once in case it can make
-     * progress and the caller will be able to avoid ppoll(2)/epoll_wait(2).
-     */
-    return run_poll_handlers_once(ctx, timeout);
+    return false;
 }
 
 bool aio_poll(AioContext *ctx, bool blocking)
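
The final node->io_poll() call above guards against a real race.  A
hypothetical interleaving, for illustration only (device details vary):

    /*
     * IOThread                              Guest
     * --------                              -----
     * polling; guest notifications are
     * suppressed by ->io_poll_begin()
     * last polling iteration finds no work
     *                                       queues a request; sends no
     *                                       eventfd kick because
     *                                       notifications are still off
     * ->io_poll_end() re-enables
     * notifications
     * final node->io_poll() finds the
     * request, *timeout = 0, so it is
     * dispatched without waiting in
     * ppoll(2)/epoll_wait(2)
     */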

From patchwork Wed Mar 11 12:40:40 2020
X-Patchwork-Id: 11431599
From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Fam Zheng, Peter Maydell, qemu-block@nongnu.org, Max Reitz,
    Stefan Hajnoczi, Paolo Bonzini, Kevin Wolf
Subject: [PULL 4/9] aio-posix: move RCU_READ_LOCK() into run_poll_handlers()
Date: Wed, 11 Mar 2020 12:40:40 +0000
Message-Id: <20200311124045.277969-5-stefanha@redhat.com>
In-Reply-To: <20200311124045.277969-1-stefanha@redhat.com>
References: <20200311124045.277969-1-stefanha@redhat.com>

Now that run_poll_handlers_once() is only called by run_poll_handlers()
we can improve the CPU time profile by moving the expensive
RCU_READ_LOCK() out of the polling loop.

This reduces run_poll_handlers() from 40% CPU to 10% CPU in perf's
sampling profiler output.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-3-stefanha@redhat.com
Message-Id: <20200305170806.1313245-3-stefanha@redhat.com>
---
 util/aio-posix.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 65964a2597..11a4971955 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -583,16 +583,6 @@ static bool run_poll_handlers_once(AioContext *ctx, int64_t *timeout)
     bool progress = false;
     AioHandler *node;
 
-    /*
-     * Optimization: ->io_poll() handlers often contain RCU read critical
-     * sections and we therefore see many rcu_read_lock() -> rcu_read_unlock()
-     * -> rcu_read_lock() -> ... sequences with expensive memory
-     * synchronization primitives.  Make the entire polling loop an RCU
-     * critical section because nested rcu_read_lock()/rcu_read_unlock() calls
-     * are cheap.
-     */
-    RCU_READ_LOCK_GUARD();
-
     QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
         if (!QLIST_IS_INSERTED(node, node_deleted) && node->io_poll &&
             aio_node_check(ctx, node->is_external) &&
@@ -636,6 +626,16 @@ static bool run_poll_handlers(AioContext *ctx, int64_t max_ns, int64_t *timeout)
 
     trace_run_poll_handlers_begin(ctx, max_ns, *timeout);
 
+    /*
+     * Optimization: ->io_poll() handlers often contain RCU read critical
+     * sections and we therefore see many rcu_read_lock() -> rcu_read_unlock()
+     * -> rcu_read_lock() -> ... sequences with expensive memory
+     * synchronization primitives.  Make the entire polling loop an RCU
+     * critical section because nested rcu_read_lock()/rcu_read_unlock() calls
+     * are cheap.
+     */
+    RCU_READ_LOCK_GUARD();
+
     start_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     do {
         progress = run_poll_handlers_once(ctx, timeout);
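
The same hoisting pattern in miniature (do_work() is a stand-in, not a
QEMU function): nested rcu_read_lock()/rcu_read_unlock() pairs only bump
a per-thread nesting counter, so keeping one outermost critical section
alive across the loop removes the per-iteration memory synchronization
cost.

    static void hot_loop(void)
    {
        rcu_read_lock();                  /* one outermost critical section */
        for (int i = 0; i < 1000000; i++) {
            rcu_read_lock();              /* nested: cheap counter bump */
            do_work(i);
            rcu_read_unlock();
        }
        rcu_read_unlock();                /* reclamation can proceed again */
    }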

From patchwork Wed Mar 11 12:40:41 2020
X-Patchwork-Id: 11431601
From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Fam Zheng, Peter Maydell, qemu-block@nongnu.org, Max Reitz,
    Stefan Hajnoczi, Paolo Bonzini, Kevin Wolf
Subject: [PULL 5/9] aio-posix: extract ppoll(2) and epoll(7) fd monitoring
Date: Wed, 11 Mar 2020 12:40:41 +0000
Message-Id: <20200311124045.277969-6-stefanha@redhat.com>
In-Reply-To: <20200311124045.277969-1-stefanha@redhat.com>
References: <20200311124045.277969-1-stefanha@redhat.com>

The ppoll(2) and epoll(7) file descriptor monitoring implementations
are mixed with the core util/aio-posix.c code.  Before adding another
implementation for Linux io_uring, extract out the existing ones so
there is a clear interface and the core code is simpler.

The new interface is AioContext->fdmon_ops, a pointer to a FDMonOps
struct.  See the patch for details.

Semantic changes:

1. ppoll(2) now reflects events from pollfds[] back into AioHandlers
   while we're still on the clock for adaptive polling.  This was
   already happening for epoll(7), so if it's really an issue then
   we'll need to fix both in the future.

2. epoll(7)'s fallback to ppoll(2) while external events are disabled
   was broken when the number of fds exceeded the epoll(7) upgrade
   threshold.  I guess this code path simply wasn't tested and no one
   noticed the bug.  I didn't go out of my way to fix it but the
   correct code is simpler than preserving the bug.

I also took some liberties in removing the unnecessary
AioContext->epoll_available (just check AioContext->epollfd != -1
instead) and AioContext->epoll_enabled (it's implicit if our
AioContext->fdmon_ops callbacks are being invoked) fields.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-4-stefanha@redhat.com
Message-Id: <20200305170806.1313245-4-stefanha@redhat.com>
---
 MAINTAINERS         |   2 +
 include/block/aio.h |  36 +++++-
 util/Makefile.objs  |   2 +
 util/aio-posix.c    | 286 ++------------------------------------------
 util/aio-posix.h    |  61 ++++++++++
 util/fdmon-epoll.c  | 151 +++++++++++++++++++++++
 util/fdmon-poll.c   | 104 ++++++++++++++++
 7 files changed, 366 insertions(+), 276 deletions(-)
 create mode 100644 util/aio-posix.h
 create mode 100644 util/fdmon-epoll.c
 create mode 100644 util/fdmon-poll.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 36d0c6887a..66f46fa41a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1885,6 +1885,8 @@ L: qemu-block@nongnu.org
 S: Supported
 F: util/async.c
 F: util/aio-*.c
+F: util/aio-*.h
+F: util/fdmon-*.c
 F: block/io.c
 F: migration/block*
 F: include/block/aio.h

diff --git a/include/block/aio.h b/include/block/aio.h
index 9dd61cee7e..90e07d7507 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -52,6 +52,38 @@ struct ThreadPool;
 struct LinuxAioState;
 struct LuringState;
 
+/* Callbacks for file descriptor monitoring implementations */
+typedef struct {
+    /*
+     * update:
+     * @ctx: the AioContext
+     * @node: the handler
+     * @is_new: is the file descriptor already being monitored?
+     *
+     * Add/remove/modify a monitored file descriptor.  There are three cases:
+     * 1. node->pfd.events == 0 means remove the file descriptor.
+     * 2. !is_new means modify an already monitored file descriptor.
+     * 3. is_new means add a new file descriptor.
+     *
+     * Called with ctx->list_lock acquired.
+     */
+    void (*update)(AioContext *ctx, AioHandler *node, bool is_new);
+
+    /*
+     * wait:
+     * @ctx: the AioContext
+     * @ready_list: list for handlers that become ready
+     * @timeout: maximum duration to wait, in nanoseconds
+     *
+     * Wait for file descriptors to become ready and place them on ready_list.
+     *
+     * Called with ctx->list_lock incremented but not locked.
+     *
+     * Returns: number of ready file descriptors.
+     */
+    int (*wait)(AioContext *ctx, AioHandlerList *ready_list, int64_t timeout);
+} FDMonOps;
+
 /*
  * Each aio_bh_poll() call carves off a slice of the BH list, so that newly
  * scheduled BHs are not processed until the next aio_bh_poll() call.  All
@@ -173,8 +205,8 @@ struct AioContext {
 
     /* epoll(7) state used when built with CONFIG_EPOLL */
     int epollfd;
-    bool epoll_enabled;
-    bool epoll_available;
+
+    const FDMonOps *fdmon_ops;
 };
 
 /**

diff --git a/util/Makefile.objs b/util/Makefile.objs
index 6b38b67cf1..6439077a68 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -5,6 +5,8 @@ util-obj-y += aiocb.o async.o aio-wait.o thread-pool.o qemu-timer.o
 util-obj-y += main-loop.o
 util-obj-$(call lnot,$(CONFIG_ATOMIC64)) += atomic64.o
 util-obj-$(CONFIG_POSIX) += aio-posix.o
+util-obj-$(CONFIG_POSIX) += fdmon-poll.o
+util-obj-$(CONFIG_EPOLL_CREATE1) += fdmon-epoll.o
 util-obj-$(CONFIG_POSIX) += compatfd.o
 util-obj-$(CONFIG_POSIX) += event_notifier-posix.o
 util-obj-$(CONFIG_POSIX) += mmap-alloc.o

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 11a4971955..bc0b86547c 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -20,191 +20,17 @@
 #include "qemu/sockets.h"
 #include "qemu/cutils.h"
 #include "trace.h"
-#ifdef CONFIG_EPOLL_CREATE1
-#include <sys/epoll.h>
-#endif
+#include "aio-posix.h"
 
-struct AioHandler
-{
-    GPollFD pfd;
-    IOHandler *io_read;
-    IOHandler *io_write;
-    AioPollFn *io_poll;
-    IOHandler *io_poll_begin;
-    IOHandler *io_poll_end;
-    void *opaque;
-    bool is_external;
-    QLIST_ENTRY(AioHandler) node;
-    QLIST_ENTRY(AioHandler) node_ready; /* only used during aio_poll() */
-    QLIST_ENTRY(AioHandler) node_deleted;
-};
-
-/* Add a handler to a ready list */
-static void add_ready_handler(AioHandlerList *ready_list,
-                              AioHandler *node,
-                              int revents)
+void aio_add_ready_handler(AioHandlerList *ready_list,
+                           AioHandler *node,
+                           int revents)
 {
     QLIST_SAFE_REMOVE(node, node_ready); /* remove from nested parent's list */
     node->pfd.revents = revents;
     QLIST_INSERT_HEAD(ready_list, node, node_ready);
 }
 
-#ifdef CONFIG_EPOLL_CREATE1
-
-/* The fd number threshold to switch to epoll */
-#define EPOLL_ENABLE_THRESHOLD 64
-
-static void aio_epoll_disable(AioContext *ctx)
-{
-    ctx->epoll_enabled = false;
-    if (!ctx->epoll_available) {
-        return;
-    }
-    ctx->epoll_available = false;
-    close(ctx->epollfd);
-}
-
-static inline int epoll_events_from_pfd(int pfd_events)
-{
-    return (pfd_events & G_IO_IN ? EPOLLIN : 0) |
-           (pfd_events & G_IO_OUT ? EPOLLOUT : 0) |
-           (pfd_events & G_IO_HUP ? EPOLLHUP : 0) |
-           (pfd_events & G_IO_ERR ? EPOLLERR : 0);
-}
-
-static bool aio_epoll_try_enable(AioContext *ctx)
-{
-    AioHandler *node;
-    struct epoll_event event;
-
-    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
-        int r;
-        if (QLIST_IS_INSERTED(node, node_deleted) || !node->pfd.events) {
-            continue;
-        }
-        event.events = epoll_events_from_pfd(node->pfd.events);
-        event.data.ptr = node;
-        r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
-        if (r) {
-            return false;
-        }
-    }
-    ctx->epoll_enabled = true;
-    return true;
-}
-
-static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
-{
-    struct epoll_event event;
-    int r;
-    int ctl;
-
-    if (!ctx->epoll_enabled) {
-        return;
-    }
-    if (!node->pfd.events) {
-        ctl = EPOLL_CTL_DEL;
-    } else {
-        event.data.ptr = node;
-        event.events = epoll_events_from_pfd(node->pfd.events);
-        ctl = is_new ? EPOLL_CTL_ADD : EPOLL_CTL_MOD;
-    }
-
-    r = epoll_ctl(ctx->epollfd, ctl, node->pfd.fd, &event);
-    if (r) {
-        aio_epoll_disable(ctx);
-    }
-}
-
-static int aio_epoll(AioContext *ctx, AioHandlerList *ready_list,
-                     int64_t timeout)
-{
-    GPollFD pfd = {
-        .fd = ctx->epollfd,
-        .events = G_IO_IN | G_IO_OUT | G_IO_HUP | G_IO_ERR,
-    };
-    AioHandler *node;
-    int i, ret = 0;
-    struct epoll_event events[128];
-
-    if (timeout > 0) {
-        ret = qemu_poll_ns(&pfd, 1, timeout);
-        if (ret > 0) {
-            timeout = 0;
-        }
-    }
-    if (timeout <= 0 || ret > 0) {
-        ret = epoll_wait(ctx->epollfd, events,
-                         ARRAY_SIZE(events),
-                         timeout);
-        if (ret <= 0) {
-            goto out;
-        }
-        for (i = 0; i < ret; i++) {
-            int ev = events[i].events;
-            int revents = (ev & EPOLLIN ? G_IO_IN : 0) |
-                          (ev & EPOLLOUT ? G_IO_OUT : 0) |
-                          (ev & EPOLLHUP ? G_IO_HUP : 0) |
-                          (ev & EPOLLERR ? G_IO_ERR : 0);
-
-            node = events[i].data.ptr;
-            add_ready_handler(ready_list, node, revents);
-        }
-    }
-out:
-    return ret;
-}
-
-static bool aio_epoll_enabled(AioContext *ctx)
-{
-    /* Fall back to ppoll when external clients are disabled. */
-    return !aio_external_disabled(ctx) && ctx->epoll_enabled;
-}
-
-static bool aio_epoll_check_poll(AioContext *ctx, GPollFD *pfds,
-                                 unsigned npfd, int64_t timeout)
-{
-    if (!ctx->epoll_available) {
-        return false;
-    }
-    if (aio_epoll_enabled(ctx)) {
-        return true;
-    }
-    if (npfd >= EPOLL_ENABLE_THRESHOLD) {
-        if (aio_epoll_try_enable(ctx)) {
-            return true;
-        } else {
-            aio_epoll_disable(ctx);
-        }
-    }
-    return false;
-}
-
-#else
-
-static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
-{
-}
-
-static int aio_epoll(AioContext *ctx, AioHandlerList *ready_list,
-                     int64_t timeout)
-{
-    assert(false);
-}
-
-static bool aio_epoll_enabled(AioContext *ctx)
-{
-    return false;
-}
-
-static bool aio_epoll_check_poll(AioContext *ctx, GPollFD *pfds,
-                                 unsigned npfd, int64_t timeout)
-{
-    return false;
-}
-
-#endif
-
 static AioHandler *find_aio_handler(AioContext *ctx, int fd)
 {
     AioHandler *node;
@@ -314,10 +140,10 @@ void aio_set_fd_handler(AioContext *ctx,
                atomic_read(&ctx->poll_disable_cnt) + poll_disable_change);
 
     if (new_node) {
-        aio_epoll_update(ctx, new_node, is_new);
+        ctx->fdmon_ops->update(ctx, new_node, is_new);
     } else if (node) {
         /* Unregister deleted fd_handler */
-        aio_epoll_update(ctx, node, false);
+        ctx->fdmon_ops->update(ctx, node, false);
     }
     qemu_lockcnt_unlock(&ctx->list_lock);
     aio_notify(ctx);
@@ -532,52 +358,6 @@ void aio_dispatch(AioContext *ctx)
     timerlistgroup_run_timers(&ctx->tlg);
 }
 
-/* These thread-local variables are used only in a small part of aio_poll
- * around the call to the poll() system call.  In particular they are not
- * used while aio_poll is performing callbacks, which makes it much easier
- * to think about reentrancy!
- *
- * Stack-allocated arrays would be perfect but they have size limitations;
- * heap allocation is expensive enough that we want to reuse arrays across
- * calls to aio_poll().  And because poll() has to be called without holding
- * any lock, the arrays cannot be stored in AioContext.  Thread-local data
- * has none of the disadvantages of these three options.
- */
-static __thread GPollFD *pollfds;
-static __thread AioHandler **nodes;
-static __thread unsigned npfd, nalloc;
-static __thread Notifier pollfds_cleanup_notifier;
-
-static void pollfds_cleanup(Notifier *n, void *unused)
-{
-    g_assert(npfd == 0);
-    g_free(pollfds);
-    g_free(nodes);
-    nalloc = 0;
-}
-
-static void add_pollfd(AioHandler *node)
-{
-    if (npfd == nalloc) {
-        if (nalloc == 0) {
-            pollfds_cleanup_notifier.notify = pollfds_cleanup;
-            qemu_thread_atexit_add(&pollfds_cleanup_notifier);
-            nalloc = 8;
-        } else {
-            g_assert(nalloc <= INT_MAX);
-            nalloc *= 2;
-        }
-        pollfds = g_renew(GPollFD, pollfds, nalloc);
-        nodes = g_renew(AioHandler *, nodes, nalloc);
-    }
-    nodes[npfd] = node;
-    pollfds[npfd] = (GPollFD) {
-        .fd = node->pfd.fd,
-        .events = node->pfd.events,
-    };
-    npfd++;
-}
-
 static bool run_poll_handlers_once(AioContext *ctx, int64_t *timeout)
 {
     bool progress = false;
@@ -689,8 +469,6 @@ static bool try_poll_mode(AioContext *ctx, int64_t *timeout)
 bool aio_poll(AioContext *ctx, bool blocking)
 {
     AioHandlerList ready_list = QLIST_HEAD_INITIALIZER(ready_list);
-    AioHandler *node;
-    int i;
     int ret = 0;
     bool progress;
     int64_t timeout;
@@ -723,26 +501,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
      * system call---a single round of run_poll_handlers_once suffices.
      */
     if (timeout || atomic_read(&ctx->poll_disable_cnt)) {
-        assert(npfd == 0);
-
-        /* fill pollfds */
-
-        if (!aio_epoll_enabled(ctx)) {
-            QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
-                if (!QLIST_IS_INSERTED(node, node_deleted) && node->pfd.events
-                    && aio_node_check(ctx, node->is_external)) {
-                    add_pollfd(node);
-                }
-            }
-        }
-
-        /* wait until next event */
-        if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
-            npfd = 0; /* pollfds[] is not being used */
-            ret = aio_epoll(ctx, &ready_list, timeout);
-        } else {
-            ret = qemu_poll_ns(pollfds, npfd, timeout);
-        }
+        ret = ctx->fdmon_ops->wait(ctx, &ready_list, timeout);
     }
 
     if (blocking) {
@@ -791,19 +550,6 @@ bool aio_poll(AioContext *ctx, bool blocking)
         }
     }
 
-    /* if we have any readable fds, dispatch event */
-    if (ret > 0) {
-        for (i = 0; i < npfd; i++) {
-            int revents = pollfds[i].revents;
-
-            if (revents) {
-                add_ready_handler(&ready_list, nodes[i], revents);
-            }
-        }
-    }
-
-    npfd = 0;
-
     progress |= aio_bh_poll(ctx);
 
     if (ret > 0) {
@@ -821,23 +567,15 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 void aio_context_setup(AioContext *ctx)
 {
-#ifdef CONFIG_EPOLL_CREATE1
-    assert(!ctx->epollfd);
-    ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
-    if (ctx->epollfd == -1) {
-        fprintf(stderr, "Failed to create epoll instance: %s", strerror(errno));
-        ctx->epoll_available = false;
-    } else {
-        ctx->epoll_available = true;
-    }
-#endif
+    ctx->fdmon_ops = &fdmon_poll_ops;
+    ctx->epollfd = -1;
+
+    fdmon_epoll_setup(ctx);
 }
 
 void aio_context_destroy(AioContext *ctx)
 {
-#ifdef CONFIG_EPOLL_CREATE1
-    aio_epoll_disable(ctx);
-#endif
+    fdmon_epoll_disable(ctx);
 }
 
 void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,

diff --git a/util/aio-posix.h b/util/aio-posix.h
new file mode 100644
index 0000000000..97899d0fbc
--- /dev/null
+++ b/util/aio-posix.h
@@ -0,0 +1,61 @@
+/*
+ * AioContext POSIX event loop implementation internal APIs
+ *
+ * Copyright IBM, Corp. 2008
+ * Copyright Red Hat, Inc. 2020
+ *
+ * Authors:
+ *  Anthony Liguori
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#ifndef AIO_POSIX_H
+#define AIO_POSIX_H
+
+#include "block/aio.h"
+
+struct AioHandler {
+    GPollFD pfd;
+    IOHandler *io_read;
+    IOHandler *io_write;
+    AioPollFn *io_poll;
+    IOHandler *io_poll_begin;
+    IOHandler *io_poll_end;
+    void *opaque;
+    bool is_external;
+    QLIST_ENTRY(AioHandler) node;
+    QLIST_ENTRY(AioHandler) node_ready; /* only used during aio_poll() */
+    QLIST_ENTRY(AioHandler) node_deleted;
+};
+
+/* Add a handler to a ready list */
+void aio_add_ready_handler(AioHandlerList *ready_list, AioHandler *node,
+                           int revents);
+
+extern const FDMonOps fdmon_poll_ops;
+
+#ifdef CONFIG_EPOLL_CREATE1
+bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd);
+void fdmon_epoll_setup(AioContext *ctx);
+void fdmon_epoll_disable(AioContext *ctx);
+#else
+static inline bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd)
+{
+    return false;
+}
+
+static inline void fdmon_epoll_setup(AioContext *ctx)
+{
+}
+
+static inline void fdmon_epoll_disable(AioContext *ctx)
+{
+}
+#endif /* !CONFIG_EPOLL_CREATE1 */
+
+#endif /* AIO_POSIX_H */

diff --git a/util/fdmon-epoll.c b/util/fdmon-epoll.c
new file mode 100644
index 0000000000..29c1454469
--- /dev/null
+++ b/util/fdmon-epoll.c
@@ -0,0 +1,151 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * epoll(7) file descriptor monitoring
+ */
+
+#include "qemu/osdep.h"
+#include <sys/epoll.h>
+#include "qemu/rcu_queue.h"
+#include "aio-posix.h"
+
+/* The fd number threshold to switch to epoll */
+#define EPOLL_ENABLE_THRESHOLD 64
+
+void fdmon_epoll_disable(AioContext *ctx)
+{
+    if (ctx->epollfd >= 0) {
+        close(ctx->epollfd);
+        ctx->epollfd = -1;
+    }
+
+    /* Switch back */
+    ctx->fdmon_ops = &fdmon_poll_ops;
+}
+
+static inline int epoll_events_from_pfd(int pfd_events)
+{
+    return (pfd_events & G_IO_IN ? EPOLLIN : 0) |
+           (pfd_events & G_IO_OUT ? EPOLLOUT : 0) |
+           (pfd_events & G_IO_HUP ? EPOLLHUP : 0) |
+           (pfd_events & G_IO_ERR ? EPOLLERR : 0);
+}
+
+static void fdmon_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
+{
+    struct epoll_event event;
+    int r;
+    int ctl;
+
+    if (!node->pfd.events) {
+        ctl = EPOLL_CTL_DEL;
+    } else {
+        event.data.ptr = node;
+        event.events = epoll_events_from_pfd(node->pfd.events);
+        ctl = is_new ? EPOLL_CTL_ADD : EPOLL_CTL_MOD;
+    }
+
+    r = epoll_ctl(ctx->epollfd, ctl, node->pfd.fd, &event);
+    if (r) {
+        fdmon_epoll_disable(ctx);
+    }
+}
+
+static int fdmon_epoll_wait(AioContext *ctx, AioHandlerList *ready_list,
+                            int64_t timeout)
+{
+    GPollFD pfd = {
+        .fd = ctx->epollfd,
+        .events = G_IO_IN | G_IO_OUT | G_IO_HUP | G_IO_ERR,
+    };
+    AioHandler *node;
+    int i, ret = 0;
+    struct epoll_event events[128];
+
+    /* Fall back while external clients are disabled */
+    if (atomic_read(&ctx->external_disable_cnt)) {
+        return fdmon_poll_ops.wait(ctx, ready_list, timeout);
+    }
+
+    if (timeout > 0) {
+        ret = qemu_poll_ns(&pfd, 1, timeout);
+        if (ret > 0) {
+            timeout = 0;
+        }
+    }
+    if (timeout <= 0 || ret > 0) {
+        ret = epoll_wait(ctx->epollfd, events,
+                         ARRAY_SIZE(events),
+                         timeout);
+        if (ret <= 0) {
+            goto out;
+        }
+        for (i = 0; i < ret; i++) {
+            int ev = events[i].events;
+            int revents = (ev & EPOLLIN ? G_IO_IN : 0) |
+                          (ev & EPOLLOUT ? G_IO_OUT : 0) |
+                          (ev & EPOLLHUP ? G_IO_HUP : 0) |
+                          (ev & EPOLLERR ? G_IO_ERR : 0);
+
+            node = events[i].data.ptr;
+            aio_add_ready_handler(ready_list, node, revents);
+        }
+    }
+out:
+    return ret;
+}
+
+static const FDMonOps fdmon_epoll_ops = {
+    .update = fdmon_epoll_update,
+    .wait = fdmon_epoll_wait,
+};
+
+static bool fdmon_epoll_try_enable(AioContext *ctx)
+{
+    AioHandler *node;
+    struct epoll_event event;
+
+    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
+        int r;
+        if (QLIST_IS_INSERTED(node, node_deleted) || !node->pfd.events) {
+            continue;
+        }
+        event.events = epoll_events_from_pfd(node->pfd.events);
+        event.data.ptr = node;
+        r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
+        if (r) {
+            return false;
+        }
+    }
+
+    ctx->fdmon_ops = &fdmon_epoll_ops;
+    return true;
+}
+
+bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd)
+{
+    if (ctx->epollfd < 0) {
+        return false;
+    }
+
+    /* Do not upgrade while external clients are disabled */
+    if (atomic_read(&ctx->external_disable_cnt)) {
+        return false;
+    }
+
+    if (npfd >= EPOLL_ENABLE_THRESHOLD) {
+        if (fdmon_epoll_try_enable(ctx)) {
+            return true;
+        } else {
+            fdmon_epoll_disable(ctx);
+        }
+    }
+    return false;
+}
+
+void fdmon_epoll_setup(AioContext *ctx)
+{
+    ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
+    if (ctx->epollfd == -1) {
+        fprintf(stderr, "Failed to create epoll instance: %s", strerror(errno));
+    }
+}

diff --git a/util/fdmon-poll.c b/util/fdmon-poll.c
new file mode 100644
index 0000000000..67992116b8
--- /dev/null
+++ b/util/fdmon-poll.c
@@ -0,0 +1,104 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * poll(2) file descriptor monitoring
+ *
+ * Uses ppoll(2) when available, g_poll() otherwise.
+ */
+
+#include "qemu/osdep.h"
+#include "aio-posix.h"
+#include "qemu/rcu_queue.h"
+
+/*
+ * These thread-local variables are used only in fdmon_poll_wait() around the
+ * call to the poll() system call.  In particular they are not used while
+ * aio_poll is performing callbacks, which makes it much easier to think about
+ * reentrancy!
+ *
+ * Stack-allocated arrays would be perfect but they have size limitations;
+ * heap allocation is expensive enough that we want to reuse arrays across
+ * calls to aio_poll().  And because poll() has to be called without holding
+ * any lock, the arrays cannot be stored in AioContext.  Thread-local data
+ * has none of the disadvantages of these three options.
+ */
+static __thread GPollFD *pollfds;
+static __thread AioHandler **nodes;
+static __thread unsigned npfd, nalloc;
+static __thread Notifier pollfds_cleanup_notifier;
+
+static void pollfds_cleanup(Notifier *n, void *unused)
+{
+    g_assert(npfd == 0);
+    g_free(pollfds);
+    g_free(nodes);
+    nalloc = 0;
+}
+
+static void add_pollfd(AioHandler *node)
+{
+    if (npfd == nalloc) {
+        if (nalloc == 0) {
+            pollfds_cleanup_notifier.notify = pollfds_cleanup;
+            qemu_thread_atexit_add(&pollfds_cleanup_notifier);
+            nalloc = 8;
+        } else {
+            g_assert(nalloc <= INT_MAX);
+            nalloc *= 2;
+        }
+        pollfds = g_renew(GPollFD, pollfds, nalloc);
+        nodes = g_renew(AioHandler *, nodes, nalloc);
+    }
+    nodes[npfd] = node;
+    pollfds[npfd] = (GPollFD) {
+        .fd = node->pfd.fd,
+        .events = node->pfd.events,
+    };
+    npfd++;
+}
+
+static int fdmon_poll_wait(AioContext *ctx, AioHandlerList *ready_list,
+                           int64_t timeout)
+{
+    AioHandler *node;
+    int ret;
+
+    assert(npfd == 0);
+
+    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
+        if (!QLIST_IS_INSERTED(node, node_deleted) && node->pfd.events
+            && aio_node_check(ctx, node->is_external)) {
+            add_pollfd(node);
+        }
+    }
+
+    /* epoll(7) is faster above a certain number of fds */
+    if (fdmon_epoll_try_upgrade(ctx, npfd)) {
+        return ctx->fdmon_ops->wait(ctx, ready_list, timeout);
+    }
+
+    ret = qemu_poll_ns(pollfds, npfd, timeout);
+    if (ret > 0) {
+        int i;
+
+        for (i = 0; i < npfd; i++) {
+            int revents = pollfds[i].revents;
+
+            if (revents) {
+                aio_add_ready_handler(ready_list, nodes[i], revents);
+            }
+        }
+    }
+
+    npfd = 0;
+    return ret;
+}
+
+static void fdmon_poll_update(AioContext *ctx, AioHandler *node, bool is_new)
+{
+    /* Do nothing, AioHandler already contains the state we'll need */
+}
+
+const FDMonOps fdmon_poll_ops = {
+    .update = fdmon_poll_update,
+    .wait = fdmon_poll_wait,
+};
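
To make the FDMonOps contract concrete, here is what a minimal
(hypothetical) backend would look like against the interface added in
this patch; the fdmon_null_* names are invented for illustration:

    static void fdmon_null_update(AioContext *ctx, AioHandler *node,
                                  bool is_new)
    {
        /* node->pfd.events == 0 means stop monitoring node->pfd.fd */
    }

    static int fdmon_null_wait(AioContext *ctx, AioHandlerList *ready_list,
                               int64_t timeout)
    {
        /*
         * Block for up to @timeout nanoseconds, queue each ready handler
         * with aio_add_ready_handler(ready_list, node, revents), and
         * return how many fds were ready.
         */
        return 0;
    }

    static const FDMonOps fdmon_null_ops = {
        .update = fdmon_null_update,
        .wait   = fdmon_null_wait,
    };

    /* installed the same way as fdmon_poll_ops:
     * ctx->fdmon_ops = &fdmon_null_ops;
     */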

From patchwork Wed Mar 11 12:40:42 2020
X-Patchwork-Id: 11431595
From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Fam Zheng, Peter Maydell, qemu-block@nongnu.org, Max Reitz,
    Stefan Hajnoczi, Paolo Bonzini, Kevin Wolf
Subject: [PULL 6/9] aio-posix: simplify FDMonOps->update() prototype
Date: Wed, 11 Mar 2020 12:40:42 +0000
Message-Id: <20200311124045.277969-7-stefanha@redhat.com>
In-Reply-To: <20200311124045.277969-1-stefanha@redhat.com>
References: <20200311124045.277969-1-stefanha@redhat.com>

The AioHandler *node, bool is_new arguments are more complicated to
think about than simply being given AioHandler *old_node, AioHandler
*new_node.

Furthermore, the new Linux io_uring file descriptor monitoring
mechanism added by the next patch requires access to both the old and
the new nodes.  Make this change now in preparation.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200305170806.1313245-5-stefanha@redhat.com
Message-Id: <20200305170806.1313245-5-stefanha@redhat.com>
---
 include/block/aio.h | 13 ++++++-------
 util/aio-posix.c    |  7 +------
 util/fdmon-epoll.c  | 21 ++++++++++++---------
 util/fdmon-poll.c   |  4 +++-
 4 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 90e07d7507..bd76b08f1a 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -57,17 +57,16 @@ typedef struct {
     /*
      * update:
      * @ctx: the AioContext
-     * @node: the handler
-     * @is_new: is the file descriptor already being monitored?
+ * @old_node: the existing handler or NULL if this file descriptor is being + * monitored for the first time + * @new_node: the new handler or NULL if this file descriptor is being + * removed * - * Add/remove/modify a monitored file descriptor. There are three cases: - * 1. node->pfd.events == 0 means remove the file descriptor. - * 2. !is_new means modify an already monitored file descriptor. - * 3. is_new means add a new file descriptor. + * Add/remove/modify a monitored file descriptor. * * Called with ctx->list_lock acquired. */ - void (*update)(AioContext *ctx, AioHandler *node, bool is_new); + void (*update)(AioContext *ctx, AioHandler *old_node, AioHandler *new_node); /* * wait: diff --git a/util/aio-posix.c b/util/aio-posix.c index bc0b86547c..028b2abded 100644 --- a/util/aio-posix.c +++ b/util/aio-posix.c @@ -139,12 +139,7 @@ void aio_set_fd_handler(AioContext *ctx, atomic_set(&ctx->poll_disable_cnt, atomic_read(&ctx->poll_disable_cnt) + poll_disable_change); - if (new_node) { - ctx->fdmon_ops->update(ctx, new_node, is_new); - } else if (node) { - /* Unregister deleted fd_handler */ - ctx->fdmon_ops->update(ctx, node, false); - } + ctx->fdmon_ops->update(ctx, node, new_node); qemu_lockcnt_unlock(&ctx->list_lock); aio_notify(ctx); diff --git a/util/fdmon-epoll.c b/util/fdmon-epoll.c index 29c1454469..d56b69468b 100644 --- a/util/fdmon-epoll.c +++ b/util/fdmon-epoll.c @@ -30,21 +30,24 @@ static inline int epoll_events_from_pfd(int pfd_events) (pfd_events & G_IO_ERR ? EPOLLERR : 0); } -static void fdmon_epoll_update(AioContext *ctx, AioHandler *node, bool is_new) +static void fdmon_epoll_update(AioContext *ctx, + AioHandler *old_node, + AioHandler *new_node) { - struct epoll_event event; + struct epoll_event event = { + .data.ptr = new_node, + .events = new_node ? epoll_events_from_pfd(new_node->pfd.events) : 0, + }; int r; - int ctl; - if (!node->pfd.events) { - ctl = EPOLL_CTL_DEL; + if (!new_node) { + r = epoll_ctl(ctx->epollfd, EPOLL_CTL_DEL, old_node->pfd.fd, &event); + } else if (!old_node) { + r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, new_node->pfd.fd, &event); } else { - event.data.ptr = node; - event.events = epoll_events_from_pfd(node->pfd.events); - ctl = is_new ? 
EPOLL_CTL_ADD : EPOLL_CTL_MOD; + r = epoll_ctl(ctx->epollfd, EPOLL_CTL_MOD, new_node->pfd.fd, &event); } - r = epoll_ctl(ctx->epollfd, ctl, node->pfd.fd, &event); if (r) { fdmon_epoll_disable(ctx); } diff --git a/util/fdmon-poll.c b/util/fdmon-poll.c index 67992116b8..28114a0f39 100644 --- a/util/fdmon-poll.c +++ b/util/fdmon-poll.c @@ -93,7 +93,9 @@ static int fdmon_poll_wait(AioContext *ctx, AioHandlerList *ready_list, return ret; } -static void fdmon_poll_update(AioContext *ctx, AioHandler *node, bool is_new) +static void fdmon_poll_update(AioContext *ctx, + AioHandler *old_node, + AioHandler *new_node) { /* Do nothing, AioHandler already contains the state we'll need */ } From patchwork Wed Mar 11 12:40:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Hajnoczi X-Patchwork-Id: 11431603 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ED5171580 for ; Wed, 11 Mar 2020 12:45:03 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B2A7D208E4 for ; Wed, 11 Mar 2020 12:45:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="PfEMMyVI" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B2A7D208E4 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:51205 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jC0jO-0005Ww-SI for patchwork-qemu-devel@patchwork.kernel.org; Wed, 11 Mar 2020 08:45:02 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:47035) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jC0fz-0007ED-J6 for qemu-devel@nongnu.org; Wed, 11 Mar 2020 08:41:34 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1jC0fw-0000KV-QJ for qemu-devel@nongnu.org; Wed, 11 Mar 2020 08:41:31 -0400 Received: from us-smtp-1.mimecast.com ([205.139.110.61]:42750 helo=us-smtp-delivery-1.mimecast.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1jC0fw-0000KN-L5 for qemu-devel@nongnu.org; Wed, 11 Mar 2020 08:41:28 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1583930488; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ZqSw75N9uxdR0L3t48kV/I6fpFhsCTrcP8cU52szmr8=; b=PfEMMyVII0ywZxNC5hPybvvHtaUEHjqg8Xai7NJruXIMRCO8srp1kEide+ZqlLSdgT7rQ9 vZrHFRs1cwx6LvJYxZIog3yaaTaZVaZzKpMVCfrUVoVPPixFfYPQLJwoLBOHz3f6QSokhT 9LId3JnmcGQJvCxWGZ7Q6K9kJYCiMeA= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-166-5TjmVCLtN-SvWtUmLHWIhg-1; Wed, 11 Mar 2020 08:41:22 -0400 X-MC-Unique: 5TjmVCLtN-SvWtUmLHWIhg-1 Received: from smtp.corp.redhat.com 
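The three cases that used to be encoded in node->pfd.events and is_new are
now encoded in which of the two pointers is NULL. A minimal sketch of the
convention (illustrative only; example_update() is an invented name, not
part of the patch):

    static void example_update(AioContext *ctx,
                               AioHandler *old_node, AioHandler *new_node)
    {
        if (!new_node) {
            /* remove: stop monitoring old_node->pfd.fd */
        } else if (!old_node) {
            /* add: monitor new_node->pfd.fd for the first time */
        } else {
            /* modify: replace the event mask of an already monitored fd */
        }
    }

The three branches correspond one-to-one to the EPOLL_CTL_DEL, EPOLL_CTL_ADD
and EPOLL_CTL_MOD cases in fdmon_epoll_update() above.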
From patchwork Wed Mar 11 12:40:43 2020
X-Patchwork-Submitter: Stefan Hajnoczi
X-Patchwork-Id: 11431603
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Subject: [PULL 7/9] aio-posix: add io_uring fd monitoring implementation
Date: Wed, 11 Mar 2020 12:40:43 +0000
Message-Id: <20200311124045.277969-8-stefanha@redhat.com>
In-Reply-To: <20200311124045.277969-1-stefanha@redhat.com>
References: <20200311124045.277969-1-stefanha@redhat.com>
Cc: Fam Zheng, Peter Maydell, qemu-block@nongnu.org, Max Reitz,
 Stefan Hajnoczi, Paolo Bonzini, Kevin Wolf

The recent Linux io_uring API has several advantages over ppoll(2) and
epoll(7). Details are given in the source code.

Add an io_uring implementation and make it the default on Linux.
Performance is the same as with epoll(7) but later patches add
optimizations that take advantage of io_uring.

It is necessary to change how aio_set_fd_handler() deals with deleting
AioHandlers since removing monitored file descriptors is asynchronous
in io_uring. fdmon_io_uring_remove() marks the AioHandler deleted and
aio_set_fd_handler() will let it handle deletion in that case.

Signed-off-by: Stefan Hajnoczi
Link: https://lore.kernel.org/r/20200305170806.1313245-6-stefanha@redhat.com
Message-Id: <20200305170806.1313245-6-stefanha@redhat.com>
---
 configure             |   5 +
 include/block/aio.h   |   9 ++
 util/Makefile.objs    |   1 +
 util/aio-posix.c      |  20 ++-
 util/aio-posix.h      |  20 ++-
 util/fdmon-io_uring.c | 326 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 376 insertions(+), 5 deletions(-)
 create mode 100644 util/fdmon-io_uring.c

diff --git a/configure b/configure
index cbf864bff1..3c7470096f 100755
--- a/configure
+++ b/configure
@@ -4093,6 +4093,11 @@ if test "$linux_io_uring" != "no" ; then
     linux_io_uring_cflags=$($pkg_config --cflags liburing)
     linux_io_uring_libs=$($pkg_config --libs liburing)
     linux_io_uring=yes
+
+    # io_uring is used in libqemuutil.a where per-file -libs variables are not
+    # seen by programs linking the archive.  It's not ideal, but just add the
+    # library dependency globally.
+    LIBS="$linux_io_uring_libs $LIBS"
   else
     if test "$linux_io_uring" = "yes" ; then
       feature_not_found "linux io_uring" "Install liburing devel"
diff --git a/include/block/aio.h b/include/block/aio.h
index bd76b08f1a..83fc9b844d 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -14,6 +14,9 @@
 #ifndef QEMU_AIO_H
 #define QEMU_AIO_H

+#ifdef CONFIG_LINUX_IO_URING
+#include <liburing.h>
+#endif
 #include "qemu/queue.h"
 #include "qemu/event_notifier.h"
 #include "qemu/thread.h"
@@ -96,6 +99,8 @@ struct BHListSlice {
     QSIMPLEQ_ENTRY(BHListSlice) next;
 };

+typedef QSLIST_HEAD(, AioHandler) AioHandlerSList;
+
 struct AioContext {
     GSource source;

@@ -181,6 +186,10 @@ struct AioContext {
      * locking.
      */
     struct LuringState *linux_io_uring;
+
+    /* State for file descriptor monitoring using Linux io_uring */
+    struct io_uring fdmon_io_uring;
+    AioHandlerSList submit_list;
 #endif

     /* TimerLists for calling timers - one per clock type.  Has its own
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 6439077a68..6718a38b61 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -7,6 +7,7 @@ util-obj-$(call lnot,$(CONFIG_ATOMIC64)) += atomic64.o
 util-obj-$(CONFIG_POSIX) += aio-posix.o
 util-obj-$(CONFIG_POSIX) += fdmon-poll.o
 util-obj-$(CONFIG_EPOLL_CREATE1) += fdmon-epoll.o
+util-obj-$(CONFIG_LINUX_IO_URING) += fdmon-io_uring.o
 util-obj-$(CONFIG_POSIX) += compatfd.o
 util-obj-$(CONFIG_POSIX) += event_notifier-posix.o
 util-obj-$(CONFIG_POSIX) += mmap-alloc.o
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 028b2abded..ffd9cc381b 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -57,10 +57,16 @@ static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
         g_source_remove_poll(&ctx->source, &node->pfd);
     }

+    node->pfd.revents = 0;
+
+    /* If the fd monitor has already marked it deleted, leave it alone */
+    if (QLIST_IS_INSERTED(node, node_deleted)) {
+        return false;
+    }
+
     /* If a read is in progress, just mark the node as deleted */
     if (qemu_lockcnt_count(&ctx->list_lock)) {
         QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
-        node->pfd.revents = 0;
         return false;
     }
     /* Otherwise, delete it for real.  We can't just mark it as
@@ -126,9 +132,6 @@ void aio_set_fd_handler(AioContext *ctx,
         QLIST_INSERT_HEAD_RCU(&ctx->aio_handlers, new_node, node);
     }
-    if (node) {
-        deleted = aio_remove_fd_handler(ctx, node);
-    }

     /* No need to order poll_disable_cnt writes against other updates;
      * the counter is only used to avoid wasting time and latency on
@@ -140,6 +143,9 @@ void aio_set_fd_handler(AioContext *ctx,
                atomic_read(&ctx->poll_disable_cnt) + poll_disable_change);

     ctx->fdmon_ops->update(ctx, node, new_node);
+    if (node) {
+        deleted = aio_remove_fd_handler(ctx, node);
+    }
     qemu_lockcnt_unlock(&ctx->list_lock);
     aio_notify(ctx);
@@ -565,11 +571,17 @@ void aio_context_setup(AioContext *ctx)
     ctx->fdmon_ops = &fdmon_poll_ops;
     ctx->epollfd = -1;

+    /* Use the fastest fd monitoring implementation if available */
+    if (fdmon_io_uring_setup(ctx)) {
+        return;
+    }
+
     fdmon_epoll_setup(ctx);
 }

 void aio_context_destroy(AioContext *ctx)
 {
+    fdmon_io_uring_destroy(ctx);
     fdmon_epoll_disable(ctx);
 }
diff --git a/util/aio-posix.h b/util/aio-posix.h
index 97899d0fbc..55fc771327 100644
--- a/util/aio-posix.h
+++ b/util/aio-posix.h
@@ -27,10 +27,14 @@ struct AioHandler {
     IOHandler *io_poll_begin;
     IOHandler *io_poll_end;
     void *opaque;
-    bool is_external;
     QLIST_ENTRY(AioHandler) node;
     QLIST_ENTRY(AioHandler) node_ready; /* only used during aio_poll() */
     QLIST_ENTRY(AioHandler) node_deleted;
+#ifdef CONFIG_LINUX_IO_URING
+    QSLIST_ENTRY(AioHandler) node_submitted;
+    unsigned flags; /* see fdmon-io_uring.c */
+#endif
+    bool is_external;
 };

 /* Add a handler to a ready list */
@@ -58,4 +62,18 @@ static inline void fdmon_epoll_disable(AioContext *ctx)
 }
 #endif /* !CONFIG_EPOLL_CREATE1 */

+#ifdef CONFIG_LINUX_IO_URING
+bool fdmon_io_uring_setup(AioContext *ctx);
+void fdmon_io_uring_destroy(AioContext *ctx);
+#else
+static inline bool fdmon_io_uring_setup(AioContext *ctx)
+{
+    return false;
+}
+
+static inline void fdmon_io_uring_destroy(AioContext *ctx)
+{
+}
+#endif /* !CONFIG_LINUX_IO_URING */
+
 #endif /* AIO_POSIX_H */
diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
new file mode 100644
index 0000000000..fb99b4b61e
--- /dev/null
+++ b/util/fdmon-io_uring.c
@@ -0,0 +1,326 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Linux io_uring file descriptor monitoring
+ *
+ * The Linux io_uring API supports file descriptor monitoring with a few
+ * advantages over existing APIs like poll(2) and epoll(7):
+ *
+ * 1. Userspace polling of events is possible because the completion queue (cq
+ *    ring) is shared between the kernel and userspace.  This allows
+ *    applications that rely on userspace polling to also monitor file
+ *    descriptors in the same userspace polling loop.
+ *
+ * 2. Submission and completion are batched and done together in a single
+ *    system call.  This minimizes the number of system calls.
+ *
+ * 3. File descriptor monitoring is O(1) like epoll(7) so it scales better than
+ *    poll(2).
+ *
+ * 4. Nanosecond timeouts are supported so it requires fewer syscalls than
+ *    epoll(7).
+ *
+ * This code only monitors file descriptors and does not do asynchronous disk
+ * I/O.  Implementing disk I/O efficiently has other requirements and should
+ * use a separate io_uring so it does not make sense to unify the code.
+ *
+ * File descriptor monitoring is implemented using the following operations:
+ *
+ * 1. IORING_OP_POLL_ADD - adds a file descriptor to be monitored.
+ * 2. IORING_OP_POLL_REMOVE - removes a file descriptor being monitored.  When
+ *    the poll mask changes for a file descriptor it is first removed and then
+ *    re-added with the new poll mask, so this operation is also used as part
+ *    of modifying an existing monitored file descriptor.
+ * 3. IORING_OP_TIMEOUT - added every time a blocking syscall is made to wait
+ *    for events.  This operation self-cancels if another event completes
+ *    before the timeout.
+ *
+ * io_uring calls the submission queue the "sq ring" and the completion queue
+ * the "cq ring".  Ring entries are called "sqe" and "cqe", respectively.
+ *
+ * The code is structured so that sq/cq rings are only modified within
+ * fdmon_io_uring_wait().  Changes to AioHandlers are made by enqueuing them on
+ * ctx->submit_list so that fdmon_io_uring_wait() can submit IORING_OP_POLL_ADD
+ * and/or IORING_OP_POLL_REMOVE sqes for them.
+ */
+
+#include "qemu/osdep.h"
+#include <liburing.h>
+#include "qemu/rcu_queue.h"
+#include "aio-posix.h"
+
+enum {
+    FDMON_IO_URING_ENTRIES = 128, /* sq/cq ring size */
+
+    /* AioHandler::flags */
+    FDMON_IO_URING_PENDING = (1 << 0),
+    FDMON_IO_URING_ADD = (1 << 1),
+    FDMON_IO_URING_REMOVE = (1 << 2),
+};
+
+static inline int poll_events_from_pfd(int pfd_events)
+{
+    return (pfd_events & G_IO_IN ? POLLIN : 0) |
+           (pfd_events & G_IO_OUT ? POLLOUT : 0) |
+           (pfd_events & G_IO_HUP ? POLLHUP : 0) |
+           (pfd_events & G_IO_ERR ? POLLERR : 0);
+}
+
+static inline int pfd_events_from_poll(int poll_events)
+{
+    return (poll_events & POLLIN ? G_IO_IN : 0) |
+           (poll_events & POLLOUT ? G_IO_OUT : 0) |
+           (poll_events & POLLHUP ? G_IO_HUP : 0) |
+           (poll_events & POLLERR ? G_IO_ERR : 0);
+}
+
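/*
 * [Illustrative overview, not part of the patch: how the functions that
 * follow cooperate on each event loop iteration.
 *
 *   aio_set_fd_handler() -> fdmon_io_uring_update() -> ctx->submit_list
 *
 *   fdmon_io_uring_wait():
 *     fill_sq_ring()             drains submit_list into POLL_ADD and
 *                                POLL_REMOVE sqes obtained via get_sqe()
 *     io_uring_submit_and_wait() hands the sqes to the kernel and blocks
 *     process_cq_ring()          turns completed cqes into ready_list
 *                                entries]
 */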
+/*
+ * Returns an sqe for submitting a request.  Only called within
+ * fdmon_io_uring_wait().
+ */
+static struct io_uring_sqe *get_sqe(AioContext *ctx)
+{
+    struct io_uring *ring = &ctx->fdmon_io_uring;
+    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
+    int ret;
+
+    if (likely(sqe)) {
+        return sqe;
+    }
+
+    /* No free sqes left, submit pending sqes first */
+    ret = io_uring_submit(ring);
+    assert(ret > 1);
+    sqe = io_uring_get_sqe(ring);
+    assert(sqe);
+    return sqe;
+}
+
+/* Atomically enqueue an AioHandler for sq ring submission */
+static void enqueue(AioHandlerSList *head, AioHandler *node, unsigned flags)
+{
+    unsigned old_flags;
+
+    old_flags = atomic_fetch_or(&node->flags, FDMON_IO_URING_PENDING | flags);
+    if (!(old_flags & FDMON_IO_URING_PENDING)) {
+        QSLIST_INSERT_HEAD_ATOMIC(head, node, node_submitted);
+    }
+}
+
+/* Dequeue an AioHandler for sq ring submission.  Called by fill_sq_ring(). */
+static AioHandler *dequeue(AioHandlerSList *head, unsigned *flags)
+{
+    AioHandler *node = QSLIST_FIRST(head);
+
+    if (!node) {
+        return NULL;
+    }
+
+    /* Doesn't need to be atomic since fill_sq_ring() moves the list */
+    QSLIST_REMOVE_HEAD(head, node_submitted);
+
+    /*
+     * Don't clear FDMON_IO_URING_REMOVE.  It's sticky so it can serve two
+     * purposes: telling fill_sq_ring() to submit IORING_OP_POLL_REMOVE and
+     * telling process_cqe() to delete the AioHandler when its
+     * IORING_OP_POLL_ADD completes.
+     */
+    *flags = atomic_fetch_and(&node->flags, ~(FDMON_IO_URING_PENDING |
+                                              FDMON_IO_URING_ADD));
+    return node;
+}
+
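/*
 * [Illustrative aside, not part of the patch: enqueue()/dequeue() above are
 * an instance of a generic lock-free "work list with a pending bit" pattern.
 * A self-contained C11 sketch of the same idea, with invented names and a
 * plain compare-and-swap push in place of QSLIST_INSERT_HEAD_ATOMIC:]
 */

#include <stdatomic.h>
#include <stddef.h>

#define WORK_PENDING 1u           /* like FDMON_IO_URING_PENDING */

struct work {
    _Atomic unsigned flags;       /* WORK_PENDING plus request bits */
    struct work *next;
};

static struct work *_Atomic work_list;

/* Safe to call from many threads: a node is inserted at most once no
 * matter how often its request bits are updated. */
static void work_enqueue(struct work *w, unsigned request)
{
    unsigned old = atomic_fetch_or(&w->flags, WORK_PENDING | request);

    if (!(old & WORK_PENDING)) {
        struct work *head = atomic_load(&work_list);
        do {
            w->next = head;
        } while (!atomic_compare_exchange_weak(&work_list, &head, w));
    }
}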
+static void fdmon_io_uring_update(AioContext *ctx,
+                                  AioHandler *old_node,
+                                  AioHandler *new_node)
+{
+    if (new_node) {
+        enqueue(&ctx->submit_list, new_node, FDMON_IO_URING_ADD);
+    }
+
+    if (old_node) {
+        /*
+         * Deletion is tricky because IORING_OP_POLL_ADD and
+         * IORING_OP_POLL_REMOVE are async.  We need to wait for the original
+         * IORING_OP_POLL_ADD to complete before this handler can be freed
+         * safely.
+         *
+         * It's possible that the file descriptor becomes ready and the
+         * IORING_OP_POLL_ADD cqe is enqueued before IORING_OP_POLL_REMOVE is
+         * submitted, too.
+         *
+         * Mark this handler deleted right now but don't place it on
+         * ctx->deleted_aio_handlers yet.  Instead, manually fudge the list
+         * entry to make QLIST_IS_INSERTED() think this handler has been
+         * inserted and other code recognizes this AioHandler as deleted.
+         *
+         * Once the original IORING_OP_POLL_ADD completes we enqueue the
+         * handler on the real ctx->deleted_aio_handlers list to be freed.
+         */
+        assert(!QLIST_IS_INSERTED(old_node, node_deleted));
+        old_node->node_deleted.le_prev = &old_node->node_deleted.le_next;
+
+        enqueue(&ctx->submit_list, old_node, FDMON_IO_URING_REMOVE);
+    }
+}
+
+static void add_poll_add_sqe(AioContext *ctx, AioHandler *node)
+{
+    struct io_uring_sqe *sqe = get_sqe(ctx);
+    int events = poll_events_from_pfd(node->pfd.events);
+
+    io_uring_prep_poll_add(sqe, node->pfd.fd, events);
+    io_uring_sqe_set_data(sqe, node);
+}
+
+static void add_poll_remove_sqe(AioContext *ctx, AioHandler *node)
+{
+    struct io_uring_sqe *sqe = get_sqe(ctx);
+
+    io_uring_prep_poll_remove(sqe, node);
+}
+
+/* Add a timeout that self-cancels when another cqe becomes ready */
+static void add_timeout_sqe(AioContext *ctx, int64_t ns)
+{
+    struct io_uring_sqe *sqe;
+    struct __kernel_timespec ts = {
+        .tv_sec = ns / NANOSECONDS_PER_SECOND,
+        .tv_nsec = ns % NANOSECONDS_PER_SECOND,
+    };
+
+    sqe = get_sqe(ctx);
+    io_uring_prep_timeout(sqe, &ts, 1, 0);
+}
+
+/* Add sqes from ctx->submit_list for submission */
+static void fill_sq_ring(AioContext *ctx)
+{
+    AioHandlerSList submit_list;
+    AioHandler *node;
+    unsigned flags;
+
+    QSLIST_MOVE_ATOMIC(&submit_list, &ctx->submit_list);
+
+    while ((node = dequeue(&submit_list, &flags))) {
+        /* Order matters, just in case both flags were set */
+        if (flags & FDMON_IO_URING_ADD) {
+            add_poll_add_sqe(ctx, node);
+        }
+        if (flags & FDMON_IO_URING_REMOVE) {
+            add_poll_remove_sqe(ctx, node);
+        }
+    }
+}
+
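/*
 * [Illustrative timeline, not part of the patch: the deletion race that
 * process_cqe() below has to handle, condensed from the comments above.
 *
 *   1. fdmon_io_uring_update(old_node, NULL) sets FDMON_IO_URING_REMOVE and
 *      fakes node_deleted so the handler already looks deleted to callers.
 *   2. The fd becomes ready: the kernel posts the IORING_OP_POLL_ADD cqe
 *      before the IORING_OP_POLL_REMOVE sqe has even been submitted.
 *   3. process_cqe() observes FDMON_IO_URING_REMOVE, moves the handler onto
 *      ctx->deleted_aio_handlers instead of dispatching it, and the later
 *      IORING_OP_POLL_REMOVE completes as a harmless cqe with zero
 *      user_data.]
 */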
+/* Returns true if a handler became ready */
+static bool process_cqe(AioContext *ctx,
+                        AioHandlerList *ready_list,
+                        struct io_uring_cqe *cqe)
+{
+    AioHandler *node = io_uring_cqe_get_data(cqe);
+    unsigned flags;
+
+    /* poll_timeout and poll_remove have a zero user_data field */
+    if (!node) {
+        return false;
+    }
+
+    /*
+     * Deletion can only happen when IORING_OP_POLL_ADD completes.  If we race
+     * with enqueue() here then we can safely clear the FDMON_IO_URING_REMOVE
+     * bit before IORING_OP_POLL_REMOVE is submitted.
+     */
+    flags = atomic_fetch_and(&node->flags, ~FDMON_IO_URING_REMOVE);
+    if (flags & FDMON_IO_URING_REMOVE) {
+        QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
+        return false;
+    }
+
+    aio_add_ready_handler(ready_list, node, pfd_events_from_poll(cqe->res));
+
+    /* IORING_OP_POLL_ADD is one-shot so we must re-arm it */
+    add_poll_add_sqe(ctx, node);
+    return true;
+}
+
+static int process_cq_ring(AioContext *ctx, AioHandlerList *ready_list)
+{
+    struct io_uring *ring = &ctx->fdmon_io_uring;
+    struct io_uring_cqe *cqe;
+    unsigned num_cqes = 0;
+    unsigned num_ready = 0;
+    unsigned head;
+
+    io_uring_for_each_cqe(ring, head, cqe) {
+        if (process_cqe(ctx, ready_list, cqe)) {
+            num_ready++;
+        }
+
+        num_cqes++;
+    }
+
+    io_uring_cq_advance(ring, num_cqes);
+    return num_ready;
+}
+
+static int fdmon_io_uring_wait(AioContext *ctx, AioHandlerList *ready_list,
+                               int64_t timeout)
+{
+    unsigned wait_nr = 1; /* block until at least one cqe is ready */
+    int ret;
+
+    /* Fall back while external clients are disabled */
+    if (atomic_read(&ctx->external_disable_cnt)) {
+        return fdmon_poll_ops.wait(ctx, ready_list, timeout);
+    }
+
+    if (timeout == 0) {
+        wait_nr = 0; /* non-blocking */
+    } else if (timeout > 0) {
+        add_timeout_sqe(ctx, timeout);
+    }
+
+    fill_sq_ring(ctx);
+
+    ret = io_uring_submit_and_wait(&ctx->fdmon_io_uring, wait_nr);
+    assert(ret >= 0);
+
+    return process_cq_ring(ctx, ready_list);
+}
+
+static const FDMonOps fdmon_io_uring_ops = {
+    .update = fdmon_io_uring_update,
+    .wait = fdmon_io_uring_wait,
+};
+
+bool fdmon_io_uring_setup(AioContext *ctx)
+{
+    int ret;
+
+    ret = io_uring_queue_init(FDMON_IO_URING_ENTRIES, &ctx->fdmon_io_uring, 0);
+    if (ret != 0) {
+        return false;
+    }
+
+    QSLIST_INIT(&ctx->submit_list);
+    ctx->fdmon_ops = &fdmon_io_uring_ops;
+    return true;
+}
+
+void fdmon_io_uring_destroy(AioContext *ctx)
+{
+    if (ctx->fdmon_ops == &fdmon_io_uring_ops) {
+        AioHandler *node;
+
+        io_uring_queue_exit(&ctx->fdmon_io_uring);
+
+        /* No need to submit these anymore, just free them. */
+        while ((node = QSLIST_FIRST_RCU(&ctx->submit_list))) {
+            QSLIST_REMOVE_HEAD_RCU(&ctx->submit_list, node_submitted);
+            QLIST_REMOVE(node, node);
+            g_free(node);
+        }
+
+        ctx->fdmon_ops = &fdmon_poll_ops;
+    }
+}
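For readers unfamiliar with the io_uring poll API that the new file relies
on, here is a minimal self-contained liburing program (an editorial
illustration, separate from QEMU; error handling mostly omitted) that
monitors a single file descriptor the same way fdmon-io_uring.c does:

    #include <liburing.h>
    #include <poll.h>
    #include <stdio.h>

    int main(void)
    {
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;

        if (io_uring_queue_init(8, &ring, 0) < 0) {
            return 1;
        }

        /* One-shot poll request, as in add_poll_add_sqe() */
        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_poll_add(sqe, 0 /* stdin */, POLLIN);

        /* Submit and block, as in fdmon_io_uring_wait() */
        io_uring_submit_and_wait(&ring, 1);

        if (io_uring_wait_cqe(&ring, &cqe) == 0) {
            /* cqe->res holds the revents mask (or a negative errno) */
            printf("events: 0x%x\n", cqe->res);
            io_uring_cqe_seen(&ring, cqe);
        }

        io_uring_queue_exit(&ring);
        return 0;
    }

Because IORING_OP_POLL_ADD is one-shot, a long-running event loop must
re-arm the request after every completion, which is exactly what
process_cqe() does above.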
From patchwork Wed Mar 11 12:40:44 2020
X-Patchwork-Submitter: Stefan Hajnoczi
X-Patchwork-Id: 11431605
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Subject: [PULL 8/9] aio-posix: support userspace polling of fd monitoring
Date: Wed, 11 Mar 2020 12:40:44 +0000
Message-Id: <20200311124045.277969-9-stefanha@redhat.com>
In-Reply-To: <20200311124045.277969-1-stefanha@redhat.com>
References: <20200311124045.277969-1-stefanha@redhat.com>
Cc: Fam Zheng, Peter Maydell, qemu-block@nongnu.org, Max Reitz,
 Stefan Hajnoczi, Paolo Bonzini, Kevin Wolf

Unlike ppoll(2) and epoll(7), Linux io_uring completions can be polled
from userspace. Previously userspace polling was only allowed when all
AioHandlers had an ->io_poll() callback. This prevented starvation of
fds by userspace pollable handlers.

Add the FDMonOps->need_wait() callback that enables userspace polling
even when some AioHandlers lack ->io_poll().

For example, it's now possible to do userspace polling when a TCP/IP
socket is monitored thanks to Linux io_uring.

Signed-off-by: Stefan Hajnoczi
Link: https://lore.kernel.org/r/20200305170806.1313245-7-stefanha@redhat.com
Message-Id: <20200305170806.1313245-7-stefanha@redhat.com>
---
 include/block/aio.h   | 19 +++++++++++++++++++
 util/aio-posix.c      | 11 ++++++++---
 util/fdmon-epoll.c    |  1 +
 util/fdmon-io_uring.c |  6 ++++++
 util/fdmon-poll.c     |  1 +
 5 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 83fc9b844d..f07ebb76b8 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -55,6 +55,9 @@ struct ThreadPool;
 struct LinuxAioState;
 struct LuringState;

+/* Is polling disabled? */
+bool aio_poll_disabled(AioContext *ctx);
+
 /* Callbacks for file descriptor monitoring implementations */
 typedef struct {
     /*
@@ -84,6 +87,22 @@ typedef struct {
      * Returns: number of ready file descriptors.
      */
     int (*wait)(AioContext *ctx, AioHandlerList *ready_list, int64_t timeout);
+
+    /*
+     * need_wait:
+     * @ctx: the AioContext
+     *
+     * Tell aio_poll() when to stop userspace polling early because ->wait()
+     * has fds ready.
+     *
+     * File descriptor monitoring implementations that cannot poll fd
+     * readiness from userspace should use aio_poll_disabled() here.  This
+     * ensures that file descriptors are not starved by handlers that
+     * frequently make progress via userspace polling.
+     *
+     * Returns: true if ->wait() should be called, false otherwise.
+     */
+    bool (*need_wait)(AioContext *ctx);
 } FDMonOps;

 /*
diff --git a/util/aio-posix.c b/util/aio-posix.c
index ffd9cc381b..759989b45b 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -22,6 +22,11 @@
 #include "trace.h"
 #include "aio-posix.h"

+bool aio_poll_disabled(AioContext *ctx)
+{
+    return atomic_read(&ctx->poll_disable_cnt);
+}
+
 void aio_add_ready_handler(AioHandlerList *ready_list,
                            AioHandler *node,
                            int revents)
@@ -423,7 +428,7 @@ static bool run_poll_handlers(AioContext *ctx, int64_t max_ns, int64_t *timeout)
         elapsed_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time;
         max_ns = qemu_soonest_timeout(*timeout, max_ns);
         assert(!(max_ns && progress));
-    } while (elapsed_time < max_ns && !atomic_read(&ctx->poll_disable_cnt));
+    } while (elapsed_time < max_ns && !ctx->fdmon_ops->need_wait(ctx));

     /* If time has passed with no successful polling, adjust *timeout to
      * keep the same ending time.
@@ -451,7 +456,7 @@ static bool try_poll_mode(AioContext *ctx, int64_t *timeout)
 {
     int64_t max_ns = qemu_soonest_timeout(*timeout, ctx->poll_ns);

-    if (max_ns && !atomic_read(&ctx->poll_disable_cnt)) {
+    if (max_ns && !ctx->fdmon_ops->need_wait(ctx)) {
         poll_set_started(ctx, true);

         if (run_poll_handlers(ctx, max_ns, timeout)) {
@@ -501,7 +506,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
     /* If polling is allowed, non-blocking aio_poll does not need the
      * system call---a single round of run_poll_handlers_once suffices.
      */
-    if (timeout || atomic_read(&ctx->poll_disable_cnt)) {
+    if (timeout || ctx->fdmon_ops->need_wait(ctx)) {
         ret = ctx->fdmon_ops->wait(ctx, &ready_list, timeout);
     }

diff --git a/util/fdmon-epoll.c b/util/fdmon-epoll.c
index d56b69468b..fcd989d47d 100644
--- a/util/fdmon-epoll.c
+++ b/util/fdmon-epoll.c
@@ -100,6 +100,7 @@ out:
 static const FDMonOps fdmon_epoll_ops = {
     .update = fdmon_epoll_update,
     .wait = fdmon_epoll_wait,
+    .need_wait = aio_poll_disabled,
 };

 static bool fdmon_epoll_try_enable(AioContext *ctx)
diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index fb99b4b61e..893b79b622 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -288,9 +288,15 @@ static int fdmon_io_uring_wait(AioContext *ctx, AioHandlerList *ready_list,
     return process_cq_ring(ctx, ready_list);
 }

+static bool fdmon_io_uring_need_wait(AioContext *ctx)
+{
+    return io_uring_cq_ready(&ctx->fdmon_io_uring);
+}
+
 static const FDMonOps fdmon_io_uring_ops = {
     .update = fdmon_io_uring_update,
     .wait = fdmon_io_uring_wait,
+    .need_wait = fdmon_io_uring_need_wait,
 };

 bool fdmon_io_uring_setup(AioContext *ctx)
diff --git a/util/fdmon-poll.c b/util/fdmon-poll.c
index 28114a0f39..488067b679 100644
--- a/util/fdmon-poll.c
+++ b/util/fdmon-poll.c
@@ -103,4 +103,5 @@ static void fdmon_poll_update(AioContext *ctx,
 const FDMonOps fdmon_poll_ops = {
     .update = fdmon_poll_update,
     .wait = fdmon_poll_wait,
+    .need_wait = aio_poll_disabled,
 };
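The effect of need_wait() is easiest to see in a condensed sketch of the
polling logic after this patch (illustrative; the real code is spread
across run_poll_handlers() and aio_poll() above):

    /* Spin in userspace only while the backend has nothing ready */
    while (elapsed_time < max_ns && !ctx->fdmon_ops->need_wait(ctx)) {
        progress = run_poll_handlers_once(ctx, &timeout);
    }

    /* Only pay for the blocking syscall when it is actually needed */
    if (timeout || ctx->fdmon_ops->need_wait(ctx)) {
        ctx->fdmon_ops->wait(ctx, &ready_list, timeout);
    }

For fdmon-poll and fdmon-epoll, need_wait() is aio_poll_disabled(), so a
single handler without ->io_poll() still forces the blocking wait, exactly
as before. For fdmon-io_uring, need_wait() returns true only once
completions are sitting in the cq ring, so userspace polling can continue
even when some handlers have no ->io_poll() callback.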
From patchwork Wed Mar 11 12:40:45 2020
X-Patchwork-Submitter: Stefan Hajnoczi
X-Patchwork-Id: 11431621
From: Stefan Hajnoczi
To: qemu-devel@nongnu.org
Subject: [PULL 9/9] aio-posix: remove idle poll handlers to improve scalability
Date: Wed, 11 Mar 2020 12:40:45 +0000
Message-Id: <20200311124045.277969-10-stefanha@redhat.com>
In-Reply-To: <20200311124045.277969-1-stefanha@redhat.com>
References: <20200311124045.277969-1-stefanha@redhat.com>
Cc: Fam Zheng, Peter Maydell, qemu-block@nongnu.org, Max Reitz,
 Stefan Hajnoczi, Paolo Bonzini, Kevin Wolf

When there are many poll handlers it's likely that some of them are idle
most of the time. Remove handlers that haven't had activity recently so
that the polling loop scales better for guests with a large number of
devices.

This feature only takes effect for the Linux io_uring fd monitoring
implementation because it is capable of combining fd monitoring with
userspace polling. The other implementations can't do that and risk
starving fds in favor of poll handlers, so don't try this optimization
when they are in use.

IOPS improves from 10k to 105k when the guest has 100
virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1
device for rw=randread,iodepth=1,bs=4k,ioengine=libaio on NVMe.

[Clarified aio_poll_handlers locking discipline explanation in comment
after discussion with Paolo Bonzini.
--Stefan]

Signed-off-by: Stefan Hajnoczi
Link: https://lore.kernel.org/r/20200305170806.1313245-8-stefanha@redhat.com
Message-Id: <20200305170806.1313245-8-stefanha@redhat.com>
---
 include/block/aio.h |  8 ++++
 util/aio-posix.c    | 93 +++++++++++++++++++++++++++++++++++++++++----
 util/aio-posix.h    |  2 +
 util/trace-events   |  2 +
 4 files changed, 98 insertions(+), 7 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index f07ebb76b8..cb1989105a 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -227,6 +227,14 @@ struct AioContext {
     int64_t poll_grow;      /* polling time growth factor */
     int64_t poll_shrink;    /* polling time shrink factor */

+    /*
+     * List of handlers participating in userspace polling.  Protected by
+     * ctx->list_lock.  Iterated and modified mostly by the event loop thread
+     * from aio_poll() with ctx->list_lock incremented.  aio_set_fd_handler()
+     * only touches the list to delete nodes if ctx->list_lock's count is zero.
+     */
+    AioHandlerList poll_aio_handlers;
+
     /* Are we in polling mode or monitoring file descriptors? */
     bool poll_started;

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 759989b45b..cd6cf0a4a9 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -22,6 +22,9 @@
 #include "trace.h"
 #include "aio-posix.h"

+/* Stop userspace polling on a handler if it isn't active for some time */
+#define POLL_IDLE_INTERVAL_NS (7 * NANOSECONDS_PER_SECOND)
+
 bool aio_poll_disabled(AioContext *ctx)
 {
     return atomic_read(&ctx->poll_disable_cnt);
@@ -78,6 +81,7 @@ static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node)
      * deleted because deleted nodes are only cleaned up while
      * no one is walking the handlers list.
      */
+    QLIST_SAFE_REMOVE(node, node_poll);
     QLIST_REMOVE(node, node);
     return true;
 }
@@ -205,7 +209,7 @@ static bool poll_set_started(AioContext *ctx, bool started)
     ctx->poll_started = started;

     qemu_lockcnt_inc(&ctx->list_lock);
-    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
+    QLIST_FOREACH(node, &ctx->poll_aio_handlers, node_poll) {
         IOHandler *fn;

         if (QLIST_IS_INSERTED(node, node_deleted)) {
@@ -286,6 +290,7 @@ static void aio_free_deleted_handlers(AioContext *ctx)
     while ((node = QLIST_FIRST_RCU(&ctx->deleted_aio_handlers))) {
         QLIST_REMOVE(node, node);
         QLIST_REMOVE(node, node_deleted);
+        QLIST_SAFE_REMOVE(node, node_poll);
         g_free(node);
     }

@@ -300,6 +305,22 @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
     revents = node->pfd.revents & node->pfd.events;
     node->pfd.revents = 0;

+    /*
+     * Start polling AioHandlers when they become ready because activity is
+     * likely to continue.  Note that starvation is theoretically possible when
+     * fdmon_supports_polling(), but only until the fd fires for the first
+     * time.
+     */
+    if (!QLIST_IS_INSERTED(node, node_deleted) &&
+        !QLIST_IS_INSERTED(node, node_poll) &&
+        node->io_poll) {
+        trace_poll_add(ctx, node, node->pfd.fd, revents);
+        if (ctx->poll_started && node->io_poll_begin) {
+            node->io_poll_begin(node->opaque);
+        }
+        QLIST_INSERT_HEAD(&ctx->poll_aio_handlers, node, node_poll);
+    }
+
     if (!QLIST_IS_INSERTED(node, node_deleted) &&
         (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
         aio_node_check(ctx, node->is_external) &&
@@ -364,15 +385,19 @@ void aio_dispatch(AioContext *ctx)
     timerlistgroup_run_timers(&ctx->tlg);
 }

-static bool run_poll_handlers_once(AioContext *ctx, int64_t *timeout)
+static bool run_poll_handlers_once(AioContext *ctx,
+                                   int64_t now,
+                                   int64_t *timeout)
 {
     bool progress = false;
     AioHandler *node;
+    AioHandler *tmp;

-    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
-        if (!QLIST_IS_INSERTED(node, node_deleted) && node->io_poll &&
-            aio_node_check(ctx, node->is_external) &&
+    QLIST_FOREACH_SAFE(node, &ctx->poll_aio_handlers, node_poll, tmp) {
+        if (aio_node_check(ctx, node->is_external) &&
             node->io_poll(node->opaque)) {
+            node->poll_idle_timeout = now + POLL_IDLE_INTERVAL_NS;
+
             /*
              * Polling was successful, exit try_poll_mode immediately
              * to adjust the next polling time.
@@ -389,6 +414,50 @@ static bool run_poll_handlers_once(AioContext *ctx, int64_t *timeout)
     return progress;
 }

+static bool fdmon_supports_polling(AioContext *ctx)
+{
+    return ctx->fdmon_ops->need_wait != aio_poll_disabled;
+}
+
+static bool remove_idle_poll_handlers(AioContext *ctx, int64_t now)
+{
+    AioHandler *node;
+    AioHandler *tmp;
+    bool progress = false;
+
+    /*
+     * File descriptor monitoring implementations without userspace polling
+     * support suffer from starvation when a subset of handlers is polled
+     * because fds will not be processed in a timely fashion.  Don't remove
+     * idle poll handlers.
+     */
+    if (!fdmon_supports_polling(ctx)) {
+        return false;
+    }
+
+    QLIST_FOREACH_SAFE(node, &ctx->poll_aio_handlers, node_poll, tmp) {
+        if (node->poll_idle_timeout == 0LL) {
+            node->poll_idle_timeout = now + POLL_IDLE_INTERVAL_NS;
+        } else if (now >= node->poll_idle_timeout) {
+            trace_poll_remove(ctx, node, node->pfd.fd);
+            node->poll_idle_timeout = 0LL;
+            QLIST_SAFE_REMOVE(node, node_poll);
+            if (ctx->poll_started && node->io_poll_end) {
+                node->io_poll_end(node->opaque);
+
+                /*
+                 * Final poll in case ->io_poll_end() races with an event.
+                 * Nevermind about re-adding the handler in the rare case where
+                 * this causes progress.
+                 */
+                progress = node->io_poll(node->opaque) || progress;
+            }
+        }
+    }
+
+    return progress;
+}
+
 /* run_poll_handlers:
  * @ctx: the AioContext
  * @max_ns: maximum time to poll for, in nanoseconds
@@ -424,12 +493,17 @@ static bool run_poll_handlers(AioContext *ctx, int64_t max_ns, int64_t *timeout)
     start_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);

     do {
-        progress = run_poll_handlers_once(ctx, timeout);
+        progress = run_poll_handlers_once(ctx, start_time, timeout);
         elapsed_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time;
         max_ns = qemu_soonest_timeout(*timeout, max_ns);
         assert(!(max_ns && progress));
     } while (elapsed_time < max_ns && !ctx->fdmon_ops->need_wait(ctx));

+    if (remove_idle_poll_handlers(ctx, start_time + elapsed_time)) {
+        *timeout = 0;
+        progress = true;
+    }
+
     /* If time has passed with no successful polling, adjust *timeout to
      * keep the same ending time.
      */
@@ -454,8 +528,13 @@ static bool run_poll_handlers(AioContext *ctx, int64_t max_ns, int64_t *timeout)
  */
 static bool try_poll_mode(AioContext *ctx, int64_t *timeout)
 {
-    int64_t max_ns = qemu_soonest_timeout(*timeout, ctx->poll_ns);
+    int64_t max_ns;
+
+    if (QLIST_EMPTY_RCU(&ctx->poll_aio_handlers)) {
+        return false;
+    }

+    max_ns = qemu_soonest_timeout(*timeout, ctx->poll_ns);
     if (max_ns && !ctx->fdmon_ops->need_wait(ctx)) {
         poll_set_started(ctx, true);

diff --git a/util/aio-posix.h b/util/aio-posix.h
index 55fc771327..c80c04506a 100644
--- a/util/aio-posix.h
+++ b/util/aio-posix.h
@@ -30,10 +30,12 @@ struct AioHandler {
     QLIST_ENTRY(AioHandler) node;
     QLIST_ENTRY(AioHandler) node_ready; /* only used during aio_poll() */
     QLIST_ENTRY(AioHandler) node_deleted;
+    QLIST_ENTRY(AioHandler) node_poll;
 #ifdef CONFIG_LINUX_IO_URING
     QSLIST_ENTRY(AioHandler) node_submitted;
     unsigned flags; /* see fdmon-io_uring.c */
 #endif
+    int64_t poll_idle_timeout; /* when to stop userspace polling */
     bool is_external;
 };

diff --git a/util/trace-events b/util/trace-events
index 83b6639018..0ce42822eb 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -5,6 +5,8 @@
 run_poll_handlers_begin(void *ctx, int64_t max_ns, int64_t timeout) "ctx %p max_ns %"PRId64" timeout %"PRId64
 run_poll_handlers_end(void *ctx, bool progress, int64_t timeout) "ctx %p progress %d new timeout %"PRId64
 poll_shrink(void *ctx, int64_t old, int64_t new) "ctx %p old %"PRId64" new %"PRId64
 poll_grow(void *ctx, int64_t old, int64_t new) "ctx %p old %"PRId64" new %"PRId64
+poll_add(void *ctx, void *node, int fd, unsigned revents) "ctx %p node %p fd %d revents 0x%x"
+poll_remove(void *ctx, void *node, int fd) "ctx %p node %p fd %d"

 # async.c
 aio_co_schedule(void *ctx, void *co) "ctx %p co %p"
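The per-handler bookkeeping added above reduces to two rules, shown here in
a condensed form (illustrative; the full logic is in run_poll_handlers_once()
and remove_idle_poll_handlers() above):

    /* 1. A successful userspace poll pushes the handler's deadline out */
    if (node->io_poll(node->opaque)) {
        node->poll_idle_timeout = now + POLL_IDLE_INTERVAL_NS; /* now + 7s */
    }

    /* 2. After each polling round, handlers past their deadline drop out
     *    of poll_aio_handlers and fall back to fd monitoring */
    if (node->poll_idle_timeout == 0LL) {
        node->poll_idle_timeout = now + POLL_IDLE_INTERVAL_NS; /* first seen */
    } else if (now >= node->poll_idle_timeout) {
        QLIST_SAFE_REMOVE(node, node_poll);
    }

A busy handler therefore stays in the polling set indefinitely, while one
that goes quiet for POLL_IDLE_INTERVAL_NS (7 seconds) is handed back to
io_uring fd monitoring until its fd fires again. That keeps each polling
iteration short when a guest has hundreds of mostly idle devices, which is
where the 10k to 105k IOPS improvement quoted above comes from.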