From patchwork Wed Sep 5 09:33:51 2018
X-Patchwork-Submitter: Sergio Lopez Pascual
X-Patchwork-Id: 10588641
From: Sergio Lopez
To: kwolf@redhat.com, famz@redhat.com, stefanha@redhat.com, qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, Sergio Lopez
Date: Wed, 5 Sep 2018 11:33:51 +0200
Message-Id: <20180905093351.21954-1-slp@redhat.com>
Subject: [Qemu-devel] [PATCH] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb

AIO Coroutines shouldn't be managed by an AioContext different from the
one assigned when they are created. aio_co_enter avoids entering a
coroutine from a different AioContext, calling aio_co_schedule instead.

Scheduled coroutines are then entered by co_schedule_bh_cb using
qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
current AioContext obtained with qemu_get_current_aio_context.
Eventually, co->ctx will be set to the AioContext passed as an argument
to qemu_aio_coroutine_enter.

This means that, if an IO Thread's AioContext is being processed by the
Main Thread (due to aio_poll being called with a BDS AioContext, as it
happens in AIO_WAIT_WHILE among other places), the AioContext from some
coroutines may be wrongly replaced with the one from the Main Thread.

This is the root cause behind some crashes, mainly triggered by the
drain code in block/io.c. The most common are this abort and this failed
assertion:

util/async.c:aio_co_schedule (line 456):
    if (scheduled) {
        fprintf(stderr,
                "%s: Co-routine was already scheduled in '%s'\n",
                __func__, scheduled);
        abort();
    }

util/qemu-coroutine-lock.c (line 286):
    assert(mutex->holder == self);

But it's also known to cause random errors at different locations, and
even SIGSEGV with broken coroutine backtraces.

By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
pass the correct AioContext as an argument, making sure co->ctx is not
wrongly altered.

Signed-off-by: Sergio Lopez
Reviewed-by: Fam Zheng
Reviewed-by: Paolo Bonzini
---
 util/async.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/async.c b/util/async.c
index 05979f8014..c10642a385 100644
--- a/util/async.c
+++ b/util/async.c
@@ -400,7 +400,7 @@ static void co_schedule_bh_cb(void *opaque)
 
         /* Protected by write barrier in qemu_aio_coroutine_enter */
         atomic_set(&co->scheduled, NULL);
-        qemu_coroutine_enter(co);
+        qemu_aio_coroutine_enter(ctx, co);
         aio_context_release(ctx);
     }
 }