From patchwork Tue Nov 21 17:03:48 2017
X-Patchwork-Submitter: Jeff Cody
X-Patchwork-Id: 10068623
From: Jeff Cody
To: qemu-block@nongnu.org
Date: Tue, 21 Nov 2017 12:03:48 -0500
Message-Id: <20171121170350.31290-3-jcody@redhat.com>
In-Reply-To: <20171121170350.31290-1-jcody@redhat.com>
References: <20171121170350.31290-1-jcody@redhat.com>
Subject: [Qemu-devel] [PULL 2/4] coroutine: abort if we try to schedule or enter a pending coroutine
Cc: peter.maydell@linaro.org, jcody@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, pbonzini@redhat.com

The previous patch fixed a race condition in which coroutines were being
executed twice, or after coroutine deletion.

We can detect common scenarios in which this happens, print an error
message, and abort before we corrupt memory or data, or segfault.

This patch aborts if an attempt is made to enter a coroutine while it
is pending execution, either in a specific AioContext bottom half (bh)
or via a timer.
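The pending-execution check can be illustrated outside QEMU with a
minimal sketch. It assumes C11 <stdatomic.h> in place of QEMU's
atomic_cmpxchg(), atomic_set() and atomic_mb_read() helpers, and every
name in it (FakeCoroutine, fake_schedule, fake_enter, fake_bh_run) is a
hypothetical stand-in for Coroutine, aio_co_schedule(),
qemu_aio_coroutine_enter() and co_schedule_bh_cb(). Where the patch
calls abort(), the sketch returns false so the behavior can be
exercised.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for QEMU's Coroutine; only the new field matters. */
typedef struct {
    /* Name of the function that scheduled the coroutine, or NULL if the
     * coroutine is not pending execution. */
    _Atomic(const char *) scheduled;
    int runs; /* how many times the coroutine body was entered */
} FakeCoroutine;

/* Mirrors aio_co_schedule(): atomically claim the marker with a
 * compare-and-swap; fail if the coroutine is already pending. */
static bool fake_schedule(FakeCoroutine *co, const char *who)
{
    const char *expected = NULL;
    if (!atomic_compare_exchange_strong(&co->scheduled, &expected, who)) {
        /* On failure, expected now holds the earlier scheduler's name. */
        fprintf(stderr, "%s: Co-routine was already scheduled in '%s'\n",
                who, expected);
        return false; /* the patch calls abort() here */
    }
    return true;
}

/* Mirrors qemu_aio_coroutine_enter(): refuse to enter a pending coroutine. */
static bool fake_enter(FakeCoroutine *co)
{
    const char *scheduled = atomic_load(&co->scheduled); /* seq_cst load */
    if (scheduled) {
        fprintf(stderr, "enter: Co-routine was already scheduled in '%s'\n",
                scheduled);
        return false; /* the patch calls abort() here */
    }
    co->runs++;
    return true;
}

/* Mirrors co_schedule_bh_cb(): clear the marker, then enter. */
static bool fake_bh_run(FakeCoroutine *co)
{
    atomic_store(&co->scheduled, NULL);
    return fake_enter(co);
}
```

A compare-and-swap, rather than a plain load followed by a store, is
what lets two racing schedulers be detected: only one of them can move
the field from NULL to its own name.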

It also aborts if a coroutine is scheduled before a previously
scheduled run has occurred.

We cannot rely on the existing co->caller check for recursive re-entry
to catch this, as the coroutine may run and exit with
COROUTINE_TERMINATE before the scheduled coroutine executes. (This is
the scenario that was occurring and was fixed in the previous patch.)

This patch also reorders the Coroutine struct members in an attempt to
optimize caching.

Signed-off-by: Jeff Cody
Reviewed-by: Stefan Hajnoczi
---
 include/qemu/coroutine_int.h | 13 ++++++++++---
 util/async.c                 | 13 +++++++++++++
 util/qemu-coroutine-sleep.c  | 12 ++++++++++++
 util/qemu-coroutine.c        | 14 ++++++++++++++
 4 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index cb98892..59e8406 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -46,14 +46,21 @@ struct Coroutine {
 
     size_t locks_held;
 
+    /* Only used when the coroutine has yielded.  */
+    AioContext *ctx;
+
+    /* Used to catch and abort on illegal co-routine entry.
+     * Will contain the name of the function that had first
+     * scheduled the coroutine. */
+    const char *scheduled;
+
+    QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
+
     /* Coroutines that should be woken up when we yield or terminate.
      * Only used when the coroutine is running.
      */
     QSIMPLEQ_HEAD(, Coroutine) co_queue_wakeup;
 
-    /* Only used when the coroutine has yielded.  */
-    AioContext *ctx;
-    QSIMPLEQ_ENTRY(Coroutine) co_queue_next;
     QSLIST_ENTRY(Coroutine) co_scheduled_next;
 };
 
diff --git a/util/async.c b/util/async.c
index 0e1bd87..4dd9d95 100644
--- a/util/async.c
+++ b/util/async.c
@@ -388,6 +388,9 @@ static void co_schedule_bh_cb(void *opaque)
         QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
         trace_aio_co_schedule_bh_cb(ctx, co);
         aio_context_acquire(ctx);
+
+        /* Protected by write barrier in qemu_aio_coroutine_enter */
+        atomic_set(&co->scheduled, NULL);
         qemu_coroutine_enter(co);
         aio_context_release(ctx);
     }
@@ -438,6 +441,16 @@ fail:
 void aio_co_schedule(AioContext *ctx, Coroutine *co)
 {
     trace_aio_co_schedule(ctx, co);
+    const char *scheduled = atomic_cmpxchg(&co->scheduled, NULL,
+                                           __func__);
+
+    if (scheduled) {
+        fprintf(stderr,
+                "%s: Co-routine was already scheduled in '%s'\n",
+                __func__, scheduled);
+        abort();
+    }
+
     QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
                               co, co_scheduled_next);
     qemu_bh_schedule(ctx->co_schedule_bh);
diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index 9c56550..254349c 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/coroutine.h"
+#include "qemu/coroutine_int.h"
 #include "qemu/timer.h"
 #include "block/aio.h"
 
@@ -25,6 +26,8 @@ static void co_sleep_cb(void *opaque)
 {
     CoSleepCB *sleep_cb = opaque;
 
+    /* Write of schedule protected by barrier write in aio_co_schedule */
+    atomic_set(&sleep_cb->co->scheduled, NULL);
     aio_co_wake(sleep_cb->co);
 }
 
@@ -34,6 +37,15 @@ void coroutine_fn co_aio_sleep_ns(AioContext *ctx, QEMUClockType type,
     CoSleepCB sleep_cb = {
         .co = qemu_coroutine_self(),
     };
+
+    const char *scheduled = atomic_cmpxchg(&sleep_cb.co->scheduled, NULL,
+                                           __func__);
+    if (scheduled) {
+        fprintf(stderr,
+                "%s: Co-routine was already scheduled in '%s'\n",
+                __func__, scheduled);
+        abort();
+    }
     sleep_cb.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, &sleep_cb);
     timer_mod(sleep_cb.ts, qemu_clock_get_ns(type) + ns);
     qemu_coroutine_yield();
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index d6095c1..9eff7fd 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -107,8 +107,22 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co)
     Coroutine *self = qemu_coroutine_self();
     CoroutineAction ret;
 
+    /* Cannot rely on the read barrier for co in aio_co_wake(), as there are
+     * callers outside of aio_co_wake() */
+    const char *scheduled = atomic_mb_read(&co->scheduled);
+
     trace_qemu_aio_coroutine_enter(ctx, self, co, co->entry_arg);
 
+    /* if the Coroutine has already been scheduled, entering it again will
+     * cause us to enter it twice, potentially even after the coroutine has
+     * been deleted */
+    if (scheduled) {
+        fprintf(stderr,
+                "%s: Co-routine was already scheduled in '%s'\n",
+                __func__, scheduled);
+        abort();
+    }
+
     if (co->caller) {
         fprintf(stderr, "Co-routine re-entered recursively\n");
         abort();