From patchwork Fri Jan 17 14:47:25 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ian Jackson
X-Patchwork-Id: 11339387
From: Ian Jackson
To:
Date: Fri, 17 Jan 2020 14:47:25 +0000
Message-ID: <20200117144726.582-10-ian.jackson@eu.citrix.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20200117144726.582-1-ian.jackson@eu.citrix.com>
References: <20200117144726.582-1-ian.jackson@eu.citrix.com>
Subject: [Xen-devel] [PATCH v3 09/10] libxl: event: Fix possible hang with libxl_osevent_beforepoll
Cc: Anthony PERARD, Ian Jackson, George Dunlap, Wei Liu

If the application uses libxl_osevent_beforepoll, a similar hang is
possible to the one described and fixed in
  libxl: event: Fix hang when mixing blocking and eventy calls
Application behaviour would have to be fairly unusual, but it
doesn't seem sensible to just leave this latent bug.

We fix the latent bug by waking up the "poller_app" pipe every time we
add osevents. If the application does not ever call beforepoll, we
write one byte to the pipe and set pipe_nonempty and then we ignore
it. We only write another byte if beforepoll is called again.

Normally in an eventy program there would only be one thread calling
libxl_osevent_beforepoll. The effect in such a program is to
sometimes needlessly go round the poll loop again if a timeout
callback becomes interested in a new osevent. We'll fix that in a
moment.

Signed-off-by: Ian Jackson
Reviewed-by: George Dunlap
Tested-by: George Dunlap
---
v2: New addition to correctness arguments in libxl_event.c comment.
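The "write one byte, set pipe_nonempty, then ignore it" behaviour
described in the commit message can be modelled outside libxl. The
following is only a sketch under stated assumptions: struct demo_poller
and every demo_* function are hypothetical stand-ins, not libxl code,
and the assumption that beforepoll clears osevents_added while
afterpoll drains the pipe is inferred from the description in this
patch rather than copied from the library.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for libxl__poller; not the real structure. */
struct demo_poller {
    int wakeup_pipe[2];   /* [0] = read end, [1] = write end */
    bool pipe_nonempty;   /* a wakeup byte is already pending */
    bool osevents_added;  /* osevents were added since the last beforepoll */
};

/* Create the self-pipe with both ends nonblocking. Returns 0 on success. */
int demo_poller_init(struct demo_poller *p)
{
    if (pipe(p->wakeup_pipe) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(p->wakeup_pipe[i], F_GETFL);
        if (fl < 0 || fcntl(p->wakeup_pipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
            return -1;
    }
    p->pipe_nonempty = false;
    p->osevents_added = false;
    return 0;
}

/* Models pollers_note_osevent_added() for a single poller. */
void demo_note_osevent_added(struct demo_poller *p)
{
    p->osevents_added = true;
}

/*
 * Models the new check in the cleanup path: wake the poller only if
 * osevents were added, and write only if no byte is pending already.
 * Returns 1 if a byte was written, 0 if none was needed, -1 on error.
 */
int demo_cleanup_wake(struct demo_poller *p)
{
    ssize_t r;

    if (!p->osevents_added || p->pipe_nonempty)
        return 0;
    do {
        r = write(p->wakeup_pipe[1], "", 1);
    } while (r < 0 && errno == EINTR);
    if (r != 1)
        return -1;
    p->pipe_nonempty = true;
    return 1;
}

/* Assumption: beforepoll hands out a fresh fd set, so the flag resets. */
void demo_beforepoll(struct demo_poller *p)
{
    p->osevents_added = false;
}

/* Assumption: afterpoll empties the pipe and re-arms the wakeup. */
void demo_afterpoll(struct demo_poller *p)
{
    char buf[16];

    for (;;) {
        ssize_t r = read(p->wakeup_pipe[0], buf, sizeof buf);
        if (r > 0)
            continue;
        if (r < 0 && errno == EINTR)
            continue;
        break;  /* EAGAIN (pipe empty) or EOF */
    }
    p->pipe_nonempty = false;
}
```

In this model an application that never calls beforepoll costs exactly
one write: every later demo_cleanup_wake() call returns 0 because
pipe_nonempty is still set.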
---
 tools/libxl/libxl_event.c | 54 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 45cc67942d..5f6a607d80 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -41,18 +41,25 @@ static void ao__check_destroy(libxl_ctx *ctx, libxl__ao *ao);
  *
  * We need the following property (the "unstale liveness property"):
  *
- * Whenever any thread is blocking in the libxl event loop[1], at
- * least one thread must be using an up to date osevent set. It is OK
- * for all but one threads to have stale event sets, because so long
- * as one waiting thread has the right event set, any actually
- * interesting event will, if nothing else, wake that "right" thread
- * up. It will then make some progress and/or, if it exits, ensure
- * that some other thread becomes the "right" thread.
+ * Whenever any thread is blocking as a result of being given an fd
+ * set or timeout by libxl, at least one thread must be using an up to
+ * date osevent set. It is OK for all but one threads to have stale
+ * event sets, because so long as one waiting thread has the right
+ * event set, any actually interesting event will, if nothing else,
+ * wake that "right" thread up. It will then make some progress
+ * and/or, if it exits, ensure that some other thread becomes the
+ * "right" thread.
  *
- * [1] TODO: Right now we are considering only the libxl event loop.
- * We need to consider application event loop outside libxl too.
+ * For threads blocking outside libxl and which are receiving libxl's
+ * fd and timeout information via the libxl_osevent_hooks callbacks,
+ * libxl calls this function as soon as it becomes interested. It is
+ * the responsibility of a provider of these functions in a
+ * multithreaded environment to make arrangements to wake up event
+ * waiting thread(s) with stale event sets.
  *
- * Argument that our approach is sound:
+ * Waiters outside libxl using _beforepoll are dealt with below.
+ *
+ * For the libxl event loop, the argument is as follows:
  *
  * The issue we are concerned about is libxl sleeping on an out of
  * date fd set, or too long a timeout, so that it doesn't make
@@ -132,7 +139,29 @@ static void ao__check_destroy(libxl_ctx *ctx, libxl__ao *ao);
  * will reenter libxl when it gains the lock and necessarily then
  * becomes a baton holder in category (a).
  *
- * So the "baton invariant" is maintained. QED.
+ * So the "baton invariant" is maintained.
+ * QED (for waiters in libxl).
+ *
+ *
+ * For waiters outside libxl which used libxl_osevent_beforepoll
+ * to get the fd set:
+ *
+ * As above, adding an osevent involves having an egc or an ao.
+ * It sets poller->osevents_added on all active pollers. Notably
+ * it sets it on poller_app, which is always active.
+ *
+ * The thread which does this will dispose of its egc or ao before
+ * exiting libxl so it will always wake up the poller_app if the last
+ * call to _beforepoll was before the osevents were added. So the
+ * application's fd set contains at least a wakeup in the form of the
+ * poller_app fd. The application cannot sleep on the libxl fd set
+ * until it has called _afterpoll which empties the pipe, and it
+ * is expected to then call _beforepoll again before sleeping.
+ *
+ * So all the application's event waiting thread(s) will always have
+ * an up to date osevent set, and will be woken up if necessary to
+ * achieve this. (This is in contrast to libxl's own event loop where
+ * only one thread need be up to date, as discussed above.)
  */
 static void pollers_note_osevent_added(libxl_ctx *ctx) {
     libxl__poller *poller;
@@ -157,6 +186,9 @@ void libxl__egc_ao_cleanup_1_baton(libxl__gc *gc)
 {
     libxl__poller *search, *wake=0;
 
+    if (CTX->poller_app->osevents_added)
+        baton_wake(gc, CTX->poller_app);
+
     LIBXL_LIST_FOREACH(search, &CTX->pollers_active, active_entry) {
         if (search == CTX->poller_app)
             /* This one is special. We can't give it the baton. */
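
The application-side half of the argument in the comment above rests on
one fact: a waiter sleeping on an old fd set is still woken, because
that set contains the poller_app wakeup fd. A minimal standalone sketch
of that fact follows. It does not call libxl at all: demo_wait_on_fd_set
is a hypothetical helper, and the one-entry fd set stands in for
whatever set an application obtained from its last _beforepoll call.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: sleep in poll() on a (possibly stale) fd set
 * whose only entry here is the wakeup pipe's read end.
 * Returns 1 if the wakeup fd became readable, 0 on timeout, -1 on error.
 */
int demo_wait_on_fd_set(int wake_rd, int timeout_ms)
{
    struct pollfd fds[1];
    int r;

    fds[0].fd = wake_rd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;

    r = poll(fds, 1, timeout_ms);
    if (r < 0)
        return -1;
    /* A pending wakeup byte makes poll() return at once, so the caller
     * gets the chance to fetch a fresh fd set before sleeping again. */
    return (r > 0 && (fds[0].revents & POLLIN)) ? 1 : 0;
}
```

A caller that sees 1 would, in a real application, run its afterpoll
step to empty the pipe and then its beforepoll step to pick up the new
osevents before waiting again.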