From patchwork Mon Jan 13 17:08:42 2020
X-Patchwork-Submitter: Ian Jackson
X-Patchwork-Id: 11330633
From: Ian Jackson
Date: Mon, 13 Jan 2020 17:08:42 +0000
Message-ID: <20200113170843.21332-10-ian.jackson@eu.citrix.com>
In-Reply-To: <20200113170843.21332-1-ian.jackson@eu.citrix.com>
References: <20200113170843.21332-1-ian.jackson@eu.citrix.com>
Subject: [Xen-devel] [PATCH v2 09/10] libxl: event: Fix possible hang with libxl_osevent_beforepoll
Cc: Ian Jackson, George Dunlap

If the application uses libxl_osevent_beforepoll, a similar hang is
possible to the one described and fixed in
    libxl: event: Fix hang when mixing blocking and eventy calls
Application behaviour would have to be fairly unusual, but it
doesn't seem sensible to just leave this latent bug.

We fix the latent bug by waking up the "poller_app" pipe every time
we add osevents. If the application does not ever call beforepoll,
we write one byte to the pipe and set pipe_nonempty and then we
ignore it. We only write another byte if beforepoll is called
again.

Normally in an eventy program there would only be one thread calling
libxl_osevent_beforepoll. The effect in such a program is to
sometimes needlessly go round the poll loop again if a timeout
callback becomes interested in a new osevent. We'll fix that in a
moment.

Signed-off-by: Ian Jackson
Reviewed-by: George Dunlap
---
v2: New addition to correctness arguments in libxl_event.c comment.
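
The wakeup scheme described in the commit message can be modelled in
isolation. The sketch below is a simplified illustration, not the libxl
code: struct sketch_poller and every sketch_* function are hypothetical
names, and the only fields borrowed from the patch are pipe_nonempty and
osevents_added. It assumes a POSIX system and shows that adding osevents
repeatedly writes at most one byte until the waiter drains the pipe.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for a poller; not the real libxl__poller. */
struct sketch_poller {
    int wakeup_pipe[2];   /* [0] is polled by the waiter, [1] is written */
    bool pipe_nonempty;   /* a wakeup byte is already pending */
    bool osevents_added;  /* the fd set handed out earlier may be stale */
    int bytes_written;    /* bookkeeping for this demonstration only */
};

static int sketch_poller_init(struct sketch_poller *p)
{
    if (pipe(p->wakeup_pipe) < 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(p->wakeup_pipe[i], F_GETFL);
        if (fl < 0 || fcntl(p->wakeup_pipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
            return -1;
    }
    p->pipe_nonempty = false;
    p->osevents_added = false;
    p->bytes_written = 0;
    return 0;
}

/* Registering a new osevent marks the waiter's fd set as stale. */
static void sketch_note_osevent_added(struct sketch_poller *p)
{
    p->osevents_added = true;
}

/* On the way out of the library: wake the waiter if its set is stale. */
static void sketch_wake_if_stale(struct sketch_poller *p)
{
    if (!p->osevents_added)
        return;
    p->osevents_added = false;
    if (p->pipe_nonempty)
        return;           /* a wakeup is already pending; do not write again */
    const char byte = 0;
    ssize_t r;
    do {
        r = write(p->wakeup_pipe[1], &byte, 1);
    } while (r < 0 && errno == EINTR);
    if (r == 1) {
        p->pipe_nonempty = true;
        p->bytes_written++;
    }
}

/* The waiter's afterpoll step: drain the pipe so it can be re-armed. */
static void sketch_afterpoll(struct sketch_poller *p)
{
    char buf[16];
    while (read(p->wakeup_pipe[0], buf, sizeof(buf)) > 0)
        ;
    p->pipe_nonempty = false;
}

/* True if a poll() on the wakeup fd would return immediately. */
static bool sketch_readable(const struct sketch_poller *p)
{
    struct pollfd pfd = { .fd = p->wakeup_pipe[0], .events = POLLIN };
    return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN) != 0;
}

/* Walk through the sequence described in the commit message. */
static void sketch_selftest(void)
{
    struct sketch_poller p;
    assert(sketch_poller_init(&p) == 0);
    sketch_note_osevent_added(&p);
    sketch_wake_if_stale(&p);           /* first add: one byte written */
    assert(p.bytes_written == 1 && sketch_readable(&p));
    sketch_note_osevent_added(&p);
    sketch_wake_if_stale(&p);           /* waiter has not run: no new byte */
    assert(p.bytes_written == 1);
    sketch_afterpoll(&p);               /* waiter drains the pipe */
    assert(!sketch_readable(&p));
    sketch_note_osevent_added(&p);
    sketch_wake_if_stale(&p);           /* pipe is re-armed: byte written */
    assert(p.bytes_written == 2 && sketch_readable(&p));
    close(p.wakeup_pipe[0]);
    close(p.wakeup_pipe[1]);
}
```

Calling sketch_selftest() from a main() exercises the whole sequence. The
flag keeps the pipe from filling up when the application never polls,
which is the case the commit message describes as "then we ignore it".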
---
 tools/libxl/libxl_event.c | 54 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/tools/libxl/libxl_event.c b/tools/libxl/libxl_event.c
index 45cc67942d..5f6a607d80 100644
--- a/tools/libxl/libxl_event.c
+++ b/tools/libxl/libxl_event.c
@@ -41,18 +41,25 @@ static void ao__check_destroy(libxl_ctx *ctx, libxl__ao *ao);
  *
  * We need the following property (the "unstale liveness property"):
  *
- * Whenever any thread is blocking in the libxl event loop[1], at
- * least one thread must be using an up to date osevent set. It is OK
- * for all but one threads to have stale event sets, because so long
- * as one waiting thread has the right event set, any actually
- * interesting event will, if nothing else, wake that "right" thread
- * up. It will then make some progress and/or, if it exits, ensure
- * that some other thread becomes the "right" thread.
+ * Whenever any thread is blocking as a result of being given an fd
+ * set or timeout by libxl, at least one thread must be using an up to
+ * date osevent set. It is OK for all but one threads to have stale
+ * event sets, because so long as one waiting thread has the right
+ * event set, any actually interesting event will, if nothing else,
+ * wake that "right" thread up. It will then make some progress
+ * and/or, if it exits, ensure that some other thread becomes the
+ * "right" thread.
  *
- * [1] TODO: Right now we are considering only the libxl event loop.
- * We need to consider application event loop outside libxl too.
+ * For threads blocking outside libxl and which are receiving libxl's
+ * fd and timeout information via the libxl_osevent_hooks callbacks,
+ * libxl calls this function as soon as it becomes interested. It is
+ * the responsibility of a provider of these functions in a
+ * multithreaded environment to make arrangements to wake up event
+ * waiting thread(s) with stale event sets.
  *
- * Argument that our approach is sound:
+ * Waiters outside libxl using _beforepoll are dealt with below.
+ *
+ * For the libxl event loop, the argument is as follows:
  *
  * The issue we are concerned about is libxl sleeping on an out of
  * date fd set, or too long a timeout, so that it doesn't make
@@ -132,7 +139,29 @@ static void ao__check_destroy(libxl_ctx *ctx, libxl__ao *ao);
  * will reenter libxl when it gains the lock and necessarily then
  * becomes a baton holder in category (a).
  *
- * So the "baton invariant" is maintained. QED.
+ * So the "baton invariant" is maintained.
+ * QED (for waiters in libxl).
+ *
+ *
+ * For waiters outside libxl which used libxl_osevent_beforepoll
+ * to get the fd set:
+ *
+ * As above, adding an osevent involves having an egc or an ao.
+ * It sets poller->osevents_added on all active pollers. Notably
+ * it sets it on poller_app, which is always active.
+ *
+ * The thread which does this will dispose of its egc or ao before
+ * exiting libxl so it will always wake up the poller_app if the last
+ * call to _beforepoll was before the osevents were added. So the
+ * application's fd set contains at least a wakeup in the form of the
+ * poller_app fd. The application cannot sleep on the libxl fd set
+ * until it has called _afterpoll which empties the pipe, and it
+ * is expected to then call _beforepoll again before sleeping.
+ *
+ * So all the application's event waiting thread(s) will always have
+ * an up to date osevent set, and will be woken up if necessary to
+ * achieve this. (This is in contrast to libxl's own event loop where
+ * only one thread need be up to date, as discussed above.)
  */
 static void pollers_note_osevent_added(libxl_ctx *ctx) {
     libxl__poller *poller;
@@ -157,6 +186,9 @@ void libxl__egc_ao_cleanup_1_baton(libxl__gc *gc)
 {
     libxl__poller *search, *wake=0;
 
+    if (CTX->poller_app->osevents_added)
+        baton_wake(gc, CTX->poller_app);
+
     LIBXL_LIST_FOREACH(search, &CTX->pollers_active, active_entry) {
         if (search == CTX->poller_app)
             /* This one is special. We can't give it the baton. */