From patchwork Mon Mar 8 12:31:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg Kurz X-Patchwork-Id: 12122389 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CFE41C433DB for ; Mon, 8 Mar 2021 13:53:11 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4D311651C2 for ; Mon, 8 Mar 2021 13:53:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4D311651C2 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kaod.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:44030 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lJGJq-0000iS-C2 for qemu-devel@archiver.kernel.org; Mon, 08 Mar 2021 08:53:10 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:37598) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lJF3P-0000Ry-Ap for qemu-devel@nongnu.org; Mon, 08 Mar 2021 07:32:07 -0500 Received: from us-smtp-delivery-44.mimecast.com ([205.139.111.44]:30766) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.90_1) (envelope-from ) id 1lJF3K-0001uM-7q for qemu-devel@nongnu.org; Mon, 08 Mar 2021 07:32:06 -0500 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-494-Fggn1500NR2zJjuEYvadDA-1; Mon, 08 Mar 2021 07:31:52 -0500 X-MC-Unique: Fggn1500NR2zJjuEYvadDA-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7A8B62F7A6; Mon, 8 Mar 2021 12:31:51 +0000 (UTC) Received: from bahia.redhat.com (ovpn-113-236.ams2.redhat.com [10.36.113.236]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2BA9A5D9D0; Mon, 8 Mar 2021 12:31:50 +0000 (UTC) From: Greg Kurz To: qemu-devel@nongnu.org Subject: [PATCH 1/4] vhost-user: Introduce nested event loop in vhost_user_read() Date: Mon, 8 Mar 2021 13:31:38 +0100 Message-Id: <20210308123141.26444-2-groug@kaod.org> In-Reply-To: <20210308123141.26444-1-groug@kaod.org> References: <20210308123141.26444-1-groug@kaod.org> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: kaod.org Received-SPF: softfail client-ip=205.139.111.44; envelope-from=groug@kaod.org; helo=us-smtp-delivery-44.mimecast.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_SOFTFAIL=0.665 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Greg Kurz , "Michael S. Tsirkin" , Vivek Goyal , Stefan Hajnoczi , "Dr. David Alan Gilbert" Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" A deadlock condition potentially exists if a vhost-user process needs to request something to QEMU on the slave channel while processing a vhost-user message. This doesn't seem to affect any vhost-user implementation so far, but this is currently biting the upcoming enablement of DAX with virtio-fs. The issue is being observed when the guest does an emergency reboot while a mapping still exits in the DAX window, which is very easy to get with a busy enough workload (e.g. as simulated by blogbench [1]) : - QEMU sends VHOST_USER_GET_VRING_BASE to virtiofsd. - In order to complete the request, virtiofsd then asks QEMU to remove the mapping on the slave channel. All these dialogs are synchronous, hence the deadlock. As pointed out by Stefan Hajnoczi: When QEMU's vhost-user master implementation sends a vhost-user protocol message, vhost_user_read() does a "blocking" read during which slave_fd is not monitored by QEMU. As a preliminary step to address this, split vhost_user_read() into a nested even loop and a one-shot callback that does the actual reading. A subsequent patch will teach the loop to monitor and process messages from the slave channel as well. [1] https://github.com/jedisct1/Blogbench Suggested-by: Stefan Hajnoczi Signed-off-by: Greg Kurz --- hw/virtio/vhost-user.c | 59 ++++++++++++++++++++++++++++++++++++++---- 1 file changed, 54 insertions(+), 5 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 2fdd5daf74bb..8a0574d5f959 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -294,15 +294,27 @@ static int vhost_user_read_header(struct vhost_dev *dev, VhostUserMsg *msg) return 0; } -static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg) +struct vhost_user_read_cb_data { + struct vhost_dev *dev; + VhostUserMsg *msg; + GMainLoop *loop; + int ret; +}; + +static gboolean vhost_user_read_cb(GIOChannel *source, GIOCondition condition, + gpointer opaque) { + struct vhost_user_read_cb_data *data = opaque; + struct vhost_dev *dev = data->dev; + VhostUserMsg *msg = data->msg; struct vhost_user *u = dev->opaque; CharBackend *chr = u->user->chr; uint8_t *p = (uint8_t *) msg; int r, size; if (vhost_user_read_header(dev, msg) < 0) { - return -1; + data->ret = -1; + goto end; } /* validate message size is sane */ @@ -310,7 +322,8 @@ static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg) error_report("Failed to read msg header." " Size %d exceeds the maximum %zu.", msg->hdr.size, VHOST_USER_PAYLOAD_SIZE); - return -1; + data->ret = -1; + goto end; } if (msg->hdr.size) { @@ -320,11 +333,47 @@ static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg) if (r != size) { error_report("Failed to read msg payload." " Read %d instead of %d.", r, msg->hdr.size); - return -1; + data->ret = -1; + goto end; } } - return 0; +end: + g_main_loop_quit(data->loop); + return G_SOURCE_REMOVE; +} + +static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg) +{ + struct vhost_user *u = dev->opaque; + CharBackend *chr = u->user->chr; + GMainContext *prev_ctxt = chr->chr->gcontext; + GMainContext *ctxt = g_main_context_new(); + GMainLoop *loop = g_main_loop_new(ctxt, FALSE); + struct vhost_user_read_cb_data data = { + .dev = dev, + .loop = loop, + .msg = msg, + .ret = 0 + }; + + /* Switch context and add a new watch to monitor chardev activity */ + qemu_chr_be_update_read_handlers(chr->chr, ctxt); + qemu_chr_fe_add_watch(chr, G_IO_IN | G_IO_HUP, vhost_user_read_cb, &data); + + g_main_loop_run(loop); + + /* + * Restore the previous context. This also destroys/recreates event + * sources : this guarantees that all pending events in the original + * context that have been processed by the nested loop are purged. + */ + qemu_chr_be_update_read_handlers(chr->chr, prev_ctxt); + + g_main_loop_unref(loop); + g_main_context_unref(ctxt); + + return data.ret; } static int process_message_reply(struct vhost_dev *dev,