From patchwork Wed Mar 1 16:08:39 2017
X-Patchwork-Submitter: Halil Pasic
X-Patchwork-Id: 9598705
To: Paolo Bonzini, qemu-devel@nongnu.org, "Michael S.
 Tsirkin"
References: <20170301115004.96073-1-pasic@linux.vnet.ibm.com> <331bf747-0c32-0f1a-eda0-40e6fa507494@redhat.com>
From: Halil Pasic
Date: Wed, 1 Mar 2017 17:08:39 +0100
Message-Id: <08ca0c91-4a6d-1750-ed79-a0f6e2ca7eaf@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [PATCH 1/1] virtio: fallback from irqfd to non-irqfd notify
Cc: Cornelia Huck, Stefan Hajnoczi

On 03/01/2017 03:29 PM, Paolo Bonzini wrote:
>
>
> On 01/03/2017 14:22, Halil Pasic wrote:
>> Here a trace:
>>
>> 135871@1488304024.512533:virtio_blk_req_complete req 0x2aa6b117e10 status 0
>> 135871@1488304024.512541:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>> 135871@1488304024.522607:virtio_blk_req_complete req 0x2aa6b118980 status 0
>> 135871@1488304024.522616:virtio_blk_req_complete req 0x2aa6b119260 status 0
>> 135871@1488304024.522627:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>> 135871@1488304024.527386:virtio_blk_req_complete req 0x2aa6b118980 status 0
>> 135871@1488304024.527431:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>> 135871@1488304024.528611:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de880
>> 135871@1488304024.528628:virtio_guest_notifier_read vdev 0x2aa6b0e61c8 vq 0x2aa6b4de8f8
>> 135871@1488304024.528753:virtio_blk_data_plane_stop dataplane 0x2aa6b0e5540
>> ^== DATAPLANE STOP
>> 135871@1488304024.530709:virtio_blk_req_complete req 0x2aa6b117e10 status 0
>> 135871@1488304024.530752:virtio_guest_notifier_read vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>> ^== comes from k->set_guest_notifiers(qbus->parent, nvqs, false);
>> in virtio_blk_data_plane_stop and done immediately after
>> irqfd is cleaned up by the transport
>> 135871@1488304024.530836:virtio_notify_irqfd vdev 0x2aa6b0e19d8 vq 0x2aa6b4c0870
>> halil: error in event_notifier_set: Bad file descriptor
>> ^== here we have the problem
>>
>> If you want a stacktrace that can be arranged to.
>>
>>> like a reset should cause it (the only call in virtio-blk is from
>>> virtio_blk_data_plane_stop), and then the guest doesn't care anymore
>>> about interrupts.
>> I do not understand this with 'doesn't care anymore about interrupts'.
>> I was debugging a virtio-blk device being stuck waiting for a host
>> notification (interrupt) after migration.
>
> Ok, this explains it better then. The issue is that
> virtio_blk_data_plane_stop doesn't flush the bottom half, which you want
> to do when the caller is, for example, virtio_ccw_vmstate_change.
>
> Does it work if you call to qemu_bh_cancel(s->bh) and notify_guest_bh(s)
> after
>
> blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
>
> ?
>

With

--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -260,6 +260,8 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
     /* Drain and switch bs back to the QEMU main loop */
     blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
+    qemu_bh_cancel(s->bh);
+    notify_guest_bh(s);

applied I do not see the problem any more. I will most likely turn this
into a patch tomorrow. I would like to give it some more testing and
thinking (see questions below) until tomorrow.

I should probably cc stable, right?

I would also like to add some diagnostics for the case where
virtio_notify_irqfd fails. Maybe assert success for event_notifier_set.
Would that be OK with you?

I have a couple of questions about the ways of the dataplane code. If
you are too busy, feel free to not answer -- I will keep thinking
myself.

Q1. For this to work correctly, it seems to me we need to be sure that
virtio_blk_req_complete cannot happen between the newly added
notify_guest_bh(s); and the point where vblk->dataplane_started = false;
becomes visible. How is this ensured?

Q2. virtio_blk_data_plane_stop should be called from the thread/context
associated with the main event loop, and with that so is
vblk->dataplane_started = false. But I think dataplane_started may end
up being used from a different thread (e.g. req_complete). How does the
sequencing work there, and/or is it even important?

Regards,
Halil
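
P.S. For anyone who wants to see the "Bad file descriptor" failure from
the trace in isolation, here is a minimal sketch outside of QEMU. It
assumes Linux eventfd(2) and only imitates what setting an event
notifier boils down to (a write to the fd). sketch_notifier_set and
sketch_set_after_close are hypothetical names made up for this
illustration; they are not QEMU functions and this is not the QEMU
implementation.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for an event notifier "set": add 1 to the
 * eventfd counter. Returns 0 on success, -errno on failure.
 */
static int sketch_notifier_set(int fd)
{
    uint64_t one = 1;
    ssize_t r;

    /* Retry if the write is interrupted by a signal. */
    do {
        r = write(fd, &one, sizeof(one));
    } while (r < 0 && errno == EINTR);

    if (r == (ssize_t)sizeof(one)) {
        return 0;
    }
    return r < 0 ? -errno : -EIO;
}

/*
 * Imitate the sequence from the trace: notify while the fd is valid,
 * tear the fd down, then notify once more. Returns the result of the
 * late notification (or an earlier error, if any).
 */
static int sketch_set_after_close(void)
{
    int fd = eventfd(0, EFD_CLOEXEC);
    int first;

    if (fd < 0) {
        return -errno;
    }

    first = sketch_notifier_set(fd);   /* fd still open: should succeed */
    close(fd);                         /* "irqfd cleaned up" */
    if (first != 0) {
        return first;
    }
    return sketch_notifier_set(fd);    /* late notify on a closed fd */
}
```

In a single-threaded program sketch_set_after_close() returns -EBADF,
the errno behind the "Bad file descriptor" message. An assert on the
return value, as floated above, would turn such a late notification
into an immediate abort instead of a lost interrupt.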