From patchwork Wed May 4 07:40:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12837215 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5AC2BC433F5 for ; Wed, 4 May 2022 07:44:19 +0000 (UTC) Received: from localhost ([::1]:59544 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nm9gI-0005TD-09 for qemu-devel@archiver.kernel.org; Wed, 04 May 2022 03:44:18 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52304) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nm9cs-0002au-Lh for qemu-devel@nongnu.org; Wed, 04 May 2022 03:40:48 -0400 Received: from mail-pg1-x534.google.com ([2607:f8b0:4864:20::534]:43693) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nm9co-0006xW-1F for qemu-devel@nongnu.org; Wed, 04 May 2022 03:40:43 -0400 Received: by mail-pg1-x534.google.com with SMTP id q76so521415pgq.10 for ; Wed, 04 May 2022 00:40:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=FI0rEWO2Hrb8q+RXCZbtt8y5SbloA1chKG+gNH/8y4Y=; b=LtQ1XQ2HRyOunBnThtjwnSJOmazQ76m2xf4+DbIXbfbNjoE6zhgxB7nblsspctYX6t PVA6T4Dl905+in1p5SGZQXZ4i7g4lN7cMBJQizQTAYaSUxaCOGSQLsNSA3SgNhD0RWM9 +cdKDkf8kQBTu0XXjU91PWOw/Sy2j4YzkaHWIEX788FBrIpArRxdZVIGdGNPc94n8KYp 2UrV1OyqIP1ayh4nX+yV/4x8Vw312+0nRJwexGeIHQsdzR0g9thXCTHPUqFSwdSrcLp5 Jji03Mti69bocG91X+uWQESnBi/SWiWTODilCTZ7rJXdGEY4Kxa/DPwahpuY1up2rvRY 4w4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=FI0rEWO2Hrb8q+RXCZbtt8y5SbloA1chKG+gNH/8y4Y=; b=ohvwJQy08CeWjNBffrlvmTAYmI9xbL85i3VbwufaFFT7AxukNnHqjXOcHHP+TCe/iU VRM5Uny+vhFt0F08sJFpft81cjs60u9q1ZMKHLg49bdY0B75YRqefIghcKmLatLOQ+6/ KdsrjYp9z+yDuaqg6OpZ4f3h2q7TwkGe2I10TjZLhYtu+BODxjkLIioI+8eVCl98S2oY 91atUb4Su71gtuv9rghV8G//IpXTzw8EzdRsp+X3+jrQWTa+taxbckCKn/k4/IcWHr8j zPs6vWE9cIyf7wZjrBUAhVdU7Pf+/vP/MzveIuhv/niy9S0e5SNheYB4EtaJm9G/ebwm EssQ== X-Gm-Message-State: AOAM5330/P/Tbroi7JKt/uTfQO+JNJAyGclv2yQxbJMEKdoGxYX5s3Qf qgyt9ET4jkK3zdODT/QHQXlvn1FMccyV X-Google-Smtp-Source: ABdhPJwaAbm+EDc3SSvRwcBasMj+BYuBKJyv5jQQFlO6+iJYs5tAp0h3PrKCK5hzGrjEgxDLM3ZvCg== X-Received: by 2002:a65:63d9:0:b0:374:6b38:c6b3 with SMTP id n25-20020a6563d9000000b003746b38c6b3mr16850831pgv.195.1651650040721; Wed, 04 May 2022 00:40:40 -0700 (PDT) Received: from localhost ([139.177.225.233]) by smtp.gmail.com with ESMTPSA id q3-20020a17090311c300b0015e8d4eb2e9sm7301876plh.307.2022.05.04.00.40.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 May 2022 00:40:40 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, kwolf@redhat.com, mreitz@redhat.com, mlureau@redhat.com, jsnow@redhat.com, eblake@redhat.com, Coiby.Xu@gmail.com, hreitz@redhat.com Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH v5 1/8] block: Support passing NULL ops to blk_set_dev_ops() Date: Wed, 4 May 2022 15:40:44 +0800 Message-Id: <20220504074051.90-2-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220504074051.90-1-xieyongji@bytedance.com> References: <20220504074051.90-1-xieyongji@bytedance.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::534; envelope-from=xieyongji@bytedance.com; helo=mail-pg1-x534.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" This supports passing NULL ops to blk_set_dev_ops() so that we can remove stale ops in some cases. Signed-off-by: Xie Yongji Reviewed-by: Stefan Hajnoczi --- block/block-backend.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/block-backend.c b/block/block-backend.c index e0e1aff4b1..35457a6a1d 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -1062,7 +1062,7 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, blk->dev_opaque = opaque; /* Are we currently quiesced? Should we enforce this right now? */ - if (blk->quiesce_counter && ops->drained_begin) { + if (blk->quiesce_counter && ops && ops->drained_begin) { ops->drained_begin(opaque); } } From patchwork Wed May 4 07:40:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12837223 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 8CF1AC433FE for ; Wed, 4 May 2022 07:46:53 +0000 (UTC) Received: from localhost ([::1]:60978 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nm9im-0006R0-CL for qemu-devel@archiver.kernel.org; Wed, 04 May 2022 03:46:52 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52356) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nm9cu-0002bl-DH for qemu-devel@nongnu.org; Wed, 04 May 2022 03:40:50 -0400 Received: from mail-pj1-x102a.google.com ([2607:f8b0:4864:20::102a]:45828) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nm9cs-0006yC-7D for qemu-devel@nongnu.org; Wed, 04 May 2022 03:40:47 -0400 Received: by mail-pj1-x102a.google.com with SMTP id w17-20020a17090a529100b001db302efed6so557181pjh.4 for ; Wed, 04 May 2022 00:40:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=1mmuW0WG9pZLriirR2lzZ1R1C9YhJbJYjicDLqi3XII=; b=ERVZdtf8yY5V6B8P18j/NDasAVmiopxlBfhiPxaNf1OpmiINSq2UcTPbvRLJyhFDYi mZ4ZIYHL2zzFG63tChkO45RkEwKSaja4Gc2m6wXHvnhBM5AIzj7deLPaXOTxBHx5LSJW lJ4hVYqaESEqFEZOb3pEZbIVJX7RcSW2Em/H0AGZTTsTCxoGCxEmziwlyykF+uEtFkV/ ZA/HQbNQZaQRFGFyyPqXae7lX7Aw6uOT6HnAjvSg4TWezUnE5imS7OigLdIfde/6QHW0 c/LsuBN+ArgAdTJRiJvyfBZswMXNp3oRSmz6ZKG2khnYTiPl4pCcXSfIT1ovedzgCHpT ZKOQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=1mmuW0WG9pZLriirR2lzZ1R1C9YhJbJYjicDLqi3XII=; b=xVPZrs//wXsXHAtDa3HiuSaA4aBkVzzUj+g7/uZc53wbzG2Bhckmp/RB+CASlT1d2b QFYmWxhqzWaI+l0kMVxLFpi1kICE1ZDf5W04FBwtkzpm/3XMfApf+P1/HFcRnquuGwp8 zyxGFagG6B+r90KivX4V9QL6x6FWfTY3JS6KKZ7ggu0qmN+Xg4b7DEtB2pz7OrTPrlgr vNXqKB3Ao4y21jqB/3uocCPCQtI7EfYOmE4T7awBSa3/tGBcKwfi0S8EZJytoIYNDK3Z Ic8WQEbX93XeU177E4uCzAvzafzBKnHvRTAfgCq+OpmGGHY3Fjlfc9pnkMycfaOW6A75 S1Iw== X-Gm-Message-State: AOAM532YknhrpTDD7qlujmIeioWqtiGCGNx+T9bf/lel0PFrDI6ox2mS OUgohsACLrr52qPFky7+q7+I X-Google-Smtp-Source: ABdhPJzHpsGgC/pyZy/EitP2/C+VFRIGnaJ07CaSKrHkz1H+4s95gYOdLPMe3w7E9TWihksdxqsAPQ== X-Received: by 2002:a17:90b:314e:b0:1dc:767d:a7ce with SMTP id ip14-20020a17090b314e00b001dc767da7cemr9020420pjb.0.1651650044645; Wed, 04 May 2022 00:40:44 -0700 (PDT) Received: from localhost ([139.177.225.233]) by smtp.gmail.com with ESMTPSA id g7-20020a170902d5c700b0015e8d4eb234sm7541000plh.126.2022.05.04.00.40.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 May 2022 00:40:44 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, kwolf@redhat.com, mreitz@redhat.com, mlureau@redhat.com, jsnow@redhat.com, eblake@redhat.com, Coiby.Xu@gmail.com, hreitz@redhat.com Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH v5 2/8] block-backend: Introduce blk_get_guest_block_size() Date: Wed, 4 May 2022 15:40:45 +0800 Message-Id: <20220504074051.90-3-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220504074051.90-1-xieyongji@bytedance.com> References: <20220504074051.90-1-xieyongji@bytedance.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::102a; envelope-from=xieyongji@bytedance.com; helo=mail-pj1-x102a.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Support getting the guest block size for the block backend. It's needed for the following commit. Signed-off-by: Xie Yongji --- block/block-backend.c | 6 ++++++ include/sysemu/block-backend-io.h | 1 + 2 files changed, 7 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index 35457a6a1d..1582ff81c9 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -2106,6 +2106,12 @@ void blk_set_guest_block_size(BlockBackend *blk, int align) blk->guest_block_size = align; } +int blk_get_guest_block_size(BlockBackend *blk) +{ + IO_CODE(); + return blk->guest_block_size; +} + void *blk_try_blockalign(BlockBackend *blk, size_t size) { IO_CODE(); diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h index 6517c39295..7600822196 100644 --- a/include/sysemu/block-backend-io.h +++ b/include/sysemu/block-backend-io.h @@ -73,6 +73,7 @@ void blk_iostatus_set_err(BlockBackend *blk, int error); int blk_get_max_iov(BlockBackend *blk); int blk_get_max_hw_iov(BlockBackend *blk); void blk_set_guest_block_size(BlockBackend *blk, int align); +int blk_get_guest_block_size(BlockBackend *blk); void blk_io_plug(BlockBackend *blk); void blk_io_unplug(BlockBackend *blk); From patchwork Wed May 4 07:40:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12837267 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D35EFC433F5 for ; Wed, 4 May 2022 07:53:09 +0000 (UTC) Received: from localhost ([::1]:40742 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nm9op-0003rL-DJ for qemu-devel@archiver.kernel.org; Wed, 04 May 2022 03:53:08 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52412) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nm9cy-0002cK-24 for qemu-devel@nongnu.org; Wed, 04 May 2022 03:40:52 -0400 Received: from mail-pl1-x635.google.com ([2607:f8b0:4864:20::635]:34517) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nm9cv-0006x6-F3 for qemu-devel@nongnu.org; Wed, 04 May 2022 03:40:51 -0400 Received: by mail-pl1-x635.google.com with SMTP id n8so731429plh.1 for ; Wed, 04 May 2022 00:40:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=DlwjnB3ppQFMGTkb5V9gcNR44nngB39VoWLTeVe3H48=; b=4uaLcu5Z2LAcv9AEBys8qzjQgMg88QzWotMggkxrOF/Ki9u9/bhVMvRvtt3R2cJ4XN 0s2z74hIRzbAuRoLKhF7Q+t+ZaCjlXmi1ZNAL7JLLTKIChvCZjdF9JchxYDefgWWspKV AY02XHxpFTKrtvAHumzoUnDCi/BSHGSpEGbOjvHPjx035btV7JY+wg+GFDwTBDGYa9je BPeegZipB8sP0v96ydkDn6OfZSqZzDIELnlHv3QyuA3blruabtSJv3RcjsJl6THDhbOm V7hVRhGBglsYeiff0Y5wD7vQbr+26Yrh4owoedckZbp1jy5yKi7g//nwu0qCMxJQCuDH KG4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=DlwjnB3ppQFMGTkb5V9gcNR44nngB39VoWLTeVe3H48=; b=2DVV68OJmXQF8vlBKYBLQ3ap0T8HcvuxxVaiqIVbMaN1e3G3BWxgVyfBP9m7hogY56 SRrFLjwgWsaK/TImQjU8xJXLVWJnoXAGl8afF0bEvyb/Xoz0G0NGBYxK2sD9erQ9gbXa EraND5dnKpLVbTxjmbtcRf4JE9iwezOMzmjHI7FtO4tkewrjcCJrE+kutHNq75gJtn+q 2qBkSDx1iJoo28YOxXyBlW8upkg2T3eTDgGAZgWr3DB/sFyvaybpxQ2MQ6qC65VsQ2Gi bjK3P958LGSurR549cYEnxXFBUTr2AxOPD7aI+6C6w2jdqrzflToSId4vtDQWCaOzRva FWkA== X-Gm-Message-State: AOAM531KCyjkzDtnRiTkS1P63rkLzIcc9i04WICPH1+yu0W9a6/OijYq zP1fM2ZH59lRldQEZmoEXwuP X-Google-Smtp-Source: ABdhPJz8ImvuReRRSxwXyp+3X0vgaIOK6kcIqbVLSOuWGlaUV54aI2yexoy7bYZ6dmx1QTj13z8UKA== X-Received: by 2002:a17:902:ea11:b0:15e:ae19:f36a with SMTP id s17-20020a170902ea1100b0015eae19f36amr10417811plg.52.1651650048663; Wed, 04 May 2022 00:40:48 -0700 (PDT) Received: from localhost ([139.177.225.233]) by smtp.gmail.com with ESMTPSA id 6-20020aa79146000000b0050dc76281aasm1646036pfi.132.2022.05.04.00.40.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 May 2022 00:40:48 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, kwolf@redhat.com, mreitz@redhat.com, mlureau@redhat.com, jsnow@redhat.com, eblake@redhat.com, Coiby.Xu@gmail.com, hreitz@redhat.com Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH v5 3/8] block/export: Abstract out the logic of virtio-blk I/O process Date: Wed, 4 May 2022 15:40:46 +0800 Message-Id: <20220504074051.90-4-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220504074051.90-1-xieyongji@bytedance.com> References: <20220504074051.90-1-xieyongji@bytedance.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::635; envelope-from=xieyongji@bytedance.com; helo=mail-pl1-x635.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Abstract the common logic of virtio-blk I/O process to a function named virtio_blk_process_req(). It's needed for the following commit. Signed-off-by: Xie Yongji --- MAINTAINERS | 2 + block/export/meson.build | 2 +- block/export/vhost-user-blk-server.c | 249 ++------------------------- block/export/virtio-blk-handler.c | 237 +++++++++++++++++++++++++ block/export/virtio-blk-handler.h | 33 ++++ 5 files changed, 287 insertions(+), 236 deletions(-) create mode 100644 block/export/virtio-blk-handler.c create mode 100644 block/export/virtio-blk-handler.h diff --git a/MAINTAINERS b/MAINTAINERS index 294c88ace9..4113b6fc5c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3536,6 +3536,8 @@ M: Coiby Xu S: Maintained F: block/export/vhost-user-blk-server.c F: block/export/vhost-user-blk-server.h +F: block/export/virtio-blk-handler.c +F: block/export/virtio-blk-handler.h F: include/qemu/vhost-user-server.h F: tests/qtest/libqos/vhost-user-blk.c F: tests/qtest/libqos/vhost-user-blk.h diff --git a/block/export/meson.build b/block/export/meson.build index 0a08e384c7..431e47ca51 100644 --- a/block/export/meson.build +++ b/block/export/meson.build @@ -1,7 +1,7 @@ blockdev_ss.add(files('export.c')) if have_vhost_user_blk_server - blockdev_ss.add(files('vhost-user-blk-server.c')) + blockdev_ss.add(files('vhost-user-blk-server.c', 'virtio-blk-handler.c')) endif blockdev_ss.add(when: fuse, if_true: files('fuse.c')) diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c index a129204c44..8705f7c27c 100644 --- a/block/export/vhost-user-blk-server.c +++ b/block/export/vhost-user-blk-server.c @@ -17,31 +17,15 @@ #include "vhost-user-blk-server.h" #include "qapi/error.h" #include "qom/object_interfaces.h" -#include "sysemu/block-backend.h" #include "util/block-helpers.h" - -/* - * Sector units are 512 bytes regardless of the - * virtio_blk_config->blk_size value. - */ -#define VIRTIO_BLK_SECTOR_BITS 9 -#define VIRTIO_BLK_SECTOR_SIZE (1ull << VIRTIO_BLK_SECTOR_BITS) +#include "virtio-blk-handler.h" enum { VHOST_USER_BLK_NUM_QUEUES_DEFAULT = 1, - VHOST_USER_BLK_MAX_DISCARD_SECTORS = 32768, - VHOST_USER_BLK_MAX_WRITE_ZEROES_SECTORS = 32768, -}; -struct virtio_blk_inhdr { - unsigned char status; }; typedef struct VuBlkReq { VuVirtqElement elem; - int64_t sector_num; - size_t size; - struct virtio_blk_inhdr *in; - struct virtio_blk_outhdr out; VuServer *server; struct VuVirtq *vq; } VuBlkReq; @@ -50,248 +34,44 @@ typedef struct VuBlkReq { typedef struct { BlockExport export; VuServer vu_server; - uint32_t blk_size; QIOChannelSocket *sioc; struct virtio_blk_config blkcfg; bool writable; } VuBlkExport; -static void vu_blk_req_complete(VuBlkReq *req) +static void vu_blk_req_complete(VuBlkReq *req, size_t in_len) { VuDev *vu_dev = &req->server->vu_dev; - /* IO size with 1 extra status byte */ - vu_queue_push(vu_dev, req->vq, &req->elem, req->size + 1); + vu_queue_push(vu_dev, req->vq, &req->elem, in_len); vu_queue_notify(vu_dev, req->vq); free(req); } -static bool vu_blk_sect_range_ok(VuBlkExport *vexp, uint64_t sector, - size_t size) -{ - uint64_t nb_sectors; - uint64_t total_sectors; - - if (size % VIRTIO_BLK_SECTOR_SIZE) { - return false; - } - - nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS; - - QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE); - if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) { - return false; - } - if ((sector << VIRTIO_BLK_SECTOR_BITS) % vexp->blk_size) { - return false; - } - blk_get_geometry(vexp->export.blk, &total_sectors); - if (sector > total_sectors || nb_sectors > total_sectors - sector) { - return false; - } - return true; -} - -static int coroutine_fn -vu_blk_discard_write_zeroes(VuBlkExport *vexp, struct iovec *iov, - uint32_t iovcnt, uint32_t type) -{ - BlockBackend *blk = vexp->export.blk; - struct virtio_blk_discard_write_zeroes desc; - ssize_t size; - uint64_t sector; - uint32_t num_sectors; - uint32_t max_sectors; - uint32_t flags; - int bytes; - - /* Only one desc is currently supported */ - if (unlikely(iov_size(iov, iovcnt) > sizeof(desc))) { - return VIRTIO_BLK_S_UNSUPP; - } - - size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc)); - if (unlikely(size != sizeof(desc))) { - error_report("Invalid size %zd, expected %zu", size, sizeof(desc)); - return VIRTIO_BLK_S_IOERR; - } - - sector = le64_to_cpu(desc.sector); - num_sectors = le32_to_cpu(desc.num_sectors); - flags = le32_to_cpu(desc.flags); - max_sectors = (type == VIRTIO_BLK_T_WRITE_ZEROES) ? - VHOST_USER_BLK_MAX_WRITE_ZEROES_SECTORS : - VHOST_USER_BLK_MAX_DISCARD_SECTORS; - - /* This check ensures that 'bytes' fits in an int */ - if (unlikely(num_sectors > max_sectors)) { - return VIRTIO_BLK_S_IOERR; - } - - bytes = num_sectors << VIRTIO_BLK_SECTOR_BITS; - - if (unlikely(!vu_blk_sect_range_ok(vexp, sector, bytes))) { - return VIRTIO_BLK_S_IOERR; - } - - /* - * The device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for discard - * and write zeroes commands if any unknown flag is set. - */ - if (unlikely(flags & ~VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP)) { - return VIRTIO_BLK_S_UNSUPP; - } - - if (type == VIRTIO_BLK_T_WRITE_ZEROES) { - int blk_flags = 0; - - if (flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP) { - blk_flags |= BDRV_REQ_MAY_UNMAP; - } - - if (blk_co_pwrite_zeroes(blk, sector << VIRTIO_BLK_SECTOR_BITS, - bytes, blk_flags) == 0) { - return VIRTIO_BLK_S_OK; - } - } else if (type == VIRTIO_BLK_T_DISCARD) { - /* - * The device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for - * discard commands if the unmap flag is set. - */ - if (unlikely(flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP)) { - return VIRTIO_BLK_S_UNSUPP; - } - - if (blk_co_pdiscard(blk, sector << VIRTIO_BLK_SECTOR_BITS, - bytes) == 0) { - return VIRTIO_BLK_S_OK; - } - } - - return VIRTIO_BLK_S_IOERR; -} - /* Called with server refcount increased, must decrease before returning */ static void coroutine_fn vu_blk_virtio_process_req(void *opaque) { VuBlkReq *req = opaque; VuServer *server = req->server; VuVirtqElement *elem = &req->elem; - uint32_t type; - VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server); BlockBackend *blk = vexp->export.blk; - struct iovec *in_iov = elem->in_sg; struct iovec *out_iov = elem->out_sg; unsigned in_num = elem->in_num; unsigned out_num = elem->out_num; - - /* refer to hw/block/virtio_blk.c */ - if (elem->out_num < 1 || elem->in_num < 1) { - error_report("virtio-blk request missing headers"); - goto err; - } - - if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out, - sizeof(req->out)) != sizeof(req->out))) { - error_report("virtio-blk request outhdr too short"); - goto err; + int in_len; + + in_len = virtio_blk_process_req(blk, vexp->writable, "vhost_user_blk", + in_iov, out_iov, in_num, out_num); + if (in_len < 0) { + free(req); + vhost_user_server_unref(server); + return; } - iov_discard_front(&out_iov, &out_num, sizeof(req->out)); - - if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) { - error_report("virtio-blk request inhdr too short"); - goto err; - } - - /* We always touch the last byte, so just see how big in_iov is. */ - req->in = (void *)in_iov[in_num - 1].iov_base - + in_iov[in_num - 1].iov_len - - sizeof(struct virtio_blk_inhdr); - iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr)); - - type = le32_to_cpu(req->out.type); - switch (type & ~VIRTIO_BLK_T_BARRIER) { - case VIRTIO_BLK_T_IN: - case VIRTIO_BLK_T_OUT: { - QEMUIOVector qiov; - int64_t offset; - ssize_t ret = 0; - bool is_write = type & VIRTIO_BLK_T_OUT; - req->sector_num = le64_to_cpu(req->out.sector); - - if (is_write && !vexp->writable) { - req->in->status = VIRTIO_BLK_S_IOERR; - break; - } - - if (is_write) { - qemu_iovec_init_external(&qiov, out_iov, out_num); - } else { - qemu_iovec_init_external(&qiov, in_iov, in_num); - } - - if (unlikely(!vu_blk_sect_range_ok(vexp, - req->sector_num, - qiov.size))) { - req->in->status = VIRTIO_BLK_S_IOERR; - break; - } - - offset = req->sector_num << VIRTIO_BLK_SECTOR_BITS; - - if (is_write) { - ret = blk_co_pwritev(blk, offset, qiov.size, &qiov, 0); - } else { - ret = blk_co_preadv(blk, offset, qiov.size, &qiov, 0); - } - if (ret >= 0) { - req->in->status = VIRTIO_BLK_S_OK; - } else { - req->in->status = VIRTIO_BLK_S_IOERR; - } - break; - } - case VIRTIO_BLK_T_FLUSH: - if (blk_co_flush(blk) == 0) { - req->in->status = VIRTIO_BLK_S_OK; - } else { - req->in->status = VIRTIO_BLK_S_IOERR; - } - break; - case VIRTIO_BLK_T_GET_ID: { - size_t size = MIN(iov_size(&elem->in_sg[0], in_num), - VIRTIO_BLK_ID_BYTES); - snprintf(elem->in_sg[0].iov_base, size, "%s", "vhost_user_blk"); - req->in->status = VIRTIO_BLK_S_OK; - req->size = elem->in_sg[0].iov_len; - break; - } - case VIRTIO_BLK_T_DISCARD: - case VIRTIO_BLK_T_WRITE_ZEROES: { - if (!vexp->writable) { - req->in->status = VIRTIO_BLK_S_IOERR; - break; - } - - req->in->status = vu_blk_discard_write_zeroes(vexp, out_iov, out_num, - type); - break; - } - default: - req->in->status = VIRTIO_BLK_S_UNSUPP; - break; - } - - vu_blk_req_complete(req); - vhost_user_server_unref(server); - return; - -err: - free(req); + vu_blk_req_complete(req, in_len); vhost_user_server_unref(server); } @@ -455,12 +235,12 @@ vu_blk_initialize_config(BlockDriverState *bs, config->opt_io_size = cpu_to_le32(1); config->num_queues = cpu_to_le16(num_queues); config->max_discard_sectors = - cpu_to_le32(VHOST_USER_BLK_MAX_DISCARD_SECTORS); + cpu_to_le32(VIRTIO_BLK_MAX_DISCARD_SECTORS); config->max_discard_seg = cpu_to_le32(1); config->discard_sector_alignment = cpu_to_le32(blk_size >> VIRTIO_BLK_SECTOR_BITS); config->max_write_zeroes_sectors - = cpu_to_le32(VHOST_USER_BLK_MAX_WRITE_ZEROES_SECTORS); + = cpu_to_le32(VIRTIO_BLK_MAX_WRITE_ZEROES_SECTORS); config->max_write_zeroes_seg = cpu_to_le32(1); } @@ -494,7 +274,6 @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, error_propagate(errp, local_err); return -EINVAL; } - vexp->blk_size = logical_block_size; blk_set_guest_block_size(exp->blk, logical_block_size); if (vu_opts->has_num_queues) { diff --git a/block/export/virtio-blk-handler.c b/block/export/virtio-blk-handler.c new file mode 100644 index 0000000000..3fe83154ef --- /dev/null +++ b/block/export/virtio-blk-handler.c @@ -0,0 +1,237 @@ +/* + * Handler for virtio-blk I/O + * + * Copyright (c) 2020 Red Hat, Inc. + * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: + * Coiby Xu + * Xie Yongji + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "qemu/error-report.h" +#include "virtio-blk-handler.h" + +#include "standard-headers/linux/virtio_blk.h" + +struct virtio_blk_inhdr { + unsigned char status; +}; + +static bool virtio_blk_sect_range_ok(BlockBackend *blk, + uint64_t sector, size_t size) +{ + uint64_t nb_sectors; + uint64_t total_sectors; + + if (size % VIRTIO_BLK_SECTOR_SIZE) { + return false; + } + + nb_sectors = size >> VIRTIO_BLK_SECTOR_BITS; + + QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE != VIRTIO_BLK_SECTOR_SIZE); + if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) { + return false; + } + if ((sector << VIRTIO_BLK_SECTOR_BITS) % blk_get_guest_block_size(blk)) { + return false; + } + blk_get_geometry(blk, &total_sectors); + if (sector > total_sectors || nb_sectors > total_sectors - sector) { + return false; + } + return true; +} + +static int coroutine_fn +virtio_blk_discard_write_zeroes(BlockBackend *blk, struct iovec *iov, + uint32_t iovcnt, uint32_t type) +{ + struct virtio_blk_discard_write_zeroes desc; + ssize_t size; + uint64_t sector; + uint32_t num_sectors; + uint32_t max_sectors; + uint32_t flags; + int bytes; + + /* Only one desc is currently supported */ + if (unlikely(iov_size(iov, iovcnt) > sizeof(desc))) { + return VIRTIO_BLK_S_UNSUPP; + } + + size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc)); + if (unlikely(size != sizeof(desc))) { + error_report("Invalid size %zd, expected %zu", size, sizeof(desc)); + return VIRTIO_BLK_S_IOERR; + } + + sector = le64_to_cpu(desc.sector); + num_sectors = le32_to_cpu(desc.num_sectors); + flags = le32_to_cpu(desc.flags); + max_sectors = (type == VIRTIO_BLK_T_WRITE_ZEROES) ? + VIRTIO_BLK_MAX_WRITE_ZEROES_SECTORS : + VIRTIO_BLK_MAX_DISCARD_SECTORS; + + /* This check ensures that 'bytes' fits in an int */ + if (unlikely(num_sectors > max_sectors)) { + return VIRTIO_BLK_S_IOERR; + } + + bytes = num_sectors << VIRTIO_BLK_SECTOR_BITS; + + if (unlikely(!virtio_blk_sect_range_ok(blk, sector, bytes))) { + return VIRTIO_BLK_S_IOERR; + } + + /* + * The device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for discard + * and write zeroes commands if any unknown flag is set. + */ + if (unlikely(flags & ~VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP)) { + return VIRTIO_BLK_S_UNSUPP; + } + + if (type == VIRTIO_BLK_T_WRITE_ZEROES) { + int blk_flags = 0; + + if (flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP) { + blk_flags |= BDRV_REQ_MAY_UNMAP; + } + + if (blk_co_pwrite_zeroes(blk, sector << VIRTIO_BLK_SECTOR_BITS, + bytes, blk_flags) == 0) { + return VIRTIO_BLK_S_OK; + } + } else if (type == VIRTIO_BLK_T_DISCARD) { + /* + * The device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for + * discard commands if the unmap flag is set. + */ + if (unlikely(flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP)) { + return VIRTIO_BLK_S_UNSUPP; + } + + if (blk_co_pdiscard(blk, sector << VIRTIO_BLK_SECTOR_BITS, + bytes) == 0) { + return VIRTIO_BLK_S_OK; + } + } + + return VIRTIO_BLK_S_IOERR; +} + +int coroutine_fn virtio_blk_process_req(BlockBackend *blk, bool writable, + const char *serial, + struct iovec *in_iov, + struct iovec *out_iov, + unsigned int in_num, + unsigned int out_num) +{ + struct virtio_blk_inhdr *in; + struct virtio_blk_outhdr out; + uint32_t type; + int in_len; + + if (out_num < 1 || in_num < 1) { + error_report("virtio-blk request missing headers"); + return -EINVAL; + } + + if (unlikely(iov_to_buf(out_iov, out_num, 0, &out, + sizeof(out)) != sizeof(out))) { + error_report("virtio-blk request outhdr too short"); + return -EINVAL; + } + + iov_discard_front(&out_iov, &out_num, sizeof(out)); + + if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) { + error_report("virtio-blk request inhdr too short"); + return -EINVAL; + } + + /* We always touch the last byte, so just see how big in_iov is. */ + in_len = iov_size(in_iov, in_num); + in = (void *)in_iov[in_num - 1].iov_base + + in_iov[in_num - 1].iov_len + - sizeof(struct virtio_blk_inhdr); + iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr)); + + type = le32_to_cpu(out.type); + switch (type & ~VIRTIO_BLK_T_BARRIER) { + case VIRTIO_BLK_T_IN: + case VIRTIO_BLK_T_OUT: { + QEMUIOVector qiov; + int64_t offset; + ssize_t ret = 0; + bool is_write = type & VIRTIO_BLK_T_OUT; + int64_t sector_num = le64_to_cpu(out.sector); + + if (is_write && !writable) { + in->status = VIRTIO_BLK_S_IOERR; + break; + } + + if (is_write) { + qemu_iovec_init_external(&qiov, out_iov, out_num); + } else { + qemu_iovec_init_external(&qiov, in_iov, in_num); + } + + if (unlikely(!virtio_blk_sect_range_ok(blk, sector_num, + qiov.size))) { + in->status = VIRTIO_BLK_S_IOERR; + break; + } + + offset = sector_num << VIRTIO_BLK_SECTOR_BITS; + + if (is_write) { + ret = blk_co_pwritev(blk, offset, qiov.size, &qiov, 0); + } else { + ret = blk_co_preadv(blk, offset, qiov.size, &qiov, 0); + } + if (ret >= 0) { + in->status = VIRTIO_BLK_S_OK; + } else { + in->status = VIRTIO_BLK_S_IOERR; + } + break; + } + case VIRTIO_BLK_T_FLUSH: + if (blk_co_flush(blk) == 0) { + in->status = VIRTIO_BLK_S_OK; + } else { + in->status = VIRTIO_BLK_S_IOERR; + } + break; + case VIRTIO_BLK_T_GET_ID: { + size_t size = MIN(strlen(serial) + 1, + MIN(iov_size(in_iov, in_num), + VIRTIO_BLK_ID_BYTES)); + iov_from_buf(in_iov, in_num, 0, serial, size); + in->status = VIRTIO_BLK_S_OK; + break; + } + case VIRTIO_BLK_T_DISCARD: + case VIRTIO_BLK_T_WRITE_ZEROES: + if (!writable) { + in->status = VIRTIO_BLK_S_IOERR; + break; + } + in->status = virtio_blk_discard_write_zeroes(blk, out_iov, + out_num, type); + break; + default: + in->status = VIRTIO_BLK_S_UNSUPP; + break; + } + + return in_len; +} diff --git a/block/export/virtio-blk-handler.h b/block/export/virtio-blk-handler.h new file mode 100644 index 0000000000..19f907b3f8 --- /dev/null +++ b/block/export/virtio-blk-handler.h @@ -0,0 +1,33 @@ +/* + * Handler for virtio-blk I/O + * + * Copyright (c) 2020 Red Hat, Inc. + * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: + * Coiby Xu + * Xie Yongji + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#ifndef VIRTIO_BLK_HANDLER_H +#define VIRTIO_BLK_HANDLER_H + +#include "sysemu/block-backend.h" + +#define VIRTIO_BLK_SECTOR_BITS 9 +#define VIRTIO_BLK_SECTOR_SIZE (1ULL << VIRTIO_BLK_SECTOR_BITS) + +#define VIRTIO_BLK_MAX_DISCARD_SECTORS 32768 +#define VIRTIO_BLK_MAX_WRITE_ZEROES_SECTORS 32768 + +int coroutine_fn virtio_blk_process_req(BlockBackend *blk, bool writable, + const char *serial, + struct iovec *in_iov, + struct iovec *out_iov, + unsigned int in_num, + unsigned int out_num); + +#endif /* VHOST_USER_BLK_SERVER_H */ From patchwork Wed May 4 07:40:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12837290 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AEC69C433F5 for ; Wed, 4 May 2022 07:59:09 +0000 (UTC) Received: from localhost ([::1]:49928 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nm9ue-0002Gm-Ps for qemu-devel@archiver.kernel.org; Wed, 04 May 2022 03:59:08 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52440) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nm9d4-0002cm-Dc for qemu-devel@nongnu.org; Wed, 04 May 2022 03:41:02 -0400 Received: from mail-pg1-x532.google.com ([2607:f8b0:4864:20::532]:44567) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nm9d0-0006zX-8l for qemu-devel@nongnu.org; Wed, 04 May 2022 03:40:56 -0400 Received: by mail-pg1-x532.google.com with SMTP id v10so518911pgl.11 for ; Wed, 04 May 2022 00:40:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=JFy5oldQsCcRDdubSm6Gbxtaov+f1/NOMG8hKhIB3B8=; b=kHU4VmPbIitvgkC3HUknw+FdqXZvkZFbIV8IcphLknDX1tLoziT5aU+AtsNPXjPNIc 7kgJdF0HHkuuF1uRt8vXzGX7eBiQMeI+ql1wNKG5RtBzVsn0LHou7KJJkW9CIsGbPKkF KKvbS3U9yjWuVQTdgjS8ZbGZDAOTZXX45rSU0/GuK+bsceBPAMQzdb454qaNqziTMpLh 5cS0iarmG9Vln7wp9OUtt2lbQPb4chEycWRFcQQdhXgM/4STw6VbKctx2CzpIY+469kd aMi0nn9+lF+MSi/bDNjiy+dkYJns05YTMqF6AcF6tnjlW6qG1iBtOTEYtA0Ki1PHSMxX Vzgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=JFy5oldQsCcRDdubSm6Gbxtaov+f1/NOMG8hKhIB3B8=; b=kipgnk6bBqavhcPIJ0CpRMB86RUSdeMoUykLOxsETVMHAc5bLQEKwpCY5Kxx5dEi8g U0OFYYsdXmlNnFIOf+FNkP4wl7ZBPX/UdV0eGbTzm2KGKvxPXa8Nr6I2NUttV1rmtUVL zUmgcVAeOXH0ZYyew58WYn6m142wFOtjDGs682c3p/jWOL5ZJxLZB0TSOSP/2ohY0zfF +VH7WMjipAfOEhmPJ0UHIrtGxL34DCmTOV+mVgUfGBRt446asAFXZzHzYjyU2TZcz9Vz 8eZu1fgNWYxU1JMMVo3PqB7Kbj82zdHiOmtwzT+5uFZK7eeN/BorWKX0MQET8xItQpHa WYxA== X-Gm-Message-State: AOAM5318H5xNheT1bd5wXNwdvWvZaY2X/nwqRsMcRhOIVPHUf2Dm0sz9 lIZdIoPBdMVUEhV2SBM5h8OE X-Google-Smtp-Source: ABdhPJyXzyYrrv9oeORFUAHUIrJ3WMfThvyq8ca+75h3K47NirNrCQFhXPP+3UnJsTvCq8hMY8JPvw== X-Received: by 2002:a05:6a00:8c5:b0:4fe:134d:30d3 with SMTP id s5-20020a056a0008c500b004fe134d30d3mr19858144pfu.3.1651650052888; Wed, 04 May 2022 00:40:52 -0700 (PDT) Received: from localhost ([139.177.225.233]) by smtp.gmail.com with ESMTPSA id j12-20020a62e90c000000b0050dc76281e7sm7571723pfh.193.2022.05.04.00.40.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 May 2022 00:40:52 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, kwolf@redhat.com, mreitz@redhat.com, mlureau@redhat.com, jsnow@redhat.com, eblake@redhat.com, Coiby.Xu@gmail.com, hreitz@redhat.com Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH v5 4/8] linux-headers: Add vduse.h Date: Wed, 4 May 2022 15:40:47 +0800 Message-Id: <20220504074051.90-5-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220504074051.90-1-xieyongji@bytedance.com> References: <20220504074051.90-1-xieyongji@bytedance.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::532; envelope-from=xieyongji@bytedance.com; helo=mail-pg1-x532.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" This adds vduse header to linux headers so that the relevant VDUSE API can be used in subsequent patches. Signed-off-by: Xie Yongji Reviewed-by: Stefan Hajnoczi --- linux-headers/linux/vduse.h | 306 ++++++++++++++++++++++++++++++++ scripts/update-linux-headers.sh | 2 +- 2 files changed, 307 insertions(+), 1 deletion(-) create mode 100644 linux-headers/linux/vduse.h diff --git a/linux-headers/linux/vduse.h b/linux-headers/linux/vduse.h new file mode 100644 index 0000000000..d47b004ce6 --- /dev/null +++ b/linux-headers/linux/vduse.h @@ -0,0 +1,306 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _VDUSE_H_ +#define _VDUSE_H_ + +#include + +#define VDUSE_BASE 0x81 + +/* The ioctls for control device (/dev/vduse/control) */ + +#define VDUSE_API_VERSION 0 + +/* + * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION). + * This is used for future extension. + */ +#define VDUSE_GET_API_VERSION _IOR(VDUSE_BASE, 0x00, __u64) + +/* Set the version of VDUSE API that userspace supported. */ +#define VDUSE_SET_API_VERSION _IOW(VDUSE_BASE, 0x01, __u64) + +/** + * struct vduse_dev_config - basic configuration of a VDUSE device + * @name: VDUSE device name, needs to be NUL terminated + * @vendor_id: virtio vendor id + * @device_id: virtio device id + * @features: virtio features + * @vq_num: the number of virtqueues + * @vq_align: the allocation alignment of virtqueue's metadata + * @reserved: for future use, needs to be initialized to zero + * @config_size: the size of the configuration space + * @config: the buffer of the configuration space + * + * Structure used by VDUSE_CREATE_DEV ioctl to create VDUSE device. + */ +struct vduse_dev_config { +#define VDUSE_NAME_MAX 256 + char name[VDUSE_NAME_MAX]; + __u32 vendor_id; + __u32 device_id; + __u64 features; + __u32 vq_num; + __u32 vq_align; + __u32 reserved[13]; + __u32 config_size; + __u8 config[]; +}; + +/* Create a VDUSE device which is represented by a char device (/dev/vduse/$NAME) */ +#define VDUSE_CREATE_DEV _IOW(VDUSE_BASE, 0x02, struct vduse_dev_config) + +/* + * Destroy a VDUSE device. Make sure there are no more references + * to the char device (/dev/vduse/$NAME). + */ +#define VDUSE_DESTROY_DEV _IOW(VDUSE_BASE, 0x03, char[VDUSE_NAME_MAX]) + +/* The ioctls for VDUSE device (/dev/vduse/$NAME) */ + +/** + * struct vduse_iotlb_entry - entry of IOTLB to describe one IOVA region [start, last] + * @offset: the mmap offset on returned file descriptor + * @start: start of the IOVA region + * @last: last of the IOVA region + * @perm: access permission of the IOVA region + * + * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region. + */ +struct vduse_iotlb_entry { + __u64 offset; + __u64 start; + __u64 last; +#define VDUSE_ACCESS_RO 0x1 +#define VDUSE_ACCESS_WO 0x2 +#define VDUSE_ACCESS_RW 0x3 + __u8 perm; +}; + +/* + * Find the first IOVA region that overlaps with the range [start, last] + * and return the corresponding file descriptor. Return -EINVAL means the + * IOVA region doesn't exist. Caller should set start and last fields. + */ +#define VDUSE_IOTLB_GET_FD _IOWR(VDUSE_BASE, 0x10, struct vduse_iotlb_entry) + +/* + * Get the negotiated virtio features. It's a subset of the features in + * struct vduse_dev_config which can be accepted by virtio driver. It's + * only valid after FEATURES_OK status bit is set. + */ +#define VDUSE_DEV_GET_FEATURES _IOR(VDUSE_BASE, 0x11, __u64) + +/** + * struct vduse_config_data - data used to update configuration space + * @offset: the offset from the beginning of configuration space + * @length: the length to write to configuration space + * @buffer: the buffer used to write from + * + * Structure used by VDUSE_DEV_SET_CONFIG ioctl to update device + * configuration space. + */ +struct vduse_config_data { + __u32 offset; + __u32 length; + __u8 buffer[]; +}; + +/* Set device configuration space */ +#define VDUSE_DEV_SET_CONFIG _IOW(VDUSE_BASE, 0x12, struct vduse_config_data) + +/* + * Inject a config interrupt. It's usually used to notify virtio driver + * that device configuration space has changed. + */ +#define VDUSE_DEV_INJECT_CONFIG_IRQ _IO(VDUSE_BASE, 0x13) + +/** + * struct vduse_vq_config - basic configuration of a virtqueue + * @index: virtqueue index + * @max_size: the max size of virtqueue + * @reserved: for future use, needs to be initialized to zero + * + * Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue. + */ +struct vduse_vq_config { + __u32 index; + __u16 max_size; + __u16 reserved[13]; +}; + +/* + * Setup the specified virtqueue. Make sure all virtqueues have been + * configured before the device is attached to vDPA bus. + */ +#define VDUSE_VQ_SETUP _IOW(VDUSE_BASE, 0x14, struct vduse_vq_config) + +/** + * struct vduse_vq_state_split - split virtqueue state + * @avail_index: available index + */ +struct vduse_vq_state_split { + __u16 avail_index; +}; + +/** + * struct vduse_vq_state_packed - packed virtqueue state + * @last_avail_counter: last driver ring wrap counter observed by device + * @last_avail_idx: device available index + * @last_used_counter: device ring wrap counter + * @last_used_idx: used index + */ +struct vduse_vq_state_packed { + __u16 last_avail_counter; + __u16 last_avail_idx; + __u16 last_used_counter; + __u16 last_used_idx; +}; + +/** + * struct vduse_vq_info - information of a virtqueue + * @index: virtqueue index + * @num: the size of virtqueue + * @desc_addr: address of desc area + * @driver_addr: address of driver area + * @device_addr: address of device area + * @split: split virtqueue state + * @packed: packed virtqueue state + * @ready: ready status of virtqueue + * + * Structure used by VDUSE_VQ_GET_INFO ioctl to get virtqueue's information. + */ +struct vduse_vq_info { + __u32 index; + __u32 num; + __u64 desc_addr; + __u64 driver_addr; + __u64 device_addr; + union { + struct vduse_vq_state_split split; + struct vduse_vq_state_packed packed; + }; + __u8 ready; +}; + +/* Get the specified virtqueue's information. Caller should set index field. */ +#define VDUSE_VQ_GET_INFO _IOWR(VDUSE_BASE, 0x15, struct vduse_vq_info) + +/** + * struct vduse_vq_eventfd - eventfd configuration for a virtqueue + * @index: virtqueue index + * @fd: eventfd, -1 means de-assigning the eventfd + * + * Structure used by VDUSE_VQ_SETUP_KICKFD ioctl to setup kick eventfd. + */ +struct vduse_vq_eventfd { + __u32 index; +#define VDUSE_EVENTFD_DEASSIGN -1 + int fd; +}; + +/* + * Setup kick eventfd for specified virtqueue. The kick eventfd is used + * by VDUSE kernel module to notify userspace to consume the avail vring. + */ +#define VDUSE_VQ_SETUP_KICKFD _IOW(VDUSE_BASE, 0x16, struct vduse_vq_eventfd) + +/* + * Inject an interrupt for specific virtqueue. It's used to notify virtio driver + * to consume the used vring. + */ +#define VDUSE_VQ_INJECT_IRQ _IOW(VDUSE_BASE, 0x17, __u32) + +/* The control messages definition for read(2)/write(2) on /dev/vduse/$NAME */ + +/** + * enum vduse_req_type - request type + * @VDUSE_GET_VQ_STATE: get the state for specified virtqueue from userspace + * @VDUSE_SET_STATUS: set the device status + * @VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping for + * specified IOVA range via VDUSE_IOTLB_GET_FD ioctl + */ +enum vduse_req_type { + VDUSE_GET_VQ_STATE, + VDUSE_SET_STATUS, + VDUSE_UPDATE_IOTLB, +}; + +/** + * struct vduse_vq_state - virtqueue state + * @index: virtqueue index + * @split: split virtqueue state + * @packed: packed virtqueue state + */ +struct vduse_vq_state { + __u32 index; + union { + struct vduse_vq_state_split split; + struct vduse_vq_state_packed packed; + }; +}; + +/** + * struct vduse_dev_status - device status + * @status: device status + */ +struct vduse_dev_status { + __u8 status; +}; + +/** + * struct vduse_iova_range - IOVA range [start, last] + * @start: start of the IOVA range + * @last: last of the IOVA range + */ +struct vduse_iova_range { + __u64 start; + __u64 last; +}; + +/** + * struct vduse_dev_request - control request + * @type: request type + * @request_id: request id + * @reserved: for future use + * @vq_state: virtqueue state, only index field is available + * @s: device status + * @iova: IOVA range for updating + * @padding: padding + * + * Structure used by read(2) on /dev/vduse/$NAME. + */ +struct vduse_dev_request { + __u32 type; + __u32 request_id; + __u32 reserved[4]; + union { + struct vduse_vq_state vq_state; + struct vduse_dev_status s; + struct vduse_iova_range iova; + __u32 padding[32]; + }; +}; + +/** + * struct vduse_dev_response - response to control request + * @request_id: corresponding request id + * @result: the result of request + * @reserved: for future use, needs to be initialized to zero + * @vq_state: virtqueue state + * @padding: padding + * + * Structure used by write(2) on /dev/vduse/$NAME. + */ +struct vduse_dev_response { + __u32 request_id; +#define VDUSE_REQ_RESULT_OK 0x00 +#define VDUSE_REQ_RESULT_FAILED 0x01 + __u32 result; + __u32 reserved[4]; + union { + struct vduse_vq_state vq_state; + __u32 padding[32]; + }; +}; + +#endif /* _VDUSE_H_ */ diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh index 839a5ec614..b1ad99cba8 100755 --- a/scripts/update-linux-headers.sh +++ b/scripts/update-linux-headers.sh @@ -161,7 +161,7 @@ done rm -rf "$output/linux-headers/linux" mkdir -p "$output/linux-headers/linux" for header in kvm.h vfio.h vfio_ccw.h vfio_zdev.h vhost.h \ - psci.h psp-sev.h userfaultfd.h mman.h; do + psci.h psp-sev.h userfaultfd.h mman.h vduse.h; do cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux" done From patchwork Wed May 4 07:40:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12837294 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DED05C433EF for ; Wed, 4 May 2022 08:10:18 +0000 (UTC) Received: from localhost ([::1]:57592 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nmA5R-000818-W3 for qemu-devel@archiver.kernel.org; Wed, 04 May 2022 04:10:18 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52558) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nm9dD-0002d5-6t for qemu-devel@nongnu.org; Wed, 04 May 2022 03:41:20 -0400 Received: from mail-pl1-x631.google.com ([2607:f8b0:4864:20::631]:36823) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nm9d4-00071j-FF for qemu-devel@nongnu.org; Wed, 04 May 2022 03:41:06 -0400 Received: by mail-pl1-x631.google.com with SMTP id j14so725185plx.3 for ; Wed, 04 May 2022 00:40:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=qkkPcdiU131/DjQPREghV5QoLOEEv6F39/IE5pShiSk=; b=bC56yj3Mmy/d0vy6onPsXNSz80a58nDQQJ1GT07e9wbKzS7U4FGRN9jJ1mqSGa20hU 1d7mFIePwHhPPcMCqT5qCZ/qGR9PwEolD6VTV3bhoVySomy7VyXdHa9/heQKJN63L7wI 7C3LRWAa10JYYoZkTo8o8Q7Z3XIq4GjctA/Wcb+ZdJ9BvccX1m3Ys8Z4jomYt4fwGZE9 4kuDFEf2uuouSAtiSs1vRsGJFpPJhxhjqLMgTIvTuxy3/XXY4h2HiJPkHy4lQiXt2+Dp wPdfP65371uYlR4CnAvVUCQVKd0KaKpvhUzsZZKeU5WhQk/HlaG/gJP5ONlPnpZbD7j+ Tg+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=qkkPcdiU131/DjQPREghV5QoLOEEv6F39/IE5pShiSk=; b=wK+nVPA2NFDSRaUkmYJw3UbyF9uB8bzzddBiIeHxP3Pq0uK/UlkoDeF/yLNMRaOrJM DePGKLrXvLbREdR00tSJS8LJXSYxgIOfySNsH9ldXUEESUyZ+59GygreFK79lLZlVyS2 KvCmO4MRBKq3qshKSgwH56uoTfjsSw0paUFpsYD53Db0HWo0k+XIg64Ma17hWsXDGQA+ wROKdLuabUHyLpxnUQQ3ISStPbZXGWe2l0uJaBFF/2U4UdvopCWJKwjyiUpUDYGRvL36 RZgeh5ass2u3gSgnathMmQZqQiDPQ/xSUG03mfe0gAANAFomLidwLZ7PdxsYXajviWbV cDcQ== X-Gm-Message-State: AOAM531qjg+kUwIXJKkCPXOu3Js7q38BcAcNumpxSglFKV6jAPLnma6X Ex2G9Qx3qDphAHLzlrRyE3fz X-Google-Smtp-Source: ABdhPJzN7yTKKhcy06opycK8LHD4lvOB1JQ8QNWE6luLtbSyTgGGmwxNQXZjOBuVq9sDP0ULMUlLAg== X-Received: by 2002:a17:90a:930b:b0:1d5:684b:8e13 with SMTP id p11-20020a17090a930b00b001d5684b8e13mr8943477pjo.153.1651650056985; Wed, 04 May 2022 00:40:56 -0700 (PDT) Received: from localhost ([139.177.225.233]) by smtp.gmail.com with ESMTPSA id m1-20020a637d41000000b003c14af5063esm14235702pgn.86.2022.05.04.00.40.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 May 2022 00:40:56 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, kwolf@redhat.com, mreitz@redhat.com, mlureau@redhat.com, jsnow@redhat.com, eblake@redhat.com, Coiby.Xu@gmail.com, hreitz@redhat.com Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH v5 5/8] libvduse: Add VDUSE (vDPA Device in Userspace) library Date: Wed, 4 May 2022 15:40:48 +0800 Message-Id: <20220504074051.90-6-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220504074051.90-1-xieyongji@bytedance.com> References: <20220504074051.90-1-xieyongji@bytedance.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::631; envelope-from=xieyongji@bytedance.com; helo=mail-pl1-x631.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" VDUSE [1] is a linux framework that makes it possible to implement software-emulated vDPA devices in userspace. This adds a library as a subproject to help implementing VDUSE backends in QEMU. [1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html Signed-off-by: Xie Yongji --- MAINTAINERS | 5 + meson.build | 15 + meson_options.txt | 2 + scripts/meson-buildoptions.sh | 3 + subprojects/libvduse/include/atomic.h | 1 + subprojects/libvduse/include/compiler.h | 1 + subprojects/libvduse/libvduse.c | 1161 +++++++++++++++++++ subprojects/libvduse/libvduse.h | 235 ++++ subprojects/libvduse/linux-headers/linux | 1 + subprojects/libvduse/meson.build | 10 + subprojects/libvduse/standard-headers/linux | 1 + 11 files changed, 1435 insertions(+) create mode 120000 subprojects/libvduse/include/atomic.h create mode 120000 subprojects/libvduse/include/compiler.h create mode 100644 subprojects/libvduse/libvduse.c create mode 100644 subprojects/libvduse/libvduse.h create mode 120000 subprojects/libvduse/linux-headers/linux create mode 100644 subprojects/libvduse/meson.build create mode 120000 subprojects/libvduse/standard-headers/linux diff --git a/MAINTAINERS b/MAINTAINERS index 4113b6fc5c..6de3cbaa1e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3550,6 +3550,11 @@ L: qemu-block@nongnu.org S: Supported F: block/export/fuse.c +VDUSE library +M: Xie Yongji +S: Maintained +F: subprojects/libvduse/ + Replication M: Wen Congyang M: Xie Changlong diff --git a/meson.build b/meson.build index 1fe7d257ff..4a0f1a2016 100644 --- a/meson.build +++ b/meson.build @@ -1392,6 +1392,21 @@ if get_option('fuse_lseek').allowed() endif endif +have_libvduse = (targetos == 'linux') +if get_option('libvduse').enabled() + if targetos != 'linux' + error('libvduse requires linux') + endif +elif get_option('libvduse').disabled() + have_libvduse = false +endif + +libvduse = not_found +if have_libvduse + libvduse_proj = subproject('libvduse') + libvduse = libvduse_proj.get_variable('libvduse_dep') +endif + # libbpf libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config') if libbpf.found() and not cc.links(''' diff --git a/meson_options.txt b/meson_options.txt index af432a4ee6..47955f972d 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -231,6 +231,8 @@ option('virtfs', type: 'feature', value: 'auto', description: 'virtio-9p support') option('virtiofsd', type: 'feature', value: 'auto', description: 'build virtiofs daemon (virtiofsd)') +option('libvduse', type: 'feature', value: 'auto', + description: 'build VDUSE Library') option('capstone', type: 'combo', value: 'auto', choices: ['disabled', 'enabled', 'auto', 'system', 'internal'], diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh index 21366b2102..f725636ea8 100644 --- a/scripts/meson-buildoptions.sh +++ b/scripts/meson-buildoptions.sh @@ -81,6 +81,7 @@ meson_options_help() { printf "%s\n" ' libssh ssh block device support' printf "%s\n" ' libudev Use libudev to enumerate host devices' printf "%s\n" ' libusb libusb support for USB passthrough' + printf "%s\n" ' libvduse build VDUSE Library' printf "%s\n" ' linux-aio Linux AIO support' printf "%s\n" ' linux-io-uring Linux io_uring support' printf "%s\n" ' live-block-migration' @@ -255,6 +256,8 @@ _meson_option_parse() { --disable-libudev) printf "%s" -Dlibudev=disabled ;; --enable-libusb) printf "%s" -Dlibusb=enabled ;; --disable-libusb) printf "%s" -Dlibusb=disabled ;; + --enable-libvduse) printf "%s" -Dlibvduse=enabled ;; + --disable-libvduse) printf "%s" -Dlibvduse=disabled ;; --enable-linux-aio) printf "%s" -Dlinux_aio=enabled ;; --disable-linux-aio) printf "%s" -Dlinux_aio=disabled ;; --enable-linux-io-uring) printf "%s" -Dlinux_io_uring=enabled ;; diff --git a/subprojects/libvduse/include/atomic.h b/subprojects/libvduse/include/atomic.h new file mode 120000 index 0000000000..8c2be64f7b --- /dev/null +++ b/subprojects/libvduse/include/atomic.h @@ -0,0 +1 @@ +../../../include/qemu/atomic.h \ No newline at end of file diff --git a/subprojects/libvduse/include/compiler.h b/subprojects/libvduse/include/compiler.h new file mode 120000 index 0000000000..de7b70697c --- /dev/null +++ b/subprojects/libvduse/include/compiler.h @@ -0,0 +1 @@ +../../../include/qemu/compiler.h \ No newline at end of file diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c new file mode 100644 index 0000000000..ecee9c0568 --- /dev/null +++ b/subprojects/libvduse/libvduse.c @@ -0,0 +1,1161 @@ +/* + * VDUSE (vDPA Device in Userspace) library + * + * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved. + * Portions of codes and concepts borrowed from libvhost-user.c, so: + * Copyright IBM, Corp. 2007 + * Copyright (c) 2016 Red Hat, Inc. + * + * Author: + * Xie Yongji + * Anthony Liguori + * Marc-André Lureau + * Victor Kaplansky + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include "include/atomic.h" +#include "linux-headers/linux/virtio_ring.h" +#include "linux-headers/linux/virtio_config.h" +#include "linux-headers/linux/vduse.h" +#include "libvduse.h" + +#define VDUSE_VQ_ALIGN 4096 +#define MAX_IOVA_REGIONS 256 + +/* Round number down to multiple */ +#define ALIGN_DOWN(n, m) ((n) / (m) * (m)) + +/* Round number up to multiple */ +#define ALIGN_UP(n, m) ALIGN_DOWN((n) + (m) - 1, (m)) + +#ifndef unlikely +#define unlikely(x) __builtin_expect(!!(x), 0) +#endif + +typedef struct VduseRing { + unsigned int num; + uint64_t desc_addr; + uint64_t avail_addr; + uint64_t used_addr; + struct vring_desc *desc; + struct vring_avail *avail; + struct vring_used *used; +} VduseRing; + +struct VduseVirtq { + VduseRing vring; + uint16_t last_avail_idx; + uint16_t shadow_avail_idx; + uint16_t used_idx; + uint16_t signalled_used; + bool signalled_used_valid; + int index; + int inuse; + bool ready; + int fd; + VduseDev *dev; +}; + +typedef struct VduseIovaRegion { + uint64_t iova; + uint64_t size; + uint64_t mmap_offset; + uint64_t mmap_addr; +} VduseIovaRegion; + +struct VduseDev { + VduseVirtq *vqs; + VduseIovaRegion regions[MAX_IOVA_REGIONS]; + int num_regions; + char *name; + uint32_t device_id; + uint32_t vendor_id; + uint16_t num_queues; + uint16_t queue_size; + uint64_t features; + const VduseOps *ops; + int fd; + int ctrl_fd; + void *priv; +}; + +static inline bool has_feature(uint64_t features, unsigned int fbit) +{ + assert(fbit < 64); + return !!(features & (1ULL << fbit)); +} + +static inline bool vduse_dev_has_feature(VduseDev *dev, unsigned int fbit) +{ + return has_feature(dev->features, fbit); +} + +uint64_t vduse_get_virtio_features(void) +{ + return (1ULL << VIRTIO_F_IOMMU_PLATFORM) | + (1ULL << VIRTIO_F_VERSION_1) | + (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) | + (1ULL << VIRTIO_RING_F_EVENT_IDX) | + (1ULL << VIRTIO_RING_F_INDIRECT_DESC); +} + +VduseDev *vduse_queue_get_dev(VduseVirtq *vq) +{ + return vq->dev; +} + +int vduse_queue_get_fd(VduseVirtq *vq) +{ + return vq->fd; +} + +void *vduse_dev_get_priv(VduseDev *dev) +{ + return dev->priv; +} + +VduseVirtq *vduse_dev_get_queue(VduseDev *dev, int index) +{ + return &dev->vqs[index]; +} + +int vduse_dev_get_fd(VduseDev *dev) +{ + return dev->fd; +} + +static int vduse_inject_irq(VduseDev *dev, int index) +{ + return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, &index); +} + +static void vduse_iova_remove_region(VduseDev *dev, uint64_t start, + uint64_t last) +{ + int i; + + if (last == start) { + return; + } + + for (i = 0; i < MAX_IOVA_REGIONS; i++) { + if (!dev->regions[i].mmap_addr) { + continue; + } + + if (start <= dev->regions[i].iova && + last >= (dev->regions[i].iova + dev->regions[i].size - 1)) { + munmap((void *)dev->regions[i].mmap_addr, + dev->regions[i].mmap_offset + dev->regions[i].size); + dev->regions[i].mmap_addr = 0; + dev->num_regions--; + } + } +} + +static int vduse_iova_add_region(VduseDev *dev, int fd, + uint64_t offset, uint64_t start, + uint64_t last, int prot) +{ + int i; + uint64_t size = last - start + 1; + void *mmap_addr = mmap(0, size + offset, prot, MAP_SHARED, fd, 0); + + if (mmap_addr == MAP_FAILED) { + close(fd); + return -EINVAL; + } + + for (i = 0; i < MAX_IOVA_REGIONS; i++) { + if (!dev->regions[i].mmap_addr) { + dev->regions[i].mmap_addr = (uint64_t)(uintptr_t)mmap_addr; + dev->regions[i].mmap_offset = offset; + dev->regions[i].iova = start; + dev->regions[i].size = size; + dev->num_regions++; + break; + } + } + assert(i < MAX_IOVA_REGIONS); + close(fd); + + return 0; +} + +static int perm_to_prot(uint8_t perm) +{ + int prot = 0; + + switch (perm) { + case VDUSE_ACCESS_WO: + prot |= PROT_WRITE; + break; + case VDUSE_ACCESS_RO: + prot |= PROT_READ; + break; + case VDUSE_ACCESS_RW: + prot |= PROT_READ | PROT_WRITE; + break; + default: + break; + } + + return prot; +} + +static inline void *iova_to_va(VduseDev *dev, uint64_t *plen, uint64_t iova) +{ + int i, ret; + struct vduse_iotlb_entry entry; + + for (i = 0; i < MAX_IOVA_REGIONS; i++) { + VduseIovaRegion *r = &dev->regions[i]; + + if (!r->mmap_addr) { + continue; + } + + if ((iova >= r->iova) && (iova < (r->iova + r->size))) { + if ((iova + *plen) > (r->iova + r->size)) { + *plen = r->iova + r->size - iova; + } + return (void *)(uintptr_t)(iova - r->iova + + r->mmap_addr + r->mmap_offset); + } + } + + entry.start = iova; + entry.last = iova + 1; + ret = ioctl(dev->fd, VDUSE_IOTLB_GET_FD, &entry); + if (ret < 0) { + return NULL; + } + + if (!vduse_iova_add_region(dev, ret, entry.offset, entry.start, + entry.last, perm_to_prot(entry.perm))) { + return iova_to_va(dev, plen, iova); + } + + return NULL; +} + +static inline uint16_t vring_avail_flags(VduseVirtq *vq) +{ + return le16toh(vq->vring.avail->flags); +} + +static inline uint16_t vring_avail_idx(VduseVirtq *vq) +{ + vq->shadow_avail_idx = le16toh(vq->vring.avail->idx); + + return vq->shadow_avail_idx; +} + +static inline uint16_t vring_avail_ring(VduseVirtq *vq, int i) +{ + return le16toh(vq->vring.avail->ring[i]); +} + +static inline uint16_t vring_get_used_event(VduseVirtq *vq) +{ + return vring_avail_ring(vq, vq->vring.num); +} + +static bool vduse_queue_get_head(VduseVirtq *vq, unsigned int idx, + unsigned int *head) +{ + /* + * Grab the next descriptor number they're advertising, and increment + * the index we've seen. + */ + *head = vring_avail_ring(vq, idx % vq->vring.num); + + /* If their number is silly, that's a fatal mistake. */ + if (*head >= vq->vring.num) { + fprintf(stderr, "Guest says index %u is available\n", *head); + return false; + } + + return true; +} + +static int +vduse_queue_read_indirect_desc(VduseDev *dev, struct vring_desc *desc, + uint64_t addr, size_t len) +{ + struct vring_desc *ori_desc; + uint64_t read_len; + + if (len > (VIRTQUEUE_MAX_SIZE * sizeof(struct vring_desc))) { + return -1; + } + + if (len == 0) { + return -1; + } + + while (len) { + read_len = len; + ori_desc = iova_to_va(dev, &read_len, addr); + if (!ori_desc) { + return -1; + } + + memcpy(desc, ori_desc, read_len); + len -= read_len; + addr += read_len; + desc += read_len; + } + + return 0; +} + +enum { + VIRTQUEUE_READ_DESC_ERROR = -1, + VIRTQUEUE_READ_DESC_DONE = 0, /* end of chain */ + VIRTQUEUE_READ_DESC_MORE = 1, /* more buffers in chain */ +}; + +static int vduse_queue_read_next_desc(struct vring_desc *desc, int i, + unsigned int max, unsigned int *next) +{ + /* If this descriptor says it doesn't chain, we're done. */ + if (!(le16toh(desc[i].flags) & VRING_DESC_F_NEXT)) { + return VIRTQUEUE_READ_DESC_DONE; + } + + /* Check they're not leading us off end of descriptors. */ + *next = desc[i].next; + /* Make sure compiler knows to grab that: we don't want it changing! */ + smp_wmb(); + + if (*next >= max) { + fprintf(stderr, "Desc next is %u\n", *next); + return VIRTQUEUE_READ_DESC_ERROR; + } + + return VIRTQUEUE_READ_DESC_MORE; +} + +/* + * Fetch avail_idx from VQ memory only when we really need to know if + * guest has added some buffers. + */ +static bool vduse_queue_empty(VduseVirtq *vq) +{ + if (unlikely(!vq->vring.avail)) { + return true; + } + + if (vq->shadow_avail_idx != vq->last_avail_idx) { + return false; + } + + return vring_avail_idx(vq) == vq->last_avail_idx; +} + +static bool vduse_queue_should_notify(VduseVirtq *vq) +{ + VduseDev *dev = vq->dev; + uint16_t old, new; + bool v; + + /* We need to expose used array entries before checking used event. */ + smp_mb(); + + /* Always notify when queue is empty (when feature acknowledge) */ + if (vduse_dev_has_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY) && + !vq->inuse && vduse_queue_empty(vq)) { + return true; + } + + if (!vduse_dev_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) { + return !(vring_avail_flags(vq) & VRING_AVAIL_F_NO_INTERRUPT); + } + + v = vq->signalled_used_valid; + vq->signalled_used_valid = true; + old = vq->signalled_used; + new = vq->signalled_used = vq->used_idx; + return !v || vring_need_event(vring_get_used_event(vq), new, old); +} + +void vduse_queue_notify(VduseVirtq *vq) +{ + VduseDev *dev = vq->dev; + + if (unlikely(!vq->vring.avail)) { + return; + } + + if (!vduse_queue_should_notify(vq)) { + return; + } + + if (vduse_inject_irq(dev, vq->index) < 0) { + fprintf(stderr, "Error inject irq for vq %d: %s\n", + vq->index, strerror(errno)); + } +} + +static inline void vring_used_flags_set_bit(VduseVirtq *vq, int mask) +{ + uint16_t *flags; + + flags = (uint16_t *)((char*)vq->vring.used + + offsetof(struct vring_used, flags)); + *flags = htole16(le16toh(*flags) | mask); +} + +static inline void vring_used_flags_unset_bit(VduseVirtq *vq, int mask) +{ + uint16_t *flags; + + flags = (uint16_t *)((char*)vq->vring.used + + offsetof(struct vring_used, flags)); + *flags = htole16(le16toh(*flags) & ~mask); +} + +static inline void vring_set_avail_event(VduseVirtq *vq, uint16_t val) +{ + *((uint16_t *)&vq->vring.used->ring[vq->vring.num]) = htole16(val); +} + +static bool vduse_queue_map_single_desc(VduseVirtq *vq, unsigned int *p_num_sg, + struct iovec *iov, unsigned int max_num_sg, + bool is_write, uint64_t pa, size_t sz) +{ + unsigned num_sg = *p_num_sg; + VduseDev *dev = vq->dev; + + assert(num_sg <= max_num_sg); + + if (!sz) { + fprintf(stderr, "virtio: zero sized buffers are not allowed\n"); + return false; + } + + while (sz) { + uint64_t len = sz; + + if (num_sg == max_num_sg) { + fprintf(stderr, + "virtio: too many descriptors in indirect table\n"); + return false; + } + + iov[num_sg].iov_base = iova_to_va(dev, &len, pa); + if (iov[num_sg].iov_base == NULL) { + fprintf(stderr, "virtio: invalid address for buffers\n"); + return false; + } + iov[num_sg++].iov_len = len; + sz -= len; + pa += len; + } + + *p_num_sg = num_sg; + return true; +} + +static void *vduse_queue_alloc_element(size_t sz, unsigned out_num, + unsigned in_num) +{ + VduseVirtqElement *elem; + size_t in_sg_ofs = ALIGN_UP(sz, __alignof__(elem->in_sg[0])); + size_t out_sg_ofs = in_sg_ofs + in_num * sizeof(elem->in_sg[0]); + size_t out_sg_end = out_sg_ofs + out_num * sizeof(elem->out_sg[0]); + + assert(sz >= sizeof(VduseVirtqElement)); + elem = malloc(out_sg_end); + if (!elem) { + return NULL; + } + elem->out_num = out_num; + elem->in_num = in_num; + elem->in_sg = (void *)elem + in_sg_ofs; + elem->out_sg = (void *)elem + out_sg_ofs; + return elem; +} + +static void *vduse_queue_map_desc(VduseVirtq *vq, unsigned int idx, size_t sz) +{ + struct vring_desc *desc = vq->vring.desc; + VduseDev *dev = vq->dev; + uint64_t desc_addr, read_len; + unsigned int desc_len; + unsigned int max = vq->vring.num; + unsigned int i = idx; + VduseVirtqElement *elem; + struct iovec iov[VIRTQUEUE_MAX_SIZE]; + struct vring_desc desc_buf[VIRTQUEUE_MAX_SIZE]; + unsigned int out_num = 0, in_num = 0; + int rc; + + if (le16toh(desc[i].flags) & VRING_DESC_F_INDIRECT) { + if (le32toh(desc[i].len) % sizeof(struct vring_desc)) { + fprintf(stderr, "Invalid size for indirect buffer table\n"); + return NULL; + } + + /* loop over the indirect descriptor table */ + desc_addr = le64toh(desc[i].addr); + desc_len = le32toh(desc[i].len); + max = desc_len / sizeof(struct vring_desc); + read_len = desc_len; + desc = iova_to_va(dev, &read_len, desc_addr); + if (unlikely(desc && read_len != desc_len)) { + /* Failed to use zero copy */ + desc = NULL; + if (!vduse_queue_read_indirect_desc(dev, desc_buf, + desc_addr, + desc_len)) { + desc = desc_buf; + } + } + if (!desc) { + fprintf(stderr, "Invalid indirect buffer table\n"); + return NULL; + } + i = 0; + } + + /* Collect all the descriptors */ + do { + if (le16toh(desc[i].flags) & VRING_DESC_F_WRITE) { + if (!vduse_queue_map_single_desc(vq, &in_num, iov + out_num, + VIRTQUEUE_MAX_SIZE - out_num, + true, le64toh(desc[i].addr), + le32toh(desc[i].len))) { + return NULL; + } + } else { + if (in_num) { + fprintf(stderr, "Incorrect order for descriptors\n"); + return NULL; + } + if (!vduse_queue_map_single_desc(vq, &out_num, iov, + VIRTQUEUE_MAX_SIZE, false, + le64toh(desc[i].addr), + le32toh(desc[i].len))) { + return NULL; + } + } + + /* If we've got too many, that implies a descriptor loop. */ + if ((in_num + out_num) > max) { + fprintf(stderr, "Looped descriptor\n"); + return NULL; + } + rc = vduse_queue_read_next_desc(desc, i, max, &i); + } while (rc == VIRTQUEUE_READ_DESC_MORE); + + if (rc == VIRTQUEUE_READ_DESC_ERROR) { + fprintf(stderr, "read descriptor error\n"); + return NULL; + } + + /* Now copy what we have collected and mapped */ + elem = vduse_queue_alloc_element(sz, out_num, in_num); + if (!elem) { + fprintf(stderr, "read descriptor error\n"); + return NULL; + } + elem->index = idx; + for (i = 0; i < out_num; i++) { + elem->out_sg[i] = iov[i]; + } + for (i = 0; i < in_num; i++) { + elem->in_sg[i] = iov[out_num + i]; + } + + return elem; +} + +void *vduse_queue_pop(VduseVirtq *vq, size_t sz) +{ + unsigned int head; + VduseVirtqElement *elem; + VduseDev *dev = vq->dev; + + if (unlikely(!vq->vring.avail)) { + return NULL; + } + + if (vduse_queue_empty(vq)) { + return NULL; + } + /* Needed after virtio_queue_empty() */ + smp_rmb(); + + if (vq->inuse >= vq->vring.num) { + fprintf(stderr, "Virtqueue size exceeded: %d\n", vq->inuse); + return NULL; + } + + if (!vduse_queue_get_head(vq, vq->last_avail_idx++, &head)) { + return NULL; + } + + if (vduse_dev_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) { + vring_set_avail_event(vq, vq->last_avail_idx); + } + + elem = vduse_queue_map_desc(vq, head, sz); + + if (!elem) { + return NULL; + } + + vq->inuse++; + + return elem; +} + +static inline void vring_used_write(VduseVirtq *vq, + struct vring_used_elem *uelem, int i) +{ + struct vring_used *used = vq->vring.used; + + used->ring[i] = *uelem; +} + +static void vduse_queue_fill(VduseVirtq *vq, const VduseVirtqElement *elem, + unsigned int len, unsigned int idx) +{ + struct vring_used_elem uelem; + + if (unlikely(!vq->vring.used)) { + return; + } + + idx = (idx + vq->used_idx) % vq->vring.num; + + uelem.id = htole32(elem->index); + uelem.len = htole32(len); + vring_used_write(vq, &uelem, idx); +} + +static inline void vring_used_idx_set(VduseVirtq *vq, uint16_t val) +{ + vq->vring.used->idx = htole16(val); + vq->used_idx = val; +} + +static void vduse_queue_flush(VduseVirtq *vq, unsigned int count) +{ + uint16_t old, new; + + if (unlikely(!vq->vring.used)) { + return; + } + + /* Make sure buffer is written before we update index. */ + smp_wmb(); + + old = vq->used_idx; + new = old + count; + vring_used_idx_set(vq, new); + vq->inuse -= count; + if (unlikely((int16_t)(new - vq->signalled_used) < (uint16_t)(new - old))) { + vq->signalled_used_valid = false; + } +} + +void vduse_queue_push(VduseVirtq *vq, const VduseVirtqElement *elem, + unsigned int len) +{ + vduse_queue_fill(vq, elem, len, 0); + vduse_queue_flush(vq, 1); +} + +static int vduse_queue_update_vring(VduseVirtq *vq, uint64_t desc_addr, + uint64_t avail_addr, uint64_t used_addr) +{ + struct VduseDev *dev = vq->dev; + uint64_t len; + + len = sizeof(struct vring_desc); + vq->vring.desc = iova_to_va(dev, &len, desc_addr); + assert(len == sizeof(struct vring_desc)); + + len = sizeof(struct vring_avail); + vq->vring.avail = iova_to_va(dev, &len, avail_addr); + assert(len == sizeof(struct vring_avail)); + + len = sizeof(struct vring_used); + vq->vring.used = iova_to_va(dev, &len, used_addr); + assert(len == sizeof(struct vring_used)); + + if (!vq->vring.desc || !vq->vring.avail || !vq->vring.used) { + fprintf(stderr, "Failed to get vq[%d] iova mapping\n", vq->index); + return -EINVAL; + } + + return 0; +} + +static void vduse_queue_enable(VduseVirtq *vq) +{ + struct VduseDev *dev = vq->dev; + struct vduse_vq_info vq_info; + struct vduse_vq_eventfd vq_eventfd; + int fd; + + vq_info.index = vq->index; + if (ioctl(dev->fd, VDUSE_VQ_GET_INFO, &vq_info)) { + fprintf(stderr, "Failed to get vq[%d] info: %s\n", + vq->index, strerror(errno)); + return; + } + + if (!vq_info.ready) { + return; + } + + vq->vring.num = vq_info.num; + vq->vring.desc_addr = vq_info.desc_addr; + vq->vring.avail_addr = vq_info.driver_addr; + vq->vring.used_addr = vq_info.device_addr; + + if (vduse_queue_update_vring(vq, vq_info.desc_addr, + vq_info.driver_addr, vq_info.device_addr)) { + fprintf(stderr, "Failed to update vring for vq[%d]\n", vq->index); + return; + } + + fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); + if (fd < 0) { + fprintf(stderr, "Failed to init eventfd for vq[%d]\n", vq->index); + return; + } + + vq_eventfd.index = vq->index; + vq_eventfd.fd = fd; + if (ioctl(dev->fd, VDUSE_VQ_SETUP_KICKFD, &vq_eventfd)) { + fprintf(stderr, "Failed to setup kick fd for vq[%d]\n", vq->index); + close(fd); + return; + } + + vq->fd = fd; + vq->shadow_avail_idx = vq->last_avail_idx = vq_info.split.avail_index; + vq->inuse = 0; + vq->used_idx = 0; + vq->signalled_used_valid = false; + vq->ready = true; + + dev->ops->enable_queue(dev, vq); +} + +static void vduse_queue_disable(VduseVirtq *vq) +{ + struct VduseDev *dev = vq->dev; + struct vduse_vq_eventfd eventfd; + + if (!vq->ready) { + return; + } + + dev->ops->disable_queue(dev, vq); + + eventfd.index = vq->index; + eventfd.fd = VDUSE_EVENTFD_DEASSIGN; + ioctl(dev->fd, VDUSE_VQ_SETUP_KICKFD, &eventfd); + close(vq->fd); + + assert(vq->inuse == 0); + + vq->vring.num = 0; + vq->vring.desc_addr = 0; + vq->vring.avail_addr = 0; + vq->vring.used_addr = 0; + vq->vring.desc = 0; + vq->vring.avail = 0; + vq->vring.used = 0; + vq->ready = false; + vq->fd = -1; +} + +static void vduse_dev_start_dataplane(VduseDev *dev) +{ + int i; + + if (ioctl(dev->fd, VDUSE_DEV_GET_FEATURES, &dev->features)) { + fprintf(stderr, "Failed to get features: %s\n", strerror(errno)); + return; + } + assert(vduse_dev_has_feature(dev, VIRTIO_F_VERSION_1)); + + for (i = 0; i < dev->num_queues; i++) { + vduse_queue_enable(&dev->vqs[i]); + } +} + +static void vduse_dev_stop_dataplane(VduseDev *dev) +{ + int i; + + for (i = 0; i < dev->num_queues; i++) { + vduse_queue_disable(&dev->vqs[i]); + } + dev->features = 0; + vduse_iova_remove_region(dev, 0, ULONG_MAX); +} + +int vduse_dev_handler(VduseDev *dev) +{ + struct vduse_dev_request req; + struct vduse_dev_response resp = { 0 }; + VduseVirtq *vq; + int i, ret; + + ret = read(dev->fd, &req, sizeof(req)); + if (ret != sizeof(req)) { + fprintf(stderr, "Read request error [%d]: %s\n", + ret, strerror(errno)); + return -errno; + } + resp.request_id = req.request_id; + + switch (req.type) { + case VDUSE_GET_VQ_STATE: + vq = &dev->vqs[req.vq_state.index]; + resp.vq_state.split.avail_index = vq->last_avail_idx; + resp.result = VDUSE_REQ_RESULT_OK; + break; + case VDUSE_SET_STATUS: + if (req.s.status & VIRTIO_CONFIG_S_DRIVER_OK) { + vduse_dev_start_dataplane(dev); + } else if (req.s.status == 0) { + vduse_dev_stop_dataplane(dev); + } + resp.result = VDUSE_REQ_RESULT_OK; + break; + case VDUSE_UPDATE_IOTLB: + /* The iova will be updated by iova_to_va() later, so just remove it */ + vduse_iova_remove_region(dev, req.iova.start, req.iova.last); + for (i = 0; i < dev->num_queues; i++) { + VduseVirtq *vq = &dev->vqs[i]; + if (vq->ready) { + if (vduse_queue_update_vring(vq, vq->vring.desc_addr, + vq->vring.avail_addr, + vq->vring.used_addr)) { + fprintf(stderr, "Failed to update vring for vq[%d]\n", + vq->index); + } + } + } + resp.result = VDUSE_REQ_RESULT_OK; + break; + default: + resp.result = VDUSE_REQ_RESULT_FAILED; + break; + } + + ret = write(dev->fd, &resp, sizeof(resp)); + if (ret != sizeof(resp)) { + fprintf(stderr, "Write request %d error [%d]: %s\n", + req.type, ret, strerror(errno)); + return -errno; + } + return 0; +} + +int vduse_dev_update_config(VduseDev *dev, uint32_t size, + uint32_t offset, char *buffer) +{ + int ret; + struct vduse_config_data *data; + + data = malloc(offsetof(struct vduse_config_data, buffer) + size); + if (!data) { + return -ENOMEM; + } + + data->offset = offset; + data->length = size; + memcpy(data->buffer, buffer, size); + + ret = ioctl(dev->fd, VDUSE_DEV_SET_CONFIG, data); + free(data); + + if (ret) { + return -errno; + } + + if (ioctl(dev->fd, VDUSE_DEV_INJECT_CONFIG_IRQ)) { + return -errno; + } + + return 0; +} + +int vduse_dev_setup_queue(VduseDev *dev, int index, int max_size) +{ + VduseVirtq *vq = &dev->vqs[index]; + struct vduse_vq_config vq_config = { 0 }; + + if (max_size > VIRTQUEUE_MAX_SIZE) { + return -EINVAL; + } + + vq_config.index = vq->index; + vq_config.max_size = max_size; + + if (ioctl(dev->fd, VDUSE_VQ_SETUP, &vq_config)) { + return -errno; + } + + return 0; +} + +static int vduse_dev_init_vqs(VduseDev *dev, uint16_t num_queues) +{ + VduseVirtq *vqs; + int i; + + vqs = calloc(sizeof(VduseVirtq), num_queues); + if (!vqs) { + return -ENOMEM; + } + + for (i = 0; i < num_queues; i++) { + vqs[i].index = i; + vqs[i].dev = dev; + vqs[i].fd = -1; + } + dev->vqs = vqs; + + return 0; +} + +static int vduse_dev_init(VduseDev *dev, const char *name, + uint16_t num_queues, const VduseOps *ops, + void *priv) +{ + char *dev_path, *dev_name; + int ret, fd; + + dev_path = malloc(strlen(name) + strlen("/dev/vduse/") + 1); + if (!dev_path) { + return -ENOMEM; + } + sprintf(dev_path, "/dev/vduse/%s", name); + + fd = open(dev_path, O_RDWR); + free(dev_path); + if (fd < 0) { + fprintf(stderr, "Failed to open vduse dev %s: %s\n", + name, strerror(errno)); + return -errno; + } + + dev_name = strdup(name); + if (!dev_name) { + close(fd); + return -ENOMEM; + } + + ret = vduse_dev_init_vqs(dev, num_queues); + if (ret) { + free(dev_name); + close(fd); + return ret; + } + + dev->name = dev_name; + dev->num_queues = num_queues; + dev->fd = fd; + dev->ops = ops; + dev->priv = priv; + + return 0; +} + +static inline bool vduse_name_is_valid(const char *name) +{ + return strlen(name) >= VDUSE_NAME_MAX || strstr(name, ".."); +} + +VduseDev *vduse_dev_create_by_fd(int fd, uint16_t num_queues, + const VduseOps *ops, void *priv) +{ + VduseDev *dev; + int ret; + + if (!ops || !ops->enable_queue || !ops->disable_queue) { + fprintf(stderr, "Invalid parameter for vduse\n"); + return NULL; + } + + dev = calloc(sizeof(VduseDev), 1); + if (!dev) { + fprintf(stderr, "Failed to allocate vduse device\n"); + return NULL; + } + + ret = vduse_dev_init_vqs(dev, num_queues); + if (ret) { + fprintf(stderr, "Failed to init vqs\n"); + free(dev); + return NULL; + } + + dev->num_queues = num_queues; + dev->fd = fd; + dev->ops = ops; + dev->priv = priv; + + return dev; +} + +VduseDev *vduse_dev_create_by_name(const char *name, uint16_t num_queues, + const VduseOps *ops, void *priv) +{ + VduseDev *dev; + int ret; + + if (!name || vduse_name_is_valid(name) || !ops || + !ops->enable_queue || !ops->disable_queue) { + fprintf(stderr, "Invalid parameter for vduse\n"); + return NULL; + } + + dev = calloc(sizeof(VduseDev), 1); + if (!dev) { + fprintf(stderr, "Failed to allocate vduse device\n"); + return NULL; + } + + ret = vduse_dev_init(dev, name, num_queues, ops, priv); + if (ret < 0) { + fprintf(stderr, "Failed to init vduse device %s: %s\n", + name, strerror(ret)); + free(dev); + return NULL; + } + + return dev; +} + +VduseDev *vduse_dev_create(const char *name, uint32_t device_id, + uint32_t vendor_id, uint64_t features, + uint16_t num_queues, uint32_t config_size, + char *config, const VduseOps *ops, void *priv) +{ + VduseDev *dev; + int ret, ctrl_fd; + uint64_t version; + struct vduse_dev_config *dev_config; + size_t size = offsetof(struct vduse_dev_config, config); + + if (!name || vduse_name_is_valid(name) || + !has_feature(features, VIRTIO_F_VERSION_1) || !config || + !config_size || !ops || !ops->enable_queue || !ops->disable_queue) { + fprintf(stderr, "Invalid parameter for vduse\n"); + return NULL; + } + + dev = calloc(sizeof(VduseDev), 1); + if (!dev) { + fprintf(stderr, "Failed to allocate vduse device\n"); + return NULL; + } + + ctrl_fd = open("/dev/vduse/control", O_RDWR); + if (ctrl_fd < 0) { + fprintf(stderr, "Failed to open /dev/vduse/control: %s\n", + strerror(errno)); + goto err_ctrl; + } + + version = VDUSE_API_VERSION; + if (ioctl(ctrl_fd, VDUSE_SET_API_VERSION, &version)) { + fprintf(stderr, "Failed to set api version %lu: %s\n", + version, strerror(errno)); + goto err_dev; + } + + dev_config = calloc(size + config_size, 1); + if (!dev_config) { + fprintf(stderr, "Failed to allocate config space\n"); + goto err_dev; + } + + strcpy(dev_config->name, name); + dev_config->device_id = device_id; + dev_config->vendor_id = vendor_id; + dev_config->features = features; + dev_config->vq_num = num_queues; + dev_config->vq_align = VDUSE_VQ_ALIGN; + dev_config->config_size = config_size; + memcpy(dev_config->config, config, config_size); + + ret = ioctl(ctrl_fd, VDUSE_CREATE_DEV, dev_config); + free(dev_config); + if (ret < 0) { + fprintf(stderr, "Failed to create vduse device %s: %s\n", + name, strerror(errno)); + goto err_dev; + } + dev->ctrl_fd = ctrl_fd; + + ret = vduse_dev_init(dev, name, num_queues, ops, priv); + if (ret < 0) { + fprintf(stderr, "Failed to init vduse device %s: %s\n", + name, strerror(ret)); + goto err; + } + + return dev; +err: + ioctl(ctrl_fd, VDUSE_DESTROY_DEV, name); +err_dev: + close(ctrl_fd); +err_ctrl: + free(dev); + + return NULL; +} + +int vduse_dev_destroy(VduseDev *dev) +{ + int ret = 0; + + free(dev->vqs); + if (dev->fd > 0) { + close(dev->fd); + dev->fd = -1; + } + if (dev->ctrl_fd > 0) { + if (ioctl(dev->ctrl_fd, VDUSE_DESTROY_DEV, dev->name)) { + ret = -errno; + } + close(dev->ctrl_fd); + dev->ctrl_fd = -1; + } + free(dev->name); + free(dev); + + return ret; +} diff --git a/subprojects/libvduse/libvduse.h b/subprojects/libvduse/libvduse.h new file mode 100644 index 0000000000..6c2fe98213 --- /dev/null +++ b/subprojects/libvduse/libvduse.h @@ -0,0 +1,235 @@ +/* + * VDUSE (vDPA Device in Userspace) library + * + * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: + * Xie Yongji + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#ifndef LIBVDUSE_H +#define LIBVDUSE_H + +#include +#include + +#define VIRTQUEUE_MAX_SIZE 1024 + +/* VDUSE device structure */ +typedef struct VduseDev VduseDev; + +/* Virtqueue structure */ +typedef struct VduseVirtq VduseVirtq; + +/* Some operation of VDUSE backend */ +typedef struct VduseOps { + /* Called when virtqueue can be processed */ + void (*enable_queue)(VduseDev *dev, VduseVirtq *vq); + /* Called when virtqueue processing should be stopped */ + void (*disable_queue)(VduseDev *dev, VduseVirtq *vq); +} VduseOps; + +/* Describing elements of the I/O buffer */ +typedef struct VduseVirtqElement { + /* Descriptor table index */ + unsigned int index; + /* Number of physically-contiguous device-readable descriptors */ + unsigned int out_num; + /* Number of physically-contiguous device-writable descriptors */ + unsigned int in_num; + /* Array to store physically-contiguous device-writable descriptors */ + struct iovec *in_sg; + /* Array to store physically-contiguous device-readable descriptors */ + struct iovec *out_sg; +} VduseVirtqElement; + + +/** + * vduse_get_virtio_features: + * + * Get supported virtio features + * + * Returns: supported feature bits + */ +uint64_t vduse_get_virtio_features(void); + +/** + * vduse_queue_get_dev: + * @vq: specified virtqueue + * + * Get corresponding VDUSE device from the virtqueue. + * + * Returns: a pointer to VDUSE device on success, NULL on failure. + */ +VduseDev *vduse_queue_get_dev(VduseVirtq *vq); + +/** + * vduse_queue_get_fd: + * @vq: specified virtqueue + * + * Get the kick fd for the virtqueue. + * + * Returns: file descriptor on success, -1 on failure. + */ +int vduse_queue_get_fd(VduseVirtq *vq); + +/** + * vduse_queue_pop: + * @vq: specified virtqueue + * @sz: the size of struct to return (must be >= VduseVirtqElement) + * + * Pop an element from virtqueue available ring. + * + * Returns: a pointer to a structure containing VduseVirtqElement on success, + * NULL on failure. + */ +void *vduse_queue_pop(VduseVirtq *vq, size_t sz); + +/** + * vduse_queue_push: + * @vq: specified virtqueue + * @elem: pointer to VduseVirtqElement returned by vduse_queue_pop() + * @len: length in bytes to write + * + * Push an element to virtqueue used ring. + */ +void vduse_queue_push(VduseVirtq *vq, const VduseVirtqElement *elem, + unsigned int len); +/** + * vduse_queue_notify: + * @vq: specified virtqueue + * + * Request to notify the queue. + */ +void vduse_queue_notify(VduseVirtq *vq); + +/** + * vduse_dev_get_priv: + * @dev: VDUSE device + * + * Get the private pointer passed to vduse_dev_create(). + * + * Returns: private pointer on success, NULL on failure. + */ +void *vduse_dev_get_priv(VduseDev *dev); + +/** + * vduse_dev_get_queue: + * @dev: VDUSE device + * @index: virtqueue index + * + * Get the specified virtqueue. + * + * Returns: a pointer to the virtqueue on success, NULL on failure. + */ +VduseVirtq *vduse_dev_get_queue(VduseDev *dev, int index); + +/** + * vduse_dev_get_fd: + * @dev: VDUSE device + * + * Get the control message fd for the VDUSE device. + * + * Returns: file descriptor on success, -1 on failure. + */ +int vduse_dev_get_fd(VduseDev *dev); + +/** + * vduse_dev_handler: + * @dev: VDUSE device + * + * Used to process the control message. + * + * Returns: file descriptor on success, -errno on failure. + */ +int vduse_dev_handler(VduseDev *dev); + +/** + * vduse_dev_update_config: + * @dev: VDUSE device + * @size: the size to write to configuration space + * @offset: the offset from the beginning of configuration space + * @buffer: the buffer used to write from + * + * Update device configuration space and inject a config interrupt. + * + * Returns: 0 on success, -errno on failure. + */ +int vduse_dev_update_config(VduseDev *dev, uint32_t size, + uint32_t offset, char *buffer); + +/** + * vduse_dev_setup_queue: + * @dev: VDUSE device + * @index: virtqueue index + * @max_size: the max size of virtqueue + * + * Setup the specified virtqueue. + * + * Returns: 0 on success, -errno on failure. + */ +int vduse_dev_setup_queue(VduseDev *dev, int index, int max_size); + +/** + * vduse_dev_create_by_fd: + * @fd: passed file descriptor + * @num_queues: the number of virtqueues + * @ops: the operation of VDUSE backend + * @priv: private pointer + * + * Create VDUSE device from a passed file descriptor. + * + * Returns: pointer to VDUSE device on success, NULL on failure. + */ +VduseDev *vduse_dev_create_by_fd(int fd, uint16_t num_queues, + const VduseOps *ops, void *priv); + +/** + * vduse_dev_create_by_name: + * @name: VDUSE device name + * @num_queues: the number of virtqueues + * @ops: the operation of VDUSE backend + * @priv: private pointer + * + * Create VDUSE device on /dev/vduse/$NAME. + * + * Returns: pointer to VDUSE device on success, NULL on failure. + */ +VduseDev *vduse_dev_create_by_name(const char *name, uint16_t num_queues, + const VduseOps *ops, void *priv); + +/** + * vduse_dev_create: + * @name: VDUSE device name + * @device_id: virtio device id + * @vendor_id: virtio vendor id + * @features: virtio features + * @num_queues: the number of virtqueues + * @config_size: the size of the configuration space + * @config: the buffer of the configuration space + * @ops: the operation of VDUSE backend + * @priv: private pointer + * + * Create VDUSE device. + * + * Returns: pointer to VDUSE device on success, NULL on failure. + */ +VduseDev *vduse_dev_create(const char *name, uint32_t device_id, + uint32_t vendor_id, uint64_t features, + uint16_t num_queues, uint32_t config_size, + char *config, const VduseOps *ops, void *priv); + +/** + * vduse_dev_destroy: + * @dev: VDUSE device + * + * Destroy the VDUSE device. + * + * Returns: 0 on success, -errno on failure. + */ +int vduse_dev_destroy(VduseDev *dev); + +#endif diff --git a/subprojects/libvduse/linux-headers/linux b/subprojects/libvduse/linux-headers/linux new file mode 120000 index 0000000000..04f3304f79 --- /dev/null +++ b/subprojects/libvduse/linux-headers/linux @@ -0,0 +1 @@ +../../../linux-headers/linux/ \ No newline at end of file diff --git a/subprojects/libvduse/meson.build b/subprojects/libvduse/meson.build new file mode 100644 index 0000000000..ba08f5ee1a --- /dev/null +++ b/subprojects/libvduse/meson.build @@ -0,0 +1,10 @@ +project('libvduse', 'c', + license: 'GPL-2.0-or-later', + default_options: ['c_std=gnu99']) + +libvduse = static_library('vduse', + files('libvduse.c'), + c_args: '-D_GNU_SOURCE') + +libvduse_dep = declare_dependency(link_with: libvduse, + include_directories: include_directories('.')) diff --git a/subprojects/libvduse/standard-headers/linux b/subprojects/libvduse/standard-headers/linux new file mode 120000 index 0000000000..c416f068ac --- /dev/null +++ b/subprojects/libvduse/standard-headers/linux @@ -0,0 +1 @@ +../../../include/standard-headers/linux/ \ No newline at end of file From patchwork Wed May 4 07:40:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12837293 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D8840C433FE for ; Wed, 4 May 2022 08:06:51 +0000 (UTC) Received: from localhost ([::1]:53470 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nmA21-00056W-IY for qemu-devel@archiver.kernel.org; Wed, 04 May 2022 04:06:45 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52542) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nm9dC-0002d3-2d for qemu-devel@nongnu.org; Wed, 04 May 2022 03:41:20 -0400 Received: from mail-pf1-x42a.google.com ([2607:f8b0:4864:20::42a]:35680) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nm9d8-00073q-Ew for qemu-devel@nongnu.org; Wed, 04 May 2022 03:41:05 -0400 Received: by mail-pf1-x42a.google.com with SMTP id c14so527376pfn.2 for ; Wed, 04 May 2022 00:41:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Q7PY5rUnW3DgJx7LI661jrJTUvdflVaSARvGu6Lskic=; b=Qs4s7jJSzaUTjdtat1/loCS6FP+mF4qVnh+YvZy2XGmiYdps67+McYD+OlwfBrcCYk Zbsbsh8d+6cxPZ9BCr7P2nEUw3+zNoP80evQ3wTqPTPZs0rNIMoEjoi8df4/nZtYIITs OshG6C/CXzC0SDLZcbGvjb5U1pumvIExZCqpliIo531vzCK8P2oRM6CHgIkGWHW1NZFY fo/IgLInOrpnTGF6AWHGFK459jAz42T36cYwStiuer+l7+Dm0m3MvKcl6c1/tZBH0Vux vyqzvq55g+4WqroV+bXkkbvO3bvj0yHvR4Mztjp2w3SdHMYBhtHg/zLW6pAAo4INNKFf xBNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Q7PY5rUnW3DgJx7LI661jrJTUvdflVaSARvGu6Lskic=; b=XokqD6i31u0MJOVAc4BYy42FUGoSlFdplthCEff8XH7r/7+sRE6KpvnZ47JMZpawTk HSrg2PuPzfMe4Mtqoug1w0no8WJYYIu0bnHw3NQklxRT0UsWeHooOTB0SAadCtQjFA0c oK6NWcYLow22Hi++oCXDEifE2e6iWqLY3rzG71SY2Yt7pqtoaFi84fDIPDSmBSZV+fAz FqW5jrTaP6SQegP74TXC7nDf9l0OjjBVcXhN41EKrt77GfhR6pZ/ZUvQkjKT9QeuIShg B6rvbRxdWeCioBw54k5xBAdNTUFtllHjMNjisa/BXI56BwU6i+L3pkEv57Wl39v6sSrE chbA== X-Gm-Message-State: AOAM5328FEsx0PPAqxo3H0vlKZW+ni/kodSF7qbJLUBhVQ0uX/9XWvd5 drombZoLYM9HCR1PjbkjiXnx X-Google-Smtp-Source: ABdhPJwV0FhQBjVJJlPTCTVL6fYWJdJa3CTb44mRd29KyTCIMeIZ5SKhdz/LA9b3KTJdgfP/MzV4mQ== X-Received: by 2002:a63:6bc6:0:b0:39d:966d:2791 with SMTP id g189-20020a636bc6000000b0039d966d2791mr16821806pgc.407.1651650061048; Wed, 04 May 2022 00:41:01 -0700 (PDT) Received: from localhost ([139.177.225.233]) by smtp.gmail.com with ESMTPSA id c10-20020a170902724a00b0015e8d4eb2cdsm7664937pll.279.2022.05.04.00.40.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 May 2022 00:41:00 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, kwolf@redhat.com, mreitz@redhat.com, mlureau@redhat.com, jsnow@redhat.com, eblake@redhat.com, Coiby.Xu@gmail.com, hreitz@redhat.com Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH v5 6/8] vduse-blk: Implement vduse-blk export Date: Wed, 4 May 2022 15:40:49 +0800 Message-Id: <20220504074051.90-7-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220504074051.90-1-xieyongji@bytedance.com> References: <20220504074051.90-1-xieyongji@bytedance.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::42a; envelope-from=xieyongji@bytedance.com; helo=mail-pf1-x42a.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" This implements a VDUSE block backends based on the libvduse library. We can use it to export the BDSs for both VM and container (host) usage. The new command-line syntax is: $ qemu-storage-daemon \ --blockdev file,node-name=drive0,filename=test.img \ --export vduse-blk,node-name=drive0,id=vduse-export0,writable=on After the qemu-storage-daemon started, we need to use the "vdpa" command to attach the device to vDPA bus: $ vdpa dev add name vduse-export0 mgmtdev vduse Also the device must be removed via the "vdpa" command before we stop the qemu-storage-daemon. Signed-off-by: Xie Yongji Reviewed-by: Stefan Hajnoczi --- MAINTAINERS | 4 +- block/export/export.c | 6 + block/export/meson.build | 5 + block/export/vduse-blk.c | 312 ++++++++++++++++++++++++++++++++++ block/export/vduse-blk.h | 20 +++ meson.build | 13 ++ meson_options.txt | 2 + qapi/block-export.json | 25 ++- scripts/meson-buildoptions.sh | 4 + 9 files changed, 388 insertions(+), 3 deletions(-) create mode 100644 block/export/vduse-blk.c create mode 100644 block/export/vduse-blk.h diff --git a/MAINTAINERS b/MAINTAINERS index 6de3cbaa1e..704489e50d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3550,10 +3550,12 @@ L: qemu-block@nongnu.org S: Supported F: block/export/fuse.c -VDUSE library +VDUSE library and block device exports M: Xie Yongji S: Maintained F: subprojects/libvduse/ +F: block/export/vduse-blk.c +F: block/export/vduse-blk.h Replication M: Wen Congyang diff --git a/block/export/export.c b/block/export/export.c index 7253af3bc3..4744862915 100644 --- a/block/export/export.c +++ b/block/export/export.c @@ -26,6 +26,9 @@ #ifdef CONFIG_VHOST_USER_BLK_SERVER #include "vhost-user-blk-server.h" #endif +#ifdef CONFIG_VDUSE_BLK_EXPORT +#include "vduse-blk.h" +#endif static const BlockExportDriver *blk_exp_drivers[] = { &blk_exp_nbd, @@ -35,6 +38,9 @@ static const BlockExportDriver *blk_exp_drivers[] = { #ifdef CONFIG_FUSE &blk_exp_fuse, #endif +#ifdef CONFIG_VDUSE_BLK_EXPORT + &blk_exp_vduse_blk, +#endif }; /* Only accessed from the main thread */ diff --git a/block/export/meson.build b/block/export/meson.build index 431e47ca51..c60116f455 100644 --- a/block/export/meson.build +++ b/block/export/meson.build @@ -5,3 +5,8 @@ if have_vhost_user_blk_server endif blockdev_ss.add(when: fuse, if_true: files('fuse.c')) + +if have_vduse_blk_export + blockdev_ss.add(files('vduse-blk.c', 'virtio-blk-handler.c')) + blockdev_ss.add(libvduse) +endif diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c new file mode 100644 index 0000000000..8580ae929f --- /dev/null +++ b/block/export/vduse-blk.c @@ -0,0 +1,312 @@ +/* + * Export QEMU block device via VDUSE + * + * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved. + * Portions of codes and concepts borrowed from vhost-user-blk-server.c, so: + * Copyright (c) 2020 Red Hat, Inc. + * + * Author: + * Xie Yongji + * Coiby Xu + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#include + +#include "qemu/osdep.h" +#include "qapi/error.h" +#include "block/export.h" +#include "qemu/error-report.h" +#include "util/block-helpers.h" +#include "subprojects/libvduse/libvduse.h" +#include "virtio-blk-handler.h" + +#include "standard-headers/linux/virtio_blk.h" + +#define VDUSE_DEFAULT_NUM_QUEUE 1 +#define VDUSE_DEFAULT_QUEUE_SIZE 256 + +typedef struct VduseBlkExport { + BlockExport export; + VduseDev *dev; + uint16_t num_queues; + bool writable; +} VduseBlkExport; + +typedef struct VduseBlkReq { + VduseVirtqElement elem; + VduseVirtq *vq; +} VduseBlkReq; + +static void vduse_blk_req_complete(VduseBlkReq *req, size_t in_len) +{ + vduse_queue_push(req->vq, &req->elem, in_len); + vduse_queue_notify(req->vq); + + free(req); +} + +static void coroutine_fn vduse_blk_virtio_process_req(void *opaque) +{ + VduseBlkReq *req = opaque; + VduseVirtq *vq = req->vq; + VduseDev *dev = vduse_queue_get_dev(vq); + VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev); + BlockBackend *blk = vblk_exp->export.blk; + VduseVirtqElement *elem = &req->elem; + struct iovec *in_iov = elem->in_sg; + struct iovec *out_iov = elem->out_sg; + unsigned in_num = elem->in_num; + unsigned out_num = elem->out_num; + int in_len; + + in_len = virtio_blk_process_req(blk, vblk_exp->writable, + vblk_exp->export.id, in_iov, + out_iov, in_num, out_num); + if (in_len < 0) { + free(req); + return; + } + + vduse_blk_req_complete(req, in_len); +} + +static void vduse_blk_vq_handler(VduseDev *dev, VduseVirtq *vq) +{ + while (1) { + VduseBlkReq *req; + + req = vduse_queue_pop(vq, sizeof(VduseBlkReq)); + if (!req) { + break; + } + req->vq = vq; + + Coroutine *co = + qemu_coroutine_create(vduse_blk_virtio_process_req, req); + qemu_coroutine_enter(co); + } +} + +static void on_vduse_vq_kick(void *opaque) +{ + VduseVirtq *vq = opaque; + VduseDev *dev = vduse_queue_get_dev(vq); + int fd = vduse_queue_get_fd(vq); + eventfd_t kick_data; + + if (eventfd_read(fd, &kick_data) == -1) { + error_report("failed to read data from eventfd"); + return; + } + + vduse_blk_vq_handler(dev, vq); +} + +static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq) +{ + VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev); + + aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq), + true, on_vduse_vq_kick, NULL, NULL, NULL, vq); +} + +static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq) +{ + VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev); + + aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq), + true, NULL, NULL, NULL, NULL, NULL); +} + +static const VduseOps vduse_blk_ops = { + .enable_queue = vduse_blk_enable_queue, + .disable_queue = vduse_blk_disable_queue, +}; + +static void on_vduse_dev_kick(void *opaque) +{ + VduseDev *dev = opaque; + + vduse_dev_handler(dev); +} + +static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx) +{ + int i; + + aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev), + true, on_vduse_dev_kick, NULL, NULL, NULL, + vblk_exp->dev); + + for (i = 0; i < vblk_exp->num_queues; i++) { + VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i); + int fd = vduse_queue_get_fd(vq); + + if (fd < 0) { + continue; + } + aio_set_fd_handler(vblk_exp->export.ctx, fd, true, + on_vduse_vq_kick, NULL, NULL, NULL, vq); + } +} + +static void vduse_blk_detach_ctx(VduseBlkExport *vblk_exp) +{ + int i; + + for (i = 0; i < vblk_exp->num_queues; i++) { + VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i); + int fd = vduse_queue_get_fd(vq); + + if (fd < 0) { + continue; + } + aio_set_fd_handler(vblk_exp->export.ctx, fd, + true, NULL, NULL, NULL, NULL, NULL); + } + aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev), + true, NULL, NULL, NULL, NULL, NULL); +} + + +static void blk_aio_attached(AioContext *ctx, void *opaque) +{ + VduseBlkExport *vblk_exp = opaque; + + vblk_exp->export.ctx = ctx; + vduse_blk_attach_ctx(vblk_exp, ctx); +} + +static void blk_aio_detach(void *opaque) +{ + VduseBlkExport *vblk_exp = opaque; + + vduse_blk_detach_ctx(vblk_exp); + vblk_exp->export.ctx = NULL; +} + +static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, + Error **errp) +{ + VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export); + BlockExportOptionsVduseBlk *vblk_opts = &opts->u.vduse_blk; + uint64_t logical_block_size = VIRTIO_BLK_SECTOR_SIZE; + uint16_t num_queues = VDUSE_DEFAULT_NUM_QUEUE; + uint16_t queue_size = VDUSE_DEFAULT_QUEUE_SIZE; + Error *local_err = NULL; + struct virtio_blk_config config = { 0 }; + uint64_t features; + int i; + + if (vblk_opts->has_num_queues) { + num_queues = vblk_opts->num_queues; + if (num_queues == 0) { + error_setg(errp, "num-queues must be greater than 0"); + return -EINVAL; + } + } + + if (vblk_opts->has_queue_size) { + queue_size = vblk_opts->queue_size; + if (queue_size <= 2 || !is_power_of_2(queue_size) || + queue_size > VIRTQUEUE_MAX_SIZE) { + error_setg(errp, "queue-size is invalid"); + return -EINVAL; + } + } + + if (vblk_opts->has_logical_block_size) { + logical_block_size = vblk_opts->logical_block_size; + check_block_size(exp->id, "logical-block-size", logical_block_size, + &local_err); + if (local_err) { + error_propagate(errp, local_err); + return -EINVAL; + } + } + blk_set_guest_block_size(exp->blk, logical_block_size); + + vblk_exp->writable = opts->writable; + vblk_exp->num_queues = num_queues; + + config.capacity = + cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS); + config.seg_max = cpu_to_le32(queue_size - 2); + config.size_max = cpu_to_le32(0); + config.min_io_size = cpu_to_le16(1); + config.opt_io_size = cpu_to_le32(1); + config.num_queues = cpu_to_le16(num_queues); + config.blk_size = cpu_to_le32(logical_block_size); + config.max_discard_sectors = cpu_to_le32(VIRTIO_BLK_MAX_DISCARD_SECTORS); + config.max_discard_seg = cpu_to_le32(1); + config.discard_sector_alignment = + cpu_to_le32(logical_block_size >> VIRTIO_BLK_SECTOR_BITS); + config.max_write_zeroes_sectors = + cpu_to_le32(VIRTIO_BLK_MAX_WRITE_ZEROES_SECTORS); + config.max_write_zeroes_seg = cpu_to_le32(1); + + features = vduse_get_virtio_features() | + (1ULL << VIRTIO_BLK_F_SIZE_MAX) | + (1ULL << VIRTIO_BLK_F_SEG_MAX) | + (1ULL << VIRTIO_BLK_F_TOPOLOGY) | + (1ULL << VIRTIO_BLK_F_BLK_SIZE) | + (1ULL << VIRTIO_BLK_F_FLUSH) | + (1ULL << VIRTIO_BLK_F_DISCARD) | + (1ULL << VIRTIO_BLK_F_WRITE_ZEROES); + + if (num_queues > 1) { + features |= 1ULL << VIRTIO_BLK_F_MQ; + } + if (!vblk_exp->writable) { + features |= 1ULL << VIRTIO_BLK_F_RO; + } + + vblk_exp->dev = vduse_dev_create(exp->id, VIRTIO_ID_BLOCK, 0, + features, num_queues, + sizeof(struct virtio_blk_config), + (char *)&config, &vduse_blk_ops, + vblk_exp); + if (!vblk_exp->dev) { + error_setg(errp, "failed to create vduse device"); + return -ENOMEM; + } + + for (i = 0; i < num_queues; i++) { + vduse_dev_setup_queue(vblk_exp->dev, i, queue_size); + } + + aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), true, + on_vduse_dev_kick, NULL, NULL, NULL, vblk_exp->dev); + + blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, + vblk_exp); + + return 0; +} + +static void vduse_blk_exp_delete(BlockExport *exp) +{ + VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export); + + blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, + vblk_exp); + vduse_dev_destroy(vblk_exp->dev); +} + +static void vduse_blk_exp_request_shutdown(BlockExport *exp) +{ + VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export); + + vduse_blk_detach_ctx(vblk_exp); +} + +const BlockExportDriver blk_exp_vduse_blk = { + .type = BLOCK_EXPORT_TYPE_VDUSE_BLK, + .instance_size = sizeof(VduseBlkExport), + .create = vduse_blk_exp_create, + .delete = vduse_blk_exp_delete, + .request_shutdown = vduse_blk_exp_request_shutdown, +}; diff --git a/block/export/vduse-blk.h b/block/export/vduse-blk.h new file mode 100644 index 0000000000..c4eeb1b70e --- /dev/null +++ b/block/export/vduse-blk.h @@ -0,0 +1,20 @@ +/* + * Export QEMU block device via VDUSE + * + * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: + * Xie Yongji + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#ifndef VDUSE_BLK_H +#define VDUSE_BLK_H + +#include "block/export.h" + +extern const BlockExportDriver blk_exp_vduse_blk; + +#endif /* VDUSE_BLK_H */ diff --git a/meson.build b/meson.build index 4a0f1a2016..c6ded0ab4c 100644 --- a/meson.build +++ b/meson.build @@ -1407,6 +1407,17 @@ if have_libvduse libvduse = libvduse_proj.get_variable('libvduse_dep') endif +have_vduse_blk_export = (have_libvduse and targetos == 'linux') +if get_option('vduse_blk_export').enabled() + if targetos != 'linux' + error('vduse_blk_export requires linux') + elif not have_libvduse + error('vduse_blk_export requires libvduse support') + endif +elif get_option('vduse_blk_export').disabled() + have_vduse_blk_export = false +endif + # libbpf libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config') if libbpf.found() and not cc.links(''' @@ -1618,6 +1629,7 @@ config_host_data.set('CONFIG_TPM', have_tpm) config_host_data.set('CONFIG_USB_LIBUSB', libusb.found()) config_host_data.set('CONFIG_VDE', vde.found()) config_host_data.set('CONFIG_VHOST_USER_BLK_SERVER', have_vhost_user_blk_server) +config_host_data.set('CONFIG_VDUSE_BLK_EXPORT', have_vduse_blk_export) config_host_data.set('CONFIG_PNG', png.found()) config_host_data.set('CONFIG_VNC', vnc.found()) config_host_data.set('CONFIG_VNC_JPEG', jpeg.found()) @@ -3747,6 +3759,7 @@ if have_block summary_info += {'qed support': get_option('qed').allowed()} summary_info += {'parallels support': get_option('parallels').allowed()} summary_info += {'FUSE exports': fuse} + summary_info += {'VDUSE block exports': have_vduse_blk_export} endif summary(summary_info, bool_yn: true, section: 'Block layer support') diff --git a/meson_options.txt b/meson_options.txt index 47955f972d..06725d8302 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -233,6 +233,8 @@ option('virtiofsd', type: 'feature', value: 'auto', description: 'build virtiofs daemon (virtiofsd)') option('libvduse', type: 'feature', value: 'auto', description: 'build VDUSE Library') +option('vduse_blk_export', type: 'feature', value: 'auto', + description: 'VDUSE block export support') option('capstone', type: 'combo', value: 'auto', choices: ['disabled', 'enabled', 'auto', 'system', 'internal'], diff --git a/qapi/block-export.json b/qapi/block-export.json index 1de16d2589..bbaf7400fe 100644 --- a/qapi/block-export.json +++ b/qapi/block-export.json @@ -173,6 +173,23 @@ '*allow-other': 'FuseExportAllowOther' }, 'if': 'CONFIG_FUSE' } +## +# @BlockExportOptionsVduseBlk: +# +# A vduse-blk block export. +# +# @num-queues: the number of virtqueues. Defaults to 1. +# @queue-size: the size of virtqueue. Defaults to 256. +# @logical-block-size: Logical block size in bytes. Range [512, PAGE_SIZE] +# and must be power of 2. Defaults to 512 bytes. +# +# Since: 7.1 +## +{ 'struct': 'BlockExportOptionsVduseBlk', + 'data': { '*num-queues': 'uint16', + '*queue-size': 'uint16', + '*logical-block-size': 'size'} } + ## # @NbdServerAddOptions: # @@ -276,6 +293,7 @@ # @nbd: NBD export # @vhost-user-blk: vhost-user-blk export (since 5.2) # @fuse: FUSE export (since: 6.0) +# @vduse-blk: vduse-blk export (since 7.1) # # Since: 4.2 ## @@ -283,7 +301,8 @@ 'data': [ 'nbd', { 'name': 'vhost-user-blk', 'if': 'CONFIG_VHOST_USER_BLK_SERVER' }, - { 'name': 'fuse', 'if': 'CONFIG_FUSE' } ] } + { 'name': 'fuse', 'if': 'CONFIG_FUSE' }, + { 'name': 'vduse-blk', 'if': 'CONFIG_VDUSE_BLK_EXPORT' } ] } ## # @BlockExportOptions: @@ -327,7 +346,9 @@ 'vhost-user-blk': { 'type': 'BlockExportOptionsVhostUserBlk', 'if': 'CONFIG_VHOST_USER_BLK_SERVER' }, 'fuse': { 'type': 'BlockExportOptionsFuse', - 'if': 'CONFIG_FUSE' } + 'if': 'CONFIG_FUSE' }, + 'vduse-blk': { 'type': 'BlockExportOptionsVduseBlk', + 'if': 'CONFIG_VDUSE_BLK_EXPORT' } } } ## diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh index f725636ea8..e09b5cbdec 100644 --- a/scripts/meson-buildoptions.sh +++ b/scripts/meson-buildoptions.sh @@ -125,6 +125,8 @@ meson_options_help() { printf "%s\n" ' usb-redir libusbredir support' printf "%s\n" ' vde vde network backend support' printf "%s\n" ' vdi vdi image format support' + printf "%s\n" ' vduse-blk-export' + printf "%s\n" ' VDUSE block export support' printf "%s\n" ' vhost-user-blk-server' printf "%s\n" ' build vhost-user-blk server' printf "%s\n" ' virglrenderer virgl rendering support' @@ -359,6 +361,8 @@ _meson_option_parse() { --disable-vde) printf "%s" -Dvde=disabled ;; --enable-vdi) printf "%s" -Dvdi=enabled ;; --disable-vdi) printf "%s" -Dvdi=disabled ;; + --enable-vduse-blk-export) printf "%s" -Dvduse_blk_export=enabled ;; + --disable-vduse-blk-export) printf "%s" -Dvduse_blk_export=disabled ;; --enable-vhost-user-blk-server) printf "%s" -Dvhost_user_blk_server=enabled ;; --disable-vhost-user-blk-server) printf "%s" -Dvhost_user_blk_server=disabled ;; --enable-virglrenderer) printf "%s" -Dvirglrenderer=enabled ;; From patchwork Wed May 4 07:40:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12837216 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C3897C433F5 for ; Wed, 4 May 2022 07:44:56 +0000 (UTC) Received: from localhost ([::1]:60368 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nm9gs-00061w-J0 for qemu-devel@archiver.kernel.org; Wed, 04 May 2022 03:44:55 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52606) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nm9dF-0002d8-C4 for qemu-devel@nongnu.org; Wed, 04 May 2022 03:41:20 -0400 Received: from mail-pj1-x1031.google.com ([2607:f8b0:4864:20::1031]:53116) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nm9dC-00075J-VQ for qemu-devel@nongnu.org; Wed, 04 May 2022 03:41:09 -0400 Received: by mail-pj1-x1031.google.com with SMTP id e24so523750pjt.2 for ; Wed, 04 May 2022 00:41:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Xk261oHbXKFFnsukENd6oL02N15hbamNQACtPikcMSk=; b=MtiB/xPsRn1rHqMJnCx8eSuq4M7QvEwDZVL4E3fmXjduSgjIwgHG+w/Yh4C1YrnbED OXUXJOyGaaGLYDdReFm0NZNdyhCxXYBOIDvv8+sbNIHnm/kMY++sWk+Xi0+r86H3wBum UMwZiGVXp++ZaJOfgL7Q+O6bUopA0BUBKi1OxD+T/MTq0Rw4b2+F0aEnBxf2PcI8TyB2 RWNnRN5WdRpDMrb/ylz3NDm1d/xLUZR4BQ31p+6POt53sRzG4UA9t5i0ni7JGmpVUhZz 7Zjjc3ekLpML76f39Ol/UHLdCU4DJfvvoLIbeEdvr0ZSnKnBTwcL7BsvHiegQGeJtmxt CJ5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Xk261oHbXKFFnsukENd6oL02N15hbamNQACtPikcMSk=; b=YDJm7UqlQCDGRqUxn+Wj5eI+zzBqRSOQ/W7HYJTMzsjPcUgpEbXKgBXLAioeM4vadE m6ZvqM9jLVIgB+4SFXJaSEUM70lusAWAnijPaS01vVFNXiB+2KrR1+jeNIE62ezUUq+P lshEjzFnGv3YANVmY5oc+9hSSAf6Cu5vbMqz5GGNSjhwiT6ms01krV1ahDsPGjjurm7q PPl83mlk94V5Y2VLwvFiThxKwguwst+hGuuMM3uu+lmBydLqbHwPOQx08VCsuESAFKjC qDvVogGF2UtJ1Au5DtmciCReQwr0Lmg47gHDa/KOFoEfHyIb5K1a129vM54907/kaNkw rtNQ== X-Gm-Message-State: AOAM530KsrjpSJ4NTDbkarDHTJtfXf4daWFNPzSr/iM2Ix4joaSZCC6P SjN2w2mE7wIRBgLeMcF1boqW X-Google-Smtp-Source: ABdhPJxTfVtXy8swI6avJk+u6PipbaKq3Xzcl1In4D81Xl6gD1bsU1W5NPdGIUxlIqBn5BfVNf6IXQ== X-Received: by 2002:a17:90b:1d03:b0:1dc:9589:7c90 with SMTP id on3-20020a17090b1d0300b001dc95897c90mr3769593pjb.225.1651650065260; Wed, 04 May 2022 00:41:05 -0700 (PDT) Received: from localhost ([139.177.225.233]) by smtp.gmail.com with ESMTPSA id i3-20020a17090a718300b001d6a79768b6sm2584446pjk.49.2022.05.04.00.41.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 May 2022 00:41:04 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, kwolf@redhat.com, mreitz@redhat.com, mlureau@redhat.com, jsnow@redhat.com, eblake@redhat.com, Coiby.Xu@gmail.com, hreitz@redhat.com Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH v5 7/8] vduse-blk: Add vduse-blk resize support Date: Wed, 4 May 2022 15:40:50 +0800 Message-Id: <20220504074051.90-8-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220504074051.90-1-xieyongji@bytedance.com> References: <20220504074051.90-1-xieyongji@bytedance.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::1031; envelope-from=xieyongji@bytedance.com; helo=mail-pj1-x1031.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" To support block resize, this uses vduse_dev_update_config() to update the capacity field in configuration space and inject config interrupt on the block resize callback. Signed-off-by: Xie Yongji Reviewed-by: Stefan Hajnoczi Reviewed-by: Stefan Hajnoczi --- block/export/vduse-blk.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c index 8580ae929f..2b72baf7ab 100644 --- a/block/export/vduse-blk.c +++ b/block/export/vduse-blk.c @@ -188,6 +188,23 @@ static void blk_aio_detach(void *opaque) vblk_exp->export.ctx = NULL; } +static void vduse_blk_resize(void *opaque) +{ + BlockExport *exp = opaque; + VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export); + struct virtio_blk_config config; + + config.capacity = + cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS); + vduse_dev_update_config(vblk_exp->dev, sizeof(config.capacity), + offsetof(struct virtio_blk_config, capacity), + (char *)&config.capacity); +} + +static const BlockDevOps vduse_block_ops = { + .resize_cb = vduse_blk_resize, +}; + static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, Error **errp) { @@ -284,6 +301,8 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, vblk_exp); + blk_set_dev_ops(exp->blk, &vduse_block_ops, exp); + return 0; } @@ -293,6 +312,7 @@ static void vduse_blk_exp_delete(BlockExport *exp) blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, vblk_exp); + blk_set_dev_ops(exp->blk, NULL, NULL); vduse_dev_destroy(vblk_exp->dev); } From patchwork Wed May 4 07:40:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12837295 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 524BDC433EF for ; Wed, 4 May 2022 08:14:42 +0000 (UTC) Received: from localhost ([::1]:59856 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nmA9h-0001PT-4y for qemu-devel@archiver.kernel.org; Wed, 04 May 2022 04:14:41 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52636) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nm9dI-0002dh-QH for qemu-devel@nongnu.org; Wed, 04 May 2022 03:41:20 -0400 Received: from mail-pf1-x42a.google.com ([2607:f8b0:4864:20::42a]:35680) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nm9dG-00073q-8o for qemu-devel@nongnu.org; Wed, 04 May 2022 03:41:12 -0400 Received: by mail-pf1-x42a.google.com with SMTP id c14so527376pfn.2 for ; Wed, 04 May 2022 00:41:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=vbkjJJtzhy2/yx1qez+ESSO3KG9iidL3ZRJ6DYPRfoA=; b=LIl558HW/3C6XMzYQURH0uE8Tlyjm/AQB2jGYlStbszUPAEgfUZchaFRWV72fUWy6U yaV4FmJFr3bj48NyYWprO95/hYdzURtektNdlk9sFfrRGOiURHW69wdKI3SMZat4gM6s y3OLuCJWqBYMG8M+FeocyMTEaOm42oRZcsx6gSdB+n9c+1NKD4n9BXDTXlN8Ks5Rtb1M nU4AjWYe7UCr5NtLtMrldZfBhweUPoYsVlDk38sRYwksxPL3ubzbfQ3mkVFTdRiehDZU NP6kEl0AEKlbMeqwXl5skxApmMIDxAYPTFW5s7lUr04hFwaOqhUFIPSdl9jJp6dUa4P4 Z8Pg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=vbkjJJtzhy2/yx1qez+ESSO3KG9iidL3ZRJ6DYPRfoA=; b=HbMnhQcUhWEntAMyKyPqGXPkLZrqiNtEw/WdUuaSWKyLT4dOjdarY/g5/T91LHyYHn QH/nhggq73h6s7EXBcIphfDEnT00wOp9AwON4nQ6ULe6Hem2eIsMkFG4Lone0XPl9s1A XZJUovNjLQsyrIjYOLZsFGlNehWzwGIC2z3KYvCUhIzAN+b8GKkAxH7SXiqfDnr6QhG0 vrm3RNlCePvKMEBp9O4497rcLxyvJlOD4C9PNaAsG6tw83iBUDg0pfJQvnd931kZ6T3W RjHLRVeIXgYk1HwXQwgXatmHy9ab3qXNOh5zqZB71Tm9QAPF4nOYn7wGf73f1VfDHZ/B aioA== X-Gm-Message-State: AOAM532txtpalJhvxf6t1fZuiuURzVzpGONWBFpLJPCalu9zjBlXTvgY FfuhtQJXA26+E8bNISgNo9cp X-Google-Smtp-Source: ABdhPJzIRG8TyjOlWLgVmFbCcDbDhicy2m6F7afkEf6x5WQY2AJFWHuh+tQvhwRAl8VrW/iF9Dis/g== X-Received: by 2002:a63:515a:0:b0:3c1:4902:5e6a with SMTP id r26-20020a63515a000000b003c149025e6amr16416983pgl.203.1651650069504; Wed, 04 May 2022 00:41:09 -0700 (PDT) Received: from localhost ([139.177.225.233]) by smtp.gmail.com with ESMTPSA id t3-20020aa79383000000b0050dc762815asm7542633pfe.52.2022.05.04.00.41.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 04 May 2022 00:41:08 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, kwolf@redhat.com, mreitz@redhat.com, mlureau@redhat.com, jsnow@redhat.com, eblake@redhat.com, Coiby.Xu@gmail.com, hreitz@redhat.com Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org Subject: [PATCH v5 8/8] libvduse: Add support for reconnecting Date: Wed, 4 May 2022 15:40:51 +0800 Message-Id: <20220504074051.90-9-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220504074051.90-1-xieyongji@bytedance.com> References: <20220504074051.90-1-xieyongji@bytedance.com> MIME-Version: 1.0 Received-SPF: pass client-ip=2607:f8b0:4864:20::42a; envelope-from=xieyongji@bytedance.com; helo=mail-pf1-x42a.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" To support reconnecting after restart or crash, VDUSE backend might need to resubmit inflight I/Os. This stores the metadata such as the index of inflight I/O's descriptors to a shm file so that VDUSE backend can restore them during reconnecting. Signed-off-by: Xie Yongji --- block/export/vduse-blk.c | 14 ++ subprojects/libvduse/libvduse.c | 235 +++++++++++++++++++++++++++++++- subprojects/libvduse/libvduse.h | 12 ++ 3 files changed, 256 insertions(+), 5 deletions(-) diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c index 2b72baf7ab..1fd787a862 100644 --- a/block/export/vduse-blk.c +++ b/block/export/vduse-blk.c @@ -33,6 +33,7 @@ typedef struct VduseBlkExport { VduseDev *dev; uint16_t num_queues; bool writable; + char *recon_file; } VduseBlkExport; typedef struct VduseBlkReq { @@ -111,6 +112,8 @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq) aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq), true, on_vduse_vq_kick, NULL, NULL, NULL, vq); + /* Make sure we don't miss any kick afer reconnecting */ + eventfd_write(vduse_queue_get_fd(vq), 1); } static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq) @@ -291,6 +294,15 @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, return -ENOMEM; } + vblk_exp->recon_file = g_strdup_printf("%s/vduse-blk-%s", + g_get_tmp_dir(), exp->id); + if (vduse_set_reconnect_log_file(vblk_exp->dev, vblk_exp->recon_file)) { + error_setg(errp, "failed to set reconnect log file"); + vduse_dev_destroy(vblk_exp->dev); + g_free(vblk_exp->recon_file); + return -EINVAL; + } + for (i = 0; i < num_queues; i++) { vduse_dev_setup_queue(vblk_exp->dev, i, queue_size); } @@ -314,6 +326,8 @@ static void vduse_blk_exp_delete(BlockExport *exp) vblk_exp); blk_set_dev_ops(exp->blk, NULL, NULL); vduse_dev_destroy(vblk_exp->dev); + unlink(vblk_exp->recon_file); + g_free(vblk_exp->recon_file); } static void vduse_blk_exp_request_shutdown(BlockExport *exp) diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c index ecee9c0568..b27145ceed 100644 --- a/subprojects/libvduse/libvduse.c +++ b/subprojects/libvduse/libvduse.c @@ -41,6 +41,8 @@ #define VDUSE_VQ_ALIGN 4096 #define MAX_IOVA_REGIONS 256 +#define LOG_ALIGNMENT 64 + /* Round number down to multiple */ #define ALIGN_DOWN(n, m) ((n) / (m) * (m)) @@ -51,6 +53,31 @@ #define unlikely(x) __builtin_expect(!!(x), 0) #endif +typedef struct VduseDescStateSplit { + uint8_t inflight; + uint8_t padding[5]; + uint16_t next; + uint64_t counter; +} VduseDescStateSplit; + +typedef struct VduseVirtqLogInflight { + uint64_t features; + uint16_t version; + uint16_t desc_num; + uint16_t last_batch_head; + uint16_t used_idx; + VduseDescStateSplit desc[]; +} VduseVirtqLogInflight; + +typedef struct VduseVirtqLog { + VduseVirtqLogInflight inflight; +} VduseVirtqLog; + +typedef struct VduseVirtqInflightDesc { + uint16_t index; + uint64_t counter; +} VduseVirtqInflightDesc; + typedef struct VduseRing { unsigned int num; uint64_t desc_addr; @@ -73,6 +100,10 @@ struct VduseVirtq { bool ready; int fd; VduseDev *dev; + VduseVirtqInflightDesc *resubmit_list; + uint16_t resubmit_num; + uint64_t counter; + VduseVirtqLog *log; }; typedef struct VduseIovaRegion { @@ -96,8 +127,36 @@ struct VduseDev { int fd; int ctrl_fd; void *priv; + void *log; }; +static inline size_t vduse_vq_log_size(uint16_t queue_size) +{ + return ALIGN_UP(sizeof(VduseDescStateSplit) * queue_size + + sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT); +} + +static void *vduse_log_get(const char *filename, size_t size) +{ + void *ptr = MAP_FAILED; + int fd; + + fd = open(filename, O_RDWR | O_CREAT, 0600); + if (fd == -1) { + return MAP_FAILED; + } + + if (ftruncate(fd, size) == -1) { + goto out; + } + + ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + +out: + close(fd); + return ptr; +} + static inline bool has_feature(uint64_t features, unsigned int fbit) { assert(fbit < 64); @@ -148,6 +207,105 @@ static int vduse_inject_irq(VduseDev *dev, int index) return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, &index); } +static int inflight_desc_compare(const void *a, const void *b) +{ + VduseVirtqInflightDesc *desc0 = (VduseVirtqInflightDesc *)a, + *desc1 = (VduseVirtqInflightDesc *)b; + + if (desc1->counter > desc0->counter && + (desc1->counter - desc0->counter) < VIRTQUEUE_MAX_SIZE * 2) { + return 1; + } + + return -1; +} + +static int vduse_queue_check_inflights(VduseVirtq *vq) +{ + int i = 0; + VduseDev *dev = vq->dev; + + vq->used_idx = le16toh(vq->vring.used->idx); + vq->resubmit_num = 0; + vq->resubmit_list = NULL; + vq->counter = 0; + + if (unlikely(vq->log->inflight.used_idx != vq->used_idx)) { + if (vq->log->inflight.last_batch_head > VIRTQUEUE_MAX_SIZE) { + return -1; + } + + vq->log->inflight.desc[vq->log->inflight.last_batch_head].inflight = 0; + + barrier(); + + vq->log->inflight.used_idx = vq->used_idx; + } + + for (i = 0; i < vq->log->inflight.desc_num; i++) { + if (vq->log->inflight.desc[i].inflight == 1) { + vq->inuse++; + } + } + + vq->shadow_avail_idx = vq->last_avail_idx = vq->inuse + vq->used_idx; + + if (vq->inuse) { + vq->resubmit_list = calloc(vq->inuse, sizeof(VduseVirtqInflightDesc)); + if (!vq->resubmit_list) { + return -1; + } + + for (i = 0; i < vq->log->inflight.desc_num; i++) { + if (vq->log->inflight.desc[i].inflight) { + vq->resubmit_list[vq->resubmit_num].index = i; + vq->resubmit_list[vq->resubmit_num].counter = + vq->log->inflight.desc[i].counter; + vq->resubmit_num++; + } + } + + if (vq->resubmit_num > 1) { + qsort(vq->resubmit_list, vq->resubmit_num, + sizeof(VduseVirtqInflightDesc), inflight_desc_compare); + } + vq->counter = vq->resubmit_list[0].counter + 1; + } + + vduse_inject_irq(dev, vq->index); + + return 0; +} + +static int vduse_queue_inflight_get(VduseVirtq *vq, int desc_idx) +{ + vq->log->inflight.desc[desc_idx].counter = vq->counter++; + + barrier(); + + vq->log->inflight.desc[desc_idx].inflight = 1; + + return 0; +} + +static int vduse_queue_inflight_pre_put(VduseVirtq *vq, int desc_idx) +{ + vq->log->inflight.last_batch_head = desc_idx; + + return 0; +} + +static int vduse_queue_inflight_post_put(VduseVirtq *vq, int desc_idx) +{ + vq->log->inflight.desc[desc_idx].inflight = 0; + + barrier(); + + vq->log->inflight.used_idx = vq->used_idx; + + return 0; +} + static void vduse_iova_remove_region(VduseDev *dev, uint64_t start, uint64_t last) { @@ -596,11 +754,24 @@ void *vduse_queue_pop(VduseVirtq *vq, size_t sz) unsigned int head; VduseVirtqElement *elem; VduseDev *dev = vq->dev; + int i; if (unlikely(!vq->vring.avail)) { return NULL; } + if (unlikely(vq->resubmit_list && vq->resubmit_num > 0)) { + i = (--vq->resubmit_num); + elem = vduse_queue_map_desc(vq, vq->resubmit_list[i].index, sz); + + if (!vq->resubmit_num) { + free(vq->resubmit_list); + vq->resubmit_list = NULL; + } + + return elem; + } + if (vduse_queue_empty(vq)) { return NULL; } @@ -628,6 +799,8 @@ void *vduse_queue_pop(VduseVirtq *vq, size_t sz) vq->inuse++; + vduse_queue_inflight_get(vq, head); + return elem; } @@ -685,7 +858,9 @@ void vduse_queue_push(VduseVirtq *vq, const VduseVirtqElement *elem, unsigned int len) { vduse_queue_fill(vq, elem, len, 0); + vduse_queue_inflight_pre_put(vq, elem->index); vduse_queue_flush(vq, 1); + vduse_queue_inflight_post_put(vq, elem->index); } static int vduse_queue_update_vring(VduseVirtq *vq, uint64_t desc_addr, @@ -758,12 +933,15 @@ static void vduse_queue_enable(VduseVirtq *vq) } vq->fd = fd; - vq->shadow_avail_idx = vq->last_avail_idx = vq_info.split.avail_index; - vq->inuse = 0; - vq->used_idx = 0; vq->signalled_used_valid = false; vq->ready = true; + if (vduse_queue_check_inflights(vq)) { + fprintf(stderr, "Failed to check inflights for vq[%d]\n", vq->index); + close(fd); + return; + } + dev->ops->enable_queue(dev, vq); } @@ -813,11 +991,15 @@ static void vduse_dev_start_dataplane(VduseDev *dev) static void vduse_dev_stop_dataplane(VduseDev *dev) { + size_t log_size = dev->num_queues * vduse_vq_log_size(VIRTQUEUE_MAX_SIZE); int i; for (i = 0; i < dev->num_queues; i++) { vduse_queue_disable(&dev->vqs[i]); } + if (dev->log) { + memset(dev->log, 0, log_size); + } dev->features = 0; vduse_iova_remove_region(dev, 0, ULONG_MAX); } @@ -926,6 +1108,30 @@ int vduse_dev_setup_queue(VduseDev *dev, int index, int max_size) return -errno; } + vduse_queue_enable(vq); + + return 0; +} + +int vduse_set_reconnect_log_file(VduseDev *dev, const char *filename) +{ + + size_t log_size = dev->num_queues * vduse_vq_log_size(VIRTQUEUE_MAX_SIZE); + void *log; + int i; + + dev->log = log = vduse_log_get(filename, log_size); + if (log == MAP_FAILED) { + fprintf(stderr, "Failed to get vduse log\n"); + return -EINVAL; + } + + for (i = 0; i < dev->num_queues; i++) { + dev->vqs[i].log = log; + dev->vqs[i].log->inflight.desc_num = VIRTQUEUE_MAX_SIZE; + log = (void *)((char *)log + vduse_vq_log_size(VIRTQUEUE_MAX_SIZE)); + } + return 0; } @@ -970,6 +1176,12 @@ static int vduse_dev_init(VduseDev *dev, const char *name, return -errno; } + if (ioctl(fd, VDUSE_DEV_GET_FEATURES, &dev->features)) { + fprintf(stderr, "Failed to get features: %s\n", strerror(errno)); + close(fd); + return -errno; + } + dev_name = strdup(name); if (!dev_name) { close(fd); @@ -1014,6 +1226,12 @@ VduseDev *vduse_dev_create_by_fd(int fd, uint16_t num_queues, return NULL; } + if (ioctl(fd, VDUSE_DEV_GET_FEATURES, &dev->features)) { + fprintf(stderr, "Failed to get features: %s\n", strerror(errno)); + free(dev); + return NULL; + } + ret = vduse_dev_init_vqs(dev, num_queues); if (ret) { fprintf(stderr, "Failed to init vqs\n"); @@ -1113,7 +1331,7 @@ VduseDev *vduse_dev_create(const char *name, uint32_t device_id, ret = ioctl(ctrl_fd, VDUSE_CREATE_DEV, dev_config); free(dev_config); - if (ret < 0) { + if (ret && errno != EEXIST) { fprintf(stderr, "Failed to create vduse device %s: %s\n", name, strerror(errno)); goto err_dev; @@ -1140,8 +1358,15 @@ err_ctrl: int vduse_dev_destroy(VduseDev *dev) { - int ret = 0; + size_t log_size = dev->num_queues * vduse_vq_log_size(VIRTQUEUE_MAX_SIZE); + int i, ret = 0; + if (dev->log) { + munmap(dev->log, log_size); + } + for (i = 0; i < dev->num_queues; i++) { + free(dev->vqs[i].resubmit_list); + } free(dev->vqs); if (dev->fd > 0) { close(dev->fd); diff --git a/subprojects/libvduse/libvduse.h b/subprojects/libvduse/libvduse.h index 6c2fe98213..32f19e7b48 100644 --- a/subprojects/libvduse/libvduse.h +++ b/subprojects/libvduse/libvduse.h @@ -173,6 +173,18 @@ int vduse_dev_update_config(VduseDev *dev, uint32_t size, */ int vduse_dev_setup_queue(VduseDev *dev, int index, int max_size); +/** + * vduse_set_reconnect_log_file: + * @dev: VDUSE device + * @file: filename of reconnect log + * + * Specify the file to store log for reconnecting. It should + * be called before vduse_dev_setup_queue(). + * + * Returns: 0 on success, -errno on failure. + */ +int vduse_set_reconnect_log_file(VduseDev *dev, const char *filename); + /** * vduse_dev_create_by_fd: * @fd: passed file descriptor