From patchwork Mon Mar 15 05:37:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12138345 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 409B9C43331 for ; Mon, 15 Mar 2021 05:38:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2791D61574 for ; Mon, 15 Mar 2021 05:38:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229929AbhCOFiO (ORCPT ); Mon, 15 Mar 2021 01:38:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49410 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229924AbhCOFhy (ORCPT ); Mon, 15 Mar 2021 01:37:54 -0400 Received: from mail-pj1-x102c.google.com (mail-pj1-x102c.google.com [IPv6:2607:f8b0:4864:20::102c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 74352C06175F for ; Sun, 14 Mar 2021 22:37:54 -0700 (PDT) Received: by mail-pj1-x102c.google.com with SMTP id kr3-20020a17090b4903b02900c096fc01deso14108268pjb.4 for ; Sun, 14 Mar 2021 22:37:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=aYq0J+9b4tKDZXBHw2wOWvtavxgBRahDPr03liFECw0=; b=puwumHcKmr61wNYUFotpbK2IfHnTbHcpCnzievNuhMmjO4+Eo52l2U71CPjXORPQz0 YWaZVO/rDFNIZGuUgdkCM/DYmhSJACrGuGbgZ96S0h8GYCQo6udNxn/gTu3jjeOrPW7W hVbFsIw4Q4NwTZtm31kC2rzC+rWg9rO4G5tG79ebQG6fU5JB9zEI0EEx75Cx+YFRO6IT As3HPa1UlB63m9oLqpFab0cmMrrodrVeOKP/AkxkFjU5tWMyxjzRvXIYkER++lr4Q7zv l/oDALFuxF6tADxCV5wqSCQc+zPtkKBdt0k5n/nVcBHSOyLMB23OQQcHkXC8zoumc/tL RIAQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=aYq0J+9b4tKDZXBHw2wOWvtavxgBRahDPr03liFECw0=; b=KqeQ0HIJNXBZWwYNCm/tygLl/i5U4jBfmPlmVqTIrsg+z9Uxr8mdjPv3iwHIGUha85 hBsMvupC2Jl6cmgayTzCT8kH9JEyjhDYNYMaXEdsmzw51hoYOJ9cqrfiEzPDInK/fQzi DIy71fBke80eiDys+B7ex2dkWkUsRL8nqzlpI02+Ser6arB5oh7AdQppDcSACfBCLYiR cktvnrk0xQDvLCTRrH7GjmRi6igbghios5bjhO+7Idp0gChhofmT/BGddCSst8b6AEZc pSA8PP0TfJ/9Fby2CQ3i9zQmJjGzKqF4uQUYGLNAXWulF9F1NaCGxgBP2lpIyX10zrlM qCdw== X-Gm-Message-State: AOAM530hC1NUm/m2EXgUlZ632jZSuhvP7Ema4qMg7CjumLUMxKxKNagc Qh39JIEfqMhOg0Ath341akt5 X-Google-Smtp-Source: ABdhPJy4+A1RYkcNV8xwLGkTiZKllYZiilY4Bu0DGetxutWdy9edDqFjAl4qPO0fDEr21OfnR4xE2Q== X-Received: by 2002:a17:902:ecc3:b029:e5:d7cc:2a20 with SMTP id a3-20020a170902ecc3b02900e5d7cc2a20mr9865008plh.11.1615786674068; Sun, 14 Mar 2021 22:37:54 -0700 (PDT) Received: from localhost ([139.177.225.227]) by smtp.gmail.com with ESMTPSA id h16sm8544215pfc.194.2021.03.14.22.37.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Mar 2021 22:37:53 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, bob.liu@oracle.com, hch@infradead.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v5 01/11] file: Export __receive_fd() to modules Date: Mon, 15 Mar 2021 13:37:11 +0800 Message-Id: <20210315053721.189-2-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210315053721.189-1-xieyongji@bytedance.com> References: <20210315053721.189-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Export __receive_fd() so that some modules can use it to pass file descriptor between processes. Signed-off-by: Xie Yongji --- fs/file.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/file.c b/fs/file.c index dab120b71e44..a2e5bcae63ba 100644 --- a/fs/file.c +++ b/fs/file.c @@ -1107,6 +1107,7 @@ int __receive_fd(int fd, struct file *file, int __user *ufd, unsigned int o_flag __receive_sock(file); return new_fd; } +EXPORT_SYMBOL(__receive_fd); static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags) { From patchwork Mon Mar 15 05:37:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12138347 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4CE7C432C3 for ; Mon, 15 Mar 2021 05:38:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9836364D5D for ; Mon, 15 Mar 2021 05:38:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229936AbhCOFiP (ORCPT ); Mon, 15 Mar 2021 01:38:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49424 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229927AbhCOFh6 (ORCPT ); Mon, 15 Mar 2021 01:37:58 -0400 Received: from mail-pf1-x431.google.com (mail-pf1-x431.google.com [IPv6:2607:f8b0:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E1499C061574 for ; Sun, 14 Mar 2021 22:37:57 -0700 (PDT) Received: by mail-pf1-x431.google.com with SMTP id y67so5872550pfb.2 for ; Sun, 14 Mar 2021 22:37:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=vQwyOgJ8FbVKIyGmYgv1VI8rf3/7i+776SIOH8C9h3s=; b=tKC5Kczn7W4GX2DVAtJhGQH7OZ/Y9d5HQGZokltTFvIKjewpr8wlARk9gk5FOOpdyM u6azIY9AT5RdJPG56be28vHXhQeiduc8tM0CpBk9BMSctJXkdS1HVd569aB0ftF35541 In86LSyOjqLfo31fh21uUaSqQsXVZDOp+KOKDjas7vRQjJ0ZAOsHYfW4UaNxC8Zr5gPy DLd2kiSTWjC8AqTt4t7cWjtLSMcGSs6gpn1l0Z3KGUQYnFctmVN/MM5pr8HiggThoQfi yevqx1dZqd49OTgP9EADlRkPZgMm2AfL/AO2bfmC4Zg9pPqTvDR76CiYtth8OvtWG8BW caMA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=vQwyOgJ8FbVKIyGmYgv1VI8rf3/7i+776SIOH8C9h3s=; b=WlMgIEditMxR0x1cYUqIc/of3dToOtkYRvp0M1BPSrSIIUSxDDCG4gNOpuGjbqQNtH enwREm5+5HviAedU4lPZur3ykxrHM2fXTBleH8i5gBUKuyrT23KpsrqxnjWxXZi4Wi7G j11TA45Qlrh0o/fYPr8mPPNRbjPzWfiZNLVz+mX5nIUE25yQrUQNhDLhzWSPOxmdGC8s 9x8tdy1Z3SunFptpBedYRJ8L04Vu9bcCj6ji6vVEpLhe+wJBc6V84JM70kP+mC73ANKI jbMVNxHsfyvCBeyXBznrbSMGUOYWKiKAdP3VLbw85jeHMyo83mXlnvpFOQKCc4ZTE9+h Z4mg== X-Gm-Message-State: AOAM533Iyy7qWgShsobWMHEJfqaQwZ+dBDatRZ0gxH7BLsW78Xbn3kW3 XdPD90bTzNrq512um4n/YT8W X-Google-Smtp-Source: ABdhPJw90klV/lhgLs9r2kWZa9+r4MKFCLT6x38DG1GBOMIxMW2w5D3Tc5a7icPrhx3yRBOSiAwkmQ== X-Received: by 2002:a65:5cc2:: with SMTP id b2mr4878167pgt.280.1615786677555; Sun, 14 Mar 2021 22:37:57 -0700 (PDT) Received: from localhost ([139.177.225.227]) by smtp.gmail.com with ESMTPSA id u2sm9282968pjy.14.2021.03.14.22.37.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Mar 2021 22:37:57 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, bob.liu@oracle.com, hch@infradead.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v5 02/11] eventfd: Increase the recursion depth of eventfd_signal() Date: Mon, 15 Mar 2021 13:37:12 +0800 Message-Id: <20210315053721.189-3-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210315053721.189-1-xieyongji@bytedance.com> References: <20210315053721.189-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Increase the recursion depth of eventfd_signal() to 1. This is the maximum recursion depth we have found so far, which can be triggered with the following call chain: kvm_io_bus_write [kvm] --> ioeventfd_write [kvm] --> eventfd_signal [eventfd] --> vhost_poll_wakeup [vhost] --> vduse_vdpa_kick_vq [vduse] --> eventfd_signal [eventfd] Acked-by: Jason Wang Signed-off-by: Xie Yongji --- fs/eventfd.c | 2 +- include/linux/eventfd.h | 5 ++++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/eventfd.c b/fs/eventfd.c index e265b6dd4f34..cc7cd1dbedd3 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -71,7 +71,7 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n) * it returns true, the eventfd_signal() call should be deferred to a * safe context. */ - if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count))) + if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count) > EFD_WAKE_DEPTH)) return 0; spin_lock_irqsave(&ctx->wqh.lock, flags); diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h index fa0a524baed0..886d99cd38ef 100644 --- a/include/linux/eventfd.h +++ b/include/linux/eventfd.h @@ -29,6 +29,9 @@ #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK) #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE) +/* Maximum recursion depth */ +#define EFD_WAKE_DEPTH 1 + struct eventfd_ctx; struct file; @@ -47,7 +50,7 @@ DECLARE_PER_CPU(int, eventfd_wake_count); static inline bool eventfd_signal_count(void) { - return this_cpu_read(eventfd_wake_count); + return this_cpu_read(eventfd_wake_count) > EFD_WAKE_DEPTH; } #else /* CONFIG_EVENTFD */ From patchwork Mon Mar 15 05:37:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12138349 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE726C43603 for ; Mon, 15 Mar 2021 05:38:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BC11F601FA for ; Mon, 15 Mar 2021 05:38:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230157AbhCOFiQ (ORCPT ); Mon, 15 Mar 2021 01:38:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49444 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229939AbhCOFiC (ORCPT ); Mon, 15 Mar 2021 01:38:02 -0400 Received: from mail-pj1-x1034.google.com (mail-pj1-x1034.google.com [IPv6:2607:f8b0:4864:20::1034]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E43DAC061574 for ; Sun, 14 Mar 2021 22:38:01 -0700 (PDT) Received: by mail-pj1-x1034.google.com with SMTP id x7-20020a17090a2b07b02900c0ea793940so13729705pjc.2 for ; Sun, 14 Mar 2021 22:38:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=EwpfT4N+OvVlLx64Zz6EFx++RG+spKqj79bcGdpdHNo=; b=IaLsea4Qapq24mtt06XRRqn77JxIdZFz5G+/ZPPDmRS64B/Xy2AGEz0qobHDLz8xfe BZpZgwCy9nFn0d7KEzmZiXvBz/AWcBhxpdmIhkFmbGutN+kecLaoxO5nFt3I8XseBpCj pQ/LgiaiYqJW1I2Ns1VkhijJsPlhNyHmTcRdXmUGkXadCRRZ9dKakoc9y+5ZxWXGYBX4 wej/YpyCk2M+mM3i3PQSGf/8lEZR9jTFoZ5imCYsnksDgmtrnGI5c+6jSXvhlXrNLgKD wnibXBEXgWgS+dLgtJxMgVqWxNCVm4Mn52K7q+51Sw2Thj7ZrqO+/lVg6e/ExQezhZiP cnCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=EwpfT4N+OvVlLx64Zz6EFx++RG+spKqj79bcGdpdHNo=; b=E/XgqnpYsYu8GNpFXKRD4o+6BTS+l+bGzbMz0mjGVRJGfPPNtSX1YLN+KLKQ7JGGVU NgB4tuveSHapA/K4Y14a8DGtw9MZrVNVgBumpF2sO7VJDB3VKC7S86b246sG1G8jAFCK +P/lKie2aDeoPAgG08eAY8ZLFY5PXrD7KH7laz0Xc/jT0x0lbSiDgL3CmDUuYvJMRBwz ZMfdlTJvDeQUfdbmtSASDGPdtx5dmBQJ46rtKeltLQHj6B3LINC6/+5nKNR02dSzEzHO xkmk0oTxHrgZxF/i4wla7h7PFCr24vxxzK7i+o/sgLhsy1xx7/bayD2PYsmLrSyN32MF +oPg== X-Gm-Message-State: AOAM5328W8l8bZHimm8lKLjQ+w7y9kFCcyKND1ddunMJwfGx2/mJ/bjw GApCXXZxHPx0o3H4fYjpCxv8 X-Google-Smtp-Source: ABdhPJyrl/g0uemO0TIBbEB3y4rgqF1KLHEIEOQZKdoqnVu3n9aZkgI7tttjndhnEN+STq1za8FvEA== X-Received: by 2002:a17:90a:d903:: with SMTP id c3mr10967111pjv.27.1615786681561; Sun, 14 Mar 2021 22:38:01 -0700 (PDT) Received: from localhost ([139.177.225.227]) by smtp.gmail.com with ESMTPSA id f11sm5475738pga.34.2021.03.14.22.38.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Mar 2021 22:38:01 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, bob.liu@oracle.com, hch@infradead.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v5 03/11] vhost-vdpa: protect concurrent access to vhost device iotlb Date: Mon, 15 Mar 2021 13:37:13 +0800 Message-Id: <20210315053721.189-4-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210315053721.189-1-xieyongji@bytedance.com> References: <20210315053721.189-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Use vhost_dev->mutex to protect vhost device iotlb from concurrent access. Fixes: 4c8cf318("vhost: introduce vDPA-based backend") Signed-off-by: Xie Yongji Acked-by: Jason Wang Reviewed-by: Stefano Garzarella --- drivers/vhost/vdpa.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index cb14c66eb2ec..3f7175c2ac24 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -719,9 +719,11 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, const struct vdpa_config_ops *ops = vdpa->config; int r = 0; + mutex_lock(&dev->mutex); + r = vhost_dev_check_owner(dev); if (r) - return r; + goto unlock; switch (msg->type) { case VHOST_IOTLB_UPDATE: @@ -742,6 +744,8 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, r = -EINVAL; break; } +unlock: + mutex_unlock(&dev->mutex); return r; } From patchwork Mon Mar 15 05:37:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12138351 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D651CC4360C for ; Mon, 15 Mar 2021 05:38:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A87C264E0F for ; Mon, 15 Mar 2021 05:38:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230169AbhCOFiR (ORCPT ); Mon, 15 Mar 2021 01:38:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49468 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229944AbhCOFiF (ORCPT ); Mon, 15 Mar 2021 01:38:05 -0400 Received: from mail-pl1-x636.google.com (mail-pl1-x636.google.com [IPv6:2607:f8b0:4864:20::636]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9B37BC061762 for ; Sun, 14 Mar 2021 22:38:05 -0700 (PDT) Received: by mail-pl1-x636.google.com with SMTP id 30so10165477ple.4 for ; Sun, 14 Mar 2021 22:38:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ZinwHoffrnXP3ywQjEhLA6Z7sImGc8X+vWD+k+yVcXw=; b=jipQjp0vAxMfYHmaj5s0kWN9Ke/4CsNxigw7n7A+2C7XFhG+ZbANQFOyP1z77EE2gG D1CrlOJtUQell0Q0O7/KhFhhg0FUMf1HI1/+D7/jipA+myFQwTySahEtiPG43luqyntR o8Z1zfIw7kJ8sxezd5jBydusWziXzXjiQdvFexL5PeEmkILq6lMfjX/3Nihae952Vmom iPj1LFIc/qHygjtZVY7w1+p5jsxuJIvVmoC4UKPOD3JXd4x1HLnDGMNxtOToK34nEb7W u3mKYXALhOv64TaqFUWCio77ncAhbrzr5nMcuFfgcSmx3u89Y5f9ifbL/0M7CkNGEamD /HIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ZinwHoffrnXP3ywQjEhLA6Z7sImGc8X+vWD+k+yVcXw=; b=VkWgHLWb7SY9BcUdVMH52alXkmMizPdLITf8zYrhM+kR8zDawTWBLHqv3RVVCCzujY y05LxhorLVPqwzaPH9dqTwsqtAqw3dnHqlGIBBjzovrnp+xW6OHG7QpX6LkwDEH/OKly 34wjCh4xGqjCZglW/m8KjPkt0xKfdtQvp+bUa13EYiOUdL3fC7XH465mfZpQheh0pvxl N0/8403CCvrsvLey0EBoqrgCXNWVfXga/s0zMvwpxtnlTzy5EhvwcrgQjUd+TSc3TWWT YBm4ee0kzXmpTUaZmze3pChpe5eQvkffu0/Kfrwc0XHn4KPJNKCZNuJm9BXf1ntUqFkm rCCw== X-Gm-Message-State: AOAM530xRGkEdsKC8WHHiobQua1lSndVoOkE9Vhc2gTLi4OoG/FFhvYL 91ZZt82UStyAn4s/eptnOJ8U X-Google-Smtp-Source: ABdhPJz3eltTXJ6/U1MHiydOs5RJgb/pCjjpSyDi9M2ABZX4mV6uQbf0s9qvIlne9yl8ovrJ57QHKg== X-Received: by 2002:a17:902:74c4:b029:e4:9e16:18d9 with SMTP id f4-20020a17090274c4b02900e49e1618d9mr10219721plt.21.1615786685259; Sun, 14 Mar 2021 22:38:05 -0700 (PDT) Received: from localhost ([139.177.225.227]) by smtp.gmail.com with ESMTPSA id b22sm12821599pff.173.2021.03.14.22.38.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Mar 2021 22:38:04 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, bob.liu@oracle.com, hch@infradead.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v5 04/11] vhost-iotlb: Add an opaque pointer for vhost IOTLB Date: Mon, 15 Mar 2021 13:37:14 +0800 Message-Id: <20210315053721.189-5-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210315053721.189-1-xieyongji@bytedance.com> References: <20210315053721.189-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add an opaque pointer for vhost IOTLB. And introduce vhost_iotlb_add_range_ctx() to accept it. Suggested-by: Jason Wang Acked-by: Jason Wang Signed-off-by: Xie Yongji --- drivers/vhost/iotlb.c | 20 ++++++++++++++++---- include/linux/vhost_iotlb.h | 3 +++ 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c index 0fd3f87e913c..5c99e1112cbb 100644 --- a/drivers/vhost/iotlb.c +++ b/drivers/vhost/iotlb.c @@ -36,19 +36,21 @@ void vhost_iotlb_map_free(struct vhost_iotlb *iotlb, EXPORT_SYMBOL_GPL(vhost_iotlb_map_free); /** - * vhost_iotlb_add_range - add a new range to vhost IOTLB + * vhost_iotlb_add_range_ctx - add a new range to vhost IOTLB * @iotlb: the IOTLB * @start: start of the IOVA range * @last: last of IOVA range * @addr: the address that is mapped to @start * @perm: access permission of this range + * @opaque: the opaque pointer for the new mapping * * Returns an error last is smaller than start or memory allocation * fails */ -int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, - u64 start, u64 last, - u64 addr, unsigned int perm) +int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb, + u64 start, u64 last, + u64 addr, unsigned int perm, + void *opaque) { struct vhost_iotlb_map *map; @@ -71,6 +73,7 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, map->last = last; map->addr = addr; map->perm = perm; + map->opaque = opaque; iotlb->nmaps++; vhost_iotlb_itree_insert(map, &iotlb->root); @@ -80,6 +83,15 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, return 0; } +EXPORT_SYMBOL_GPL(vhost_iotlb_add_range_ctx); + +int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, + u64 start, u64 last, + u64 addr, unsigned int perm) +{ + return vhost_iotlb_add_range_ctx(iotlb, start, last, + addr, perm, NULL); +} EXPORT_SYMBOL_GPL(vhost_iotlb_add_range); /** diff --git a/include/linux/vhost_iotlb.h b/include/linux/vhost_iotlb.h index 6b09b786a762..2d0e2f52f938 100644 --- a/include/linux/vhost_iotlb.h +++ b/include/linux/vhost_iotlb.h @@ -17,6 +17,7 @@ struct vhost_iotlb_map { u32 perm; u32 flags_padding; u64 __subtree_last; + void *opaque; }; #define VHOST_IOTLB_FLAG_RETIRE 0x1 @@ -29,6 +30,8 @@ struct vhost_iotlb { unsigned int flags; }; +int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb, u64 start, u64 last, + u64 addr, unsigned int perm, void *opaque); int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, u64 start, u64 last, u64 addr, unsigned int perm); void vhost_iotlb_del_range(struct vhost_iotlb *iotlb, u64 start, u64 last); From patchwork Mon Mar 15 05:37:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12138353 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EFFDFC433E6 for ; Mon, 15 Mar 2021 05:39:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CBB8364E45 for ; Mon, 15 Mar 2021 05:39:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230198AbhCOFio (ORCPT ); Mon, 15 Mar 2021 01:38:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49488 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229970AbhCOFiJ (ORCPT ); Mon, 15 Mar 2021 01:38:09 -0400 Received: from mail-pf1-x435.google.com (mail-pf1-x435.google.com [IPv6:2607:f8b0:4864:20::435]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3AF22C061762 for ; Sun, 14 Mar 2021 22:38:09 -0700 (PDT) Received: by mail-pf1-x435.google.com with SMTP id t85so5839649pfc.13 for ; Sun, 14 Mar 2021 22:38:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=9N6blPgH0j2SgVauFBTpGoDtqNcJPr+KXjOwv3Qbftw=; b=bQke1R6CukVfwAhiXr8UKtzZtlbiVip03NEyWxemdUzMcuzOuWxzjrI80WHFsHCVZj VgWEt2fGfjTrYEG8uMZPpMNaeaanJr+jO+D/Zi5Ou/PJ6A34JwB2gZpuuY/eHRvp8pcn sM41UstY2+uSemscQK7ovGhqckuZWPyPosDaEhTdL4I/pEqUWs3yc+BS8KVONE2IHJ7a 3rnuaO6sJvFi1oUk1bvoSdN8q/7DDhtc2kLevLOOlDRve5wl3S9nDlHvtRGWG8hGZW3b vTJcFzqnVfVCLZLt9tk40WCW945DMIswKoLK0b6QuVhV9HPsO7fjWa+5ym4vnv9c0Elz +YEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=9N6blPgH0j2SgVauFBTpGoDtqNcJPr+KXjOwv3Qbftw=; b=q/DqqcVa8p093uuRect16r7R8o5RG1Vbi3Q0MZiWz8N/kCUwpWpaWX5TldJBcaPmPU yPHu2gMGlyNRLq1JfLqR8Ih3YT9os7Epcb8xBRymSrxKwsH13HEo63m/63MaTxdmGkwT qnfyhFPEGE5rRui3qxGWf9e8wzHwGgy0uT/BhyVJSqlQuys1ExY/8bignPhOWI4Iuh40 ZW5dIenvxnVkyy/G/PKjEM+b+g2GQ86OCwr3Q9K6aysT1agpn9g1QjgQb76DtQQ7JWVG glbDrGaG22nUSze+pijo9WH6A3j0Xx+NdpJOdbZFVhejU527x/hwJ0+4eD7F1KuBLadI OiRA== X-Gm-Message-State: AOAM530WrIfsJ4a0y5L2zUSRM+qagCRYoMD9qAzNO/5Nqy4XmE6NjBLx Jt58WtUbOumfL/Sv4d5qPOob X-Google-Smtp-Source: ABdhPJxGHMGslrX8WILGl9BLpx9s6b9saPzqOKfAm9RjyzmW/KqylQHu/oucIhgrxvBLEkmBY14O2g== X-Received: by 2002:a62:84d0:0:b029:1f6:77d5:b75a with SMTP id k199-20020a6284d00000b02901f677d5b75amr9025140pfd.2.1615786688813; Sun, 14 Mar 2021 22:38:08 -0700 (PDT) Received: from localhost ([139.177.225.227]) by smtp.gmail.com with ESMTPSA id m5sm12191110pfd.96.2021.03.14.22.38.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Mar 2021 22:38:08 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, bob.liu@oracle.com, hch@infradead.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v5 05/11] vdpa: Add an opaque pointer for vdpa_config_ops.dma_map() Date: Mon, 15 Mar 2021 13:37:15 +0800 Message-Id: <20210315053721.189-6-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210315053721.189-1-xieyongji@bytedance.com> References: <20210315053721.189-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add an opaque pointer for DMA mapping. Suggested-by: Jason Wang Acked-by: Jason Wang Signed-off-by: Xie Yongji --- drivers/vdpa/vdpa_sim/vdpa_sim.c | 6 +++--- drivers/vhost/vdpa.c | 2 +- include/linux/vdpa.h | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index 5b6b2f87d40c..ff331f088baf 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -512,14 +512,14 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, } static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 iova, u64 size, - u64 pa, u32 perm) + u64 pa, u32 perm, void *opaque) { struct vdpasim *vdpasim = vdpa_to_sim(vdpa); int ret; spin_lock(&vdpasim->iommu_lock); - ret = vhost_iotlb_add_range(vdpasim->iommu, iova, iova + size - 1, pa, - perm); + ret = vhost_iotlb_add_range_ctx(vdpasim->iommu, iova, iova + size - 1, + pa, perm, opaque); spin_unlock(&vdpasim->iommu_lock); return ret; diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 3f7175c2ac24..b24ec69a374b 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -544,7 +544,7 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, return r; if (ops->dma_map) { - r = ops->dma_map(vdpa, iova, size, pa, perm); + r = ops->dma_map(vdpa, iova, size, pa, perm, NULL); } else if (ops->set_map) { if (!v->in_batch) r = ops->set_map(vdpa, dev->iotlb); diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 15fa085fab05..b01f7c9096bf 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -241,7 +241,7 @@ struct vdpa_config_ops { /* DMA ops */ int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb); int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size, - u64 pa, u32 perm); + u64 pa, u32 perm, void *opaque); int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size); /* Free device resources */ From patchwork Mon Mar 15 05:37:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12138355 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1FB13C4332B for ; Mon, 15 Mar 2021 05:39:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0C50764E1F for ; Mon, 15 Mar 2021 05:39:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230217AbhCOFip (ORCPT ); Mon, 15 Mar 2021 01:38:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49508 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230108AbhCOFiN (ORCPT ); Mon, 15 Mar 2021 01:38:13 -0400 Received: from mail-pg1-x52c.google.com (mail-pg1-x52c.google.com [IPv6:2607:f8b0:4864:20::52c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EDE20C06175F for ; Sun, 14 Mar 2021 22:38:12 -0700 (PDT) Received: by mail-pg1-x52c.google.com with SMTP id 205so2817359pgh.9 for ; Sun, 14 Mar 2021 22:38:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=dGaxNIV8NpWEN5rzIRYOvlj8jKPS9Xwy93a1pAna0qs=; b=PJjx4wEMvdjt9TfXtK9oth2iCFk5tg+Kbh1GQ+WL3fo3atQZvQ5soPWQeJHqZ5QysE rE2omph93ad6uFbzSOyMAwXh31l4Nbq3rz9wVI8AH+p6O7oKXsrivgxhgGW1OnCJfKJG 9lE9wZGdmeBFRIEjC3pbNnwsI1khm69I7WqjgmKdQNoTTO5O2/uLv0tWCfXuJbbzIlE4 8ns3BHiSHfRKpj+bTZwky3ZkmFTvkHLxoOjnxtkEoESRnVN6m3NNtSJqe2XVXtwsqXyp gjFzEITksYK5V9cts/c14OkN8y7M2ApKhgBdk9SClogu17itbKdbCHiYuFx9N1qUbvX4 S6rQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=dGaxNIV8NpWEN5rzIRYOvlj8jKPS9Xwy93a1pAna0qs=; b=cEKLblyB/TF/7Yge9RFZ9Jknwd4oQLZW07uejU5a1yWgziehiUM/A+NeLOmxAAYgE0 lmI/+hovHgN7JWZn48xGzSENj/r5bmlFdYWZcps0z0xLLRVRJxxfkif7QWfSYKWSBaXE dRkrsXY5ShTVxJ6+caPTrQj2vtett5kU5lrIN37hF+o9fY+4SoF0udvv05tSqfsKbw10 jsztcU3grrq90KAquNIJ7hbH3xH0g0lHjCRoiof6I7Glz5bazHl2qdpGFbUWGVAKu8U4 3nHnBeZlfVbPz/qq1Ewf7xOuzUbMvuAFHiBLRVmgucPs4rwRzekg7nJcehlL+bpsOYTj ZJew== X-Gm-Message-State: AOAM533thI8lOZFdZBHO2I0To3p+JXVJXBqWaxIksY3rLqZLNzoMoZpO Py9R5up8yFjIBJUqPqQ4K2Qx X-Google-Smtp-Source: ABdhPJyPNDa7cApRWnHwPzbO0R8pnkLtt+6ozRigyQSdUsp4cUDhgwxNSCPM22Y5/5glVT9mNaP8BA== X-Received: by 2002:a63:df10:: with SMTP id u16mr21733129pgg.308.1615786692602; Sun, 14 Mar 2021 22:38:12 -0700 (PDT) Received: from localhost ([139.177.225.227]) by smtp.gmail.com with ESMTPSA id r16sm12302000pfq.211.2021.03.14.22.38.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Mar 2021 22:38:12 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, bob.liu@oracle.com, hch@infradead.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v5 06/11] vdpa: factor out vhost_vdpa_pa_map() Date: Mon, 15 Mar 2021 13:37:16 +0800 Message-Id: <20210315053721.189-7-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210315053721.189-1-xieyongji@bytedance.com> References: <20210315053721.189-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org The upcoming patch is going to support VA mapping. So let's factor out the logic of PA mapping firstly to make the code more readable. Suggested-by: Jason Wang Signed-off-by: Xie Yongji Acked-by: Jason Wang --- drivers/vhost/vdpa.c | 46 ++++++++++++++++++++++++++++------------------ 1 file changed, 28 insertions(+), 18 deletions(-) diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index b24ec69a374b..7c83fbf3edac 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -579,37 +579,28 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size) } } -static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, - struct vhost_iotlb_msg *msg) +static int vhost_vdpa_pa_map(struct vhost_vdpa *v, + u64 iova, u64 size, u64 uaddr, u32 perm) { struct vhost_dev *dev = &v->vdev; - struct vhost_iotlb *iotlb = dev->iotlb; struct page **page_list; unsigned long list_size = PAGE_SIZE / sizeof(struct page *); unsigned int gup_flags = FOLL_LONGTERM; unsigned long npages, cur_base, map_pfn, last_pfn = 0; unsigned long lock_limit, sz2pin, nchunks, i; - u64 iova = msg->iova; + u64 start = iova; long pinned; int ret = 0; - if (msg->iova < v->range.first || - msg->iova + msg->size - 1 > v->range.last) - return -EINVAL; - - if (vhost_iotlb_itree_first(iotlb, msg->iova, - msg->iova + msg->size - 1)) - return -EEXIST; - /* Limit the use of memory for bookkeeping */ page_list = (struct page **) __get_free_page(GFP_KERNEL); if (!page_list) return -ENOMEM; - if (msg->perm & VHOST_ACCESS_WO) + if (perm & VHOST_ACCESS_WO) gup_flags |= FOLL_WRITE; - npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT; + npages = PAGE_ALIGN(size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT; if (!npages) { ret = -EINVAL; goto free; @@ -623,7 +614,7 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, goto unlock; } - cur_base = msg->uaddr & PAGE_MASK; + cur_base = uaddr & PAGE_MASK; iova &= PAGE_MASK; nchunks = 0; @@ -654,7 +645,7 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT; ret = vhost_vdpa_map(v, iova, csize, map_pfn << PAGE_SHIFT, - msg->perm); + perm); if (ret) { /* * Unpin the pages that are left unmapped @@ -683,7 +674,7 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, /* Pin the rest chunk */ ret = vhost_vdpa_map(v, iova, (last_pfn - map_pfn + 1) << PAGE_SHIFT, - map_pfn << PAGE_SHIFT, msg->perm); + map_pfn << PAGE_SHIFT, perm); out: if (ret) { if (nchunks) { @@ -702,13 +693,32 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, for (pfn = map_pfn; pfn <= last_pfn; pfn++) unpin_user_page(pfn_to_page(pfn)); } - vhost_vdpa_unmap(v, msg->iova, msg->size); + vhost_vdpa_unmap(v, start, size); } unlock: mmap_read_unlock(dev->mm); free: free_page((unsigned long)page_list); return ret; + +} + +static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, + struct vhost_iotlb_msg *msg) +{ + struct vhost_dev *dev = &v->vdev; + struct vhost_iotlb *iotlb = dev->iotlb; + + if (msg->iova < v->range.first || + msg->iova + msg->size - 1 > v->range.last) + return -EINVAL; + + if (vhost_iotlb_itree_first(iotlb, msg->iova, + msg->iova + msg->size - 1)) + return -EEXIST; + + return vhost_vdpa_pa_map(v, msg->iova, msg->size, msg->uaddr, + msg->perm); } static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, From patchwork Mon Mar 15 05:37:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12138357 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.0 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9BFD8C43333 for ; Mon, 15 Mar 2021 05:39:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 86BE761574 for ; Mon, 15 Mar 2021 05:39:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230242AbhCOFis (ORCPT ); Mon, 15 Mar 2021 01:38:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49522 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230158AbhCOFiQ (ORCPT ); Mon, 15 Mar 2021 01:38:16 -0400 Received: from mail-pj1-x1034.google.com (mail-pj1-x1034.google.com [IPv6:2607:f8b0:4864:20::1034]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 98F40C061574 for ; Sun, 14 Mar 2021 22:38:16 -0700 (PDT) Received: by mail-pj1-x1034.google.com with SMTP id j6-20020a17090adc86b02900cbfe6f2c96so14119058pjv.1 for ; Sun, 14 Mar 2021 22:38:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=W+exRWI6njC+NPtM/IYTo1hQodYIxG7deQqKDh/owdo=; b=PAS5v7I1HhZ6zj65QEcDJ258EHwqk3fOibLXg2dxN1kYU/Mzac+fxWfg4pRaXYqLwY 9RkV+99InfcTPasN1LojvOdReLuVUnnFrwBv4O8aKoXJzqZppfdU2JrUcWIf8F9d7tIk z1zmsWygK5cgETFJaGBO5XCZdAOeBqkWsWBmMxlLEyuZBqggAuAEuCINJeWlz+bbzuoi LHYCgdqiEMEBU5xKEGaUqok2vLshgyWIN7Qm6ImkYZlaNUA5zp2UdGPNWUilI/BCBxjj Xz9nNODzUmgDI1qOegtvfW2jqCIvM2+nxZpqnCXzBIX1IrSyLZDgsYIp427ldXv5+6fn eAyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=W+exRWI6njC+NPtM/IYTo1hQodYIxG7deQqKDh/owdo=; b=gKsi3YBn/tv7u95epnf6VVXv6x67lSMY8x9P5d0KIlxUzouiyUDMwV/KkNzoGhopMt oyNrq/p5dI5YnT8RftgOncJ87dRSvt32HAjC99kewtmH0J3vVjtsur9xKW4lOoYZqT8i L55AEZxMxvZ7mdhamNOagZ1y7hj5736pTXZSga5BvVkNYHF1YUoQ2fAJBWvCIn6nE6BE rb16I2+0TKGhDuqVCh2bc2snEP/HKBILakq1+rVWdaYmDmSkthKHNNrIuwT1iVpfdurW GNwFbOh7q5/nAcghLSSZsKsIdykpEnpio+eADAMsrW9kFrWZfbJFLQ6RheU9kMo4FbZI d8cw== X-Gm-Message-State: AOAM532+9ULQlGFFSEoOjL869Gpcf8kazaOpnsphPcOZDR41eMIWN49U zAtw/GFzZ0xgBk12o9EWFxR2 X-Google-Smtp-Source: ABdhPJxBFWigs9OW2eyvn77wPEclBtvkRb9P7hsuilf/3cKfPlTFpMlXpIZizSF+aVychY7FOkiG9w== X-Received: by 2002:a17:90a:b898:: with SMTP id o24mr10675452pjr.14.1615786696187; Sun, 14 Mar 2021 22:38:16 -0700 (PDT) Received: from localhost ([139.177.225.227]) by smtp.gmail.com with ESMTPSA id d20sm9325224pjv.47.2021.03.14.22.38.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Mar 2021 22:38:15 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, bob.liu@oracle.com, hch@infradead.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v5 07/11] vdpa: Support transferring virtual addressing during DMA mapping Date: Mon, 15 Mar 2021 13:37:17 +0800 Message-Id: <20210315053721.189-8-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210315053721.189-1-xieyongji@bytedance.com> References: <20210315053721.189-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch introduces an attribute for vDPA device to indicate whether virtual address can be used. If vDPA device driver set it, vhost-vdpa bus driver will not pin user page and transfer userspace virtual address instead of physical address during DMA mapping. And corresponding vma->vm_file and offset will be also passed as an opaque pointer. Suggested-by: Jason Wang Signed-off-by: Xie Yongji --- drivers/vdpa/ifcvf/ifcvf_main.c | 2 +- drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- drivers/vdpa/vdpa.c | 9 +++- drivers/vdpa/vdpa_sim/vdpa_sim.c | 2 +- drivers/vdpa/virtio_pci/vp_vdpa.c | 2 +- drivers/vhost/vdpa.c | 104 +++++++++++++++++++++++++++++++------- include/linux/vdpa.h | 19 +++++-- 7 files changed, 113 insertions(+), 27 deletions(-) diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c index d555a6a5d1ba..aee013f3eb5f 100644 --- a/drivers/vdpa/ifcvf/ifcvf_main.c +++ b/drivers/vdpa/ifcvf/ifcvf_main.c @@ -431,7 +431,7 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id) } adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa, - dev, &ifc_vdpa_ops, NULL); + dev, &ifc_vdpa_ops, NULL, false); if (adapter == NULL) { IFCVF_ERR(pdev, "Failed to allocate vDPA structure"); return -ENOMEM; diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 71397fdafa6a..fb62ebcf464a 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -1982,7 +1982,7 @@ static int mlx5v_probe(struct auxiliary_device *adev, max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS); ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, &mlx5_vdpa_ops, - NULL); + NULL, false); if (IS_ERR(ndev)) return PTR_ERR(ndev); diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index 5cffce67cab0..97fbac276c72 100644 --- a/drivers/vdpa/vdpa.c +++ b/drivers/vdpa/vdpa.c @@ -71,6 +71,7 @@ static void vdpa_release_dev(struct device *d) * @config: the bus operations that is supported by this device * @size: size of the parent structure that contains private data * @name: name of the vdpa device; optional. + * @use_va: indicate whether virtual address must be used by this device * * Driver should use vdpa_alloc_device() wrapper macro instead of * using this directly. @@ -80,7 +81,8 @@ static void vdpa_release_dev(struct device *d) */ struct vdpa_device *__vdpa_alloc_device(struct device *parent, const struct vdpa_config_ops *config, - size_t size, const char *name) + size_t size, const char *name, + bool use_va) { struct vdpa_device *vdev; int err = -EINVAL; @@ -91,6 +93,10 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent, if (!!config->dma_map != !!config->dma_unmap) goto err; + /* It should only work for the device that use on-chip IOMMU */ + if (use_va && !(config->dma_map || config->set_map)) + goto err; + err = -ENOMEM; vdev = kzalloc(size, GFP_KERNEL); if (!vdev) @@ -106,6 +112,7 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent, vdev->index = err; vdev->config = config; vdev->features_valid = false; + vdev->use_va = use_va; if (name) err = dev_set_name(&vdev->dev, "%s", name); diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index ff331f088baf..d26334e9a412 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -235,7 +235,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr) ops = &vdpasim_config_ops; vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops, - dev_attr->name); + dev_attr->name, false); if (!vdpasim) goto err_alloc; diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c index 1321a2fcd088..03b36aed48d6 100644 --- a/drivers/vdpa/virtio_pci/vp_vdpa.c +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c @@ -377,7 +377,7 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id) return ret; vp_vdpa = vdpa_alloc_device(struct vp_vdpa, vdpa, - dev, &vp_vdpa_ops, NULL); + dev, &vp_vdpa_ops, NULL, false); if (vp_vdpa == NULL) { dev_err(dev, "vp_vdpa: Failed to allocate vDPA structure\n"); return -ENOMEM; diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 7c83fbf3edac..b65c21ae98d1 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -480,21 +480,30 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep, static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last) { struct vhost_dev *dev = &v->vdev; + struct vdpa_device *vdpa = v->vdpa; struct vhost_iotlb *iotlb = dev->iotlb; struct vhost_iotlb_map *map; + struct vdpa_map_file *map_file; struct page *page; unsigned long pfn, pinned; while ((map = vhost_iotlb_itree_first(iotlb, start, last)) != NULL) { - pinned = map->size >> PAGE_SHIFT; - for (pfn = map->addr >> PAGE_SHIFT; - pinned > 0; pfn++, pinned--) { - page = pfn_to_page(pfn); - if (map->perm & VHOST_ACCESS_WO) - set_page_dirty_lock(page); - unpin_user_page(page); + if (!vdpa->use_va) { + pinned = map->size >> PAGE_SHIFT; + for (pfn = map->addr >> PAGE_SHIFT; + pinned > 0; pfn++, pinned--) { + page = pfn_to_page(pfn); + if (map->perm & VHOST_ACCESS_WO) + set_page_dirty_lock(page); + unpin_user_page(page); + } + atomic64_sub(map->size >> PAGE_SHIFT, + &dev->mm->pinned_vm); + } else { + map_file = (struct vdpa_map_file *)map->opaque; + fput(map_file->file); + kfree(map_file); } - atomic64_sub(map->size >> PAGE_SHIFT, &dev->mm->pinned_vm); vhost_iotlb_map_free(iotlb, map); } } @@ -530,21 +539,21 @@ static int perm_to_iommu_flags(u32 perm) return flags | IOMMU_CACHE; } -static int vhost_vdpa_map(struct vhost_vdpa *v, - u64 iova, u64 size, u64 pa, u32 perm) +static int vhost_vdpa_map(struct vhost_vdpa *v, u64 iova, + u64 size, u64 pa, u32 perm, void *opaque) { struct vhost_dev *dev = &v->vdev; struct vdpa_device *vdpa = v->vdpa; const struct vdpa_config_ops *ops = vdpa->config; int r = 0; - r = vhost_iotlb_add_range(dev->iotlb, iova, iova + size - 1, - pa, perm); + r = vhost_iotlb_add_range_ctx(dev->iotlb, iova, iova + size - 1, + pa, perm, opaque); if (r) return r; if (ops->dma_map) { - r = ops->dma_map(vdpa, iova, size, pa, perm, NULL); + r = ops->dma_map(vdpa, iova, size, pa, perm, opaque); } else if (ops->set_map) { if (!v->in_batch) r = ops->set_map(vdpa, dev->iotlb); @@ -552,13 +561,15 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, r = iommu_map(v->domain, iova, pa, size, perm_to_iommu_flags(perm)); } - - if (r) + if (r) { vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1); - else + return r; + } + + if (!vdpa->use_va) atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm); - return r; + return 0; } static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size) @@ -579,6 +590,56 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size) } } +static int vhost_vdpa_va_map(struct vhost_vdpa *v, + u64 iova, u64 size, u64 uaddr, u32 perm) +{ + struct vhost_dev *dev = &v->vdev; + u64 offset, map_size, map_iova = iova; + struct vdpa_map_file *map_file; + struct vm_area_struct *vma; + int ret; + + mmap_read_lock(dev->mm); + + while (size) { + vma = find_vma(dev->mm, uaddr); + if (!vma) { + ret = -EINVAL; + break; + } + map_size = min(size, vma->vm_end - uaddr); + if (!(vma->vm_file && (vma->vm_flags & VM_SHARED) && + !(vma->vm_flags & (VM_IO | VM_PFNMAP)))) + goto next; + + map_file = kzalloc(sizeof(*map_file), GFP_KERNEL); + if (!map_file) { + ret = -ENOMEM; + break; + } + offset = (vma->vm_pgoff << PAGE_SHIFT) + uaddr - vma->vm_start; + map_file->offset = offset; + map_file->file = get_file(vma->vm_file); + ret = vhost_vdpa_map(v, map_iova, map_size, uaddr, + perm, map_file); + if (ret) { + fput(map_file->file); + kfree(map_file); + break; + } +next: + size -= map_size; + uaddr += map_size; + map_iova += map_size; + } + if (ret) + vhost_vdpa_unmap(v, iova, map_iova - iova); + + mmap_read_unlock(dev->mm); + + return ret; +} + static int vhost_vdpa_pa_map(struct vhost_vdpa *v, u64 iova, u64 size, u64 uaddr, u32 perm) { @@ -645,7 +706,7 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v, csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT; ret = vhost_vdpa_map(v, iova, csize, map_pfn << PAGE_SHIFT, - perm); + perm, NULL); if (ret) { /* * Unpin the pages that are left unmapped @@ -674,7 +735,7 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v, /* Pin the rest chunk */ ret = vhost_vdpa_map(v, iova, (last_pfn - map_pfn + 1) << PAGE_SHIFT, - map_pfn << PAGE_SHIFT, perm); + map_pfn << PAGE_SHIFT, perm, NULL); out: if (ret) { if (nchunks) { @@ -707,6 +768,7 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, struct vhost_iotlb_msg *msg) { struct vhost_dev *dev = &v->vdev; + struct vdpa_device *vdpa = v->vdpa; struct vhost_iotlb *iotlb = dev->iotlb; if (msg->iova < v->range.first || @@ -717,6 +779,10 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, msg->iova + msg->size - 1)) return -EEXIST; + if (vdpa->use_va) + return vhost_vdpa_va_map(v, msg->iova, msg->size, + msg->uaddr, msg->perm); + return vhost_vdpa_pa_map(v, msg->iova, msg->size, msg->uaddr, msg->perm); } diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index b01f7c9096bf..e67404e4b23e 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -44,6 +44,7 @@ struct vdpa_mgmt_dev; * @config: the configuration ops for this device. * @index: device index * @features_valid: were features initialized? for legacy guests + * @use_va: indicate whether virtual address must be used by this device * @nvqs: maximum number of supported virtqueues * @mdev: management device pointer; caller must setup when registering device as part * of dev_add() mgmtdev ops callback before invoking _vdpa_register_device(). @@ -54,6 +55,7 @@ struct vdpa_device { const struct vdpa_config_ops *config; unsigned int index; bool features_valid; + bool use_va; int nvqs; struct vdpa_mgmt_dev *mdev; }; @@ -69,6 +71,16 @@ struct vdpa_iova_range { }; /** + * Corresponding file area for device memory mapping + * @file: vma->vm_file for the mapping + * @offset: mapping offset in the vm_file + */ +struct vdpa_map_file { + struct file *file; + u64 offset; +}; + +/** * vDPA_config_ops - operations for configuring a vDPA device. * Note: vDPA device drivers are required to implement all of the * operations unless it is mentioned to be optional in the following @@ -250,14 +262,15 @@ struct vdpa_config_ops { struct vdpa_device *__vdpa_alloc_device(struct device *parent, const struct vdpa_config_ops *config, - size_t size, const char *name); + size_t size, const char *name, + bool use_va); -#define vdpa_alloc_device(dev_struct, member, parent, config, name) \ +#define vdpa_alloc_device(dev_struct, member, parent, config, name, use_va) \ container_of(__vdpa_alloc_device( \ parent, config, \ sizeof(dev_struct) + \ BUILD_BUG_ON_ZERO(offsetof( \ - dev_struct, member)), name), \ + dev_struct, member)), name, use_va), \ dev_struct, member) int vdpa_register_device(struct vdpa_device *vdev, int nvqs); From patchwork Mon Mar 15 05:37:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12138359 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D75E7C43619 for ; Mon, 15 Mar 2021 05:39:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C668561574 for ; Mon, 15 Mar 2021 05:39:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230182AbhCOFiu (ORCPT ); Mon, 15 Mar 2021 01:38:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49546 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230171AbhCOFiU (ORCPT ); Mon, 15 Mar 2021 01:38:20 -0400 Received: from mail-pj1-x1036.google.com (mail-pj1-x1036.google.com [IPv6:2607:f8b0:4864:20::1036]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3B5CBC061762 for ; Sun, 14 Mar 2021 22:38:20 -0700 (PDT) Received: by mail-pj1-x1036.google.com with SMTP id kr3-20020a17090b4903b02900c096fc01deso14108760pjb.4 for ; Sun, 14 Mar 2021 22:38:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=sHUYjbUDjCLKYREmESGF+Nb3IfODbGLwitZFsiNV5Vc=; b=dTDIXfSIGwYgcto+uHuzB9WPPaU02rdLz2eQXYXtvtPnwTFhvSMwVMuOB8DOnfilpk o4XlJ1Iwrggz0vme90woCgYQoE4PbkyriYoEkHD6+m2KnydFbbgDfTCwgYbeDbyTIF0u qfMymV0VqJ18sQN1lnuFaCAXrfDMbvnaN46HHwzV0HEDusQj+ZofEGGGmMFo8ThqrHSW /AKIspyXkZBwklrLewl+/QxaMJsdfFPe+WxpYlUvE0M62X6B4NgzqtVo6T6TkaxbnN4M Jjplk8yaX1ZAjbf5L/+JUoh3teIiKinKiMiAeda4G76v+GlA7IT2KrhJNFarEXZszF5M Ag4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=sHUYjbUDjCLKYREmESGF+Nb3IfODbGLwitZFsiNV5Vc=; b=QOdGKCyVMxIBD5o6dOZ3VIeN9ctOaINToPAAo242dmAJ/t4AwvpBqpcFR5Cm6wSkFD orLwxdto+gji/PqY0Gk5gJZTFSl5BEqOh9uDtzIuyr+m/gr7dDd6q3NFAZJHaclOvEb0 itpiurFsNbMeNTDhuS213MWojKf+wtTl2tZO0s1g362WurlkTqyKC6wpFJS2F4AhkqE7 9/FiYnrafSZn84JvebE9bgq01L5JHxzkY08rjXnPZ3omjM4dqA1C1PGv588oXRz8Mp5o nyRrbrOSuUW4trMX+Reg12A1AvsEHWnNUW/fv+XV7PHIRjXm4MISr9w2OODOq18pczb1 gX9g== X-Gm-Message-State: AOAM5305jUTOACt8u1Dvsaw8h+VKdC+HwAg8PowFFq2Ufu4fC+tD4Fvj 3OfcyMW7Sos75b/tG8ub7C0p X-Google-Smtp-Source: ABdhPJz56eMDjG5RkOg+L9FilBjFAhRPnsS7mo76g2EnI8c/Z7bFwIBS/AycNa/PYJdE6NblzHiwsg== X-Received: by 2002:a17:902:7889:b029:e6:b9c3:bc0d with SMTP id q9-20020a1709027889b02900e6b9c3bc0dmr302779pll.23.1615786699767; Sun, 14 Mar 2021 22:38:19 -0700 (PDT) Received: from localhost ([139.177.225.227]) by smtp.gmail.com with ESMTPSA id o11sm9280717pjg.41.2021.03.14.22.38.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Mar 2021 22:38:19 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, bob.liu@oracle.com, hch@infradead.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v5 08/11] vduse: Implement an MMU-based IOMMU driver Date: Mon, 15 Mar 2021 13:37:18 +0800 Message-Id: <20210315053721.189-9-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210315053721.189-1-xieyongji@bytedance.com> References: <20210315053721.189-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This implements an MMU-based IOMMU driver to support mapping kernel dma buffer into userspace. The basic idea behind it is treating MMU (VA->PA) as IOMMU (IOVA->PA). The driver will set up MMU mapping instead of IOMMU mapping for the DMA transfer so that the userspace process is able to use its virtual address to access the dma buffer in kernel. And to avoid security issue, a bounce-buffering mechanism is introduced to prevent userspace accessing the original buffer directly. Signed-off-by: Xie Yongji --- drivers/vdpa/vdpa_user/iova_domain.c | 535 +++++++++++++++++++++++++++++++++++ drivers/vdpa/vdpa_user/iova_domain.h | 75 +++++ 2 files changed, 610 insertions(+) create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c new file mode 100644 index 000000000000..83de216b0e51 --- /dev/null +++ b/drivers/vdpa/vdpa_user/iova_domain.c @@ -0,0 +1,535 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * MMU-based IOMMU implementation + * + * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: Xie Yongji + * + */ + +#include +#include +#include +#include +#include +#include + +#include "iova_domain.h" + +static int vduse_iotlb_add_range(struct vduse_iova_domain *domain, + u64 start, u64 last, + u64 addr, unsigned int perm, + struct file *file, u64 offset) +{ + struct vdpa_map_file *map_file; + int ret; + + map_file = kmalloc(sizeof(*map_file), GFP_ATOMIC); + if (!map_file) + return -ENOMEM; + + map_file->file = get_file(file); + map_file->offset = offset; + + ret = vhost_iotlb_add_range_ctx(domain->iotlb, start, last, + addr, perm, map_file); + if (ret) { + fput(map_file->file); + kfree(map_file); + return ret; + } + return 0; +} + +static void vduse_iotlb_del_range(struct vduse_iova_domain *domain, + u64 start, u64 last) +{ + struct vdpa_map_file *map_file; + struct vhost_iotlb_map *map; + + while ((map = vhost_iotlb_itree_first(domain->iotlb, start, last))) { + map_file = (struct vdpa_map_file *)map->opaque; + fput(map_file->file); + kfree(map_file); + vhost_iotlb_map_free(domain->iotlb, map); + } +} + +int vduse_domain_set_map(struct vduse_iova_domain *domain, + struct vhost_iotlb *iotlb) +{ + struct vdpa_map_file *map_file; + struct vhost_iotlb_map *map; + u64 start = 0ULL, last = ULLONG_MAX; + int ret; + + spin_lock(&domain->iotlb_lock); + vduse_iotlb_del_range(domain, start, last); + + for (map = vhost_iotlb_itree_first(iotlb, start, last); map; + map = vhost_iotlb_itree_next(map, start, last)) { + map_file = (struct vdpa_map_file *)map->opaque; + ret = vduse_iotlb_add_range(domain, map->start, map->last, + map->addr, map->perm, + map_file->file, + map_file->offset); + if (ret) + goto err; + } + spin_unlock(&domain->iotlb_lock); + + return 0; +err: + vduse_iotlb_del_range(domain, start, last); + spin_unlock(&domain->iotlb_lock); + return ret; +} + +static void vduse_domain_map_bounce_page(struct vduse_iova_domain *domain, + u64 iova, u64 size, u64 paddr) +{ + struct vduse_bounce_map *map; + unsigned int index; + u64 last = iova + size - 1; + + while (iova < last) { + map = &domain->bounce_maps[iova >> PAGE_SHIFT]; + index = offset_in_page(iova) >> IOVA_ALLOC_ORDER; + map->orig_phys[index] = paddr; + paddr += IOVA_ALLOC_SIZE; + iova += IOVA_ALLOC_SIZE; + } +} + +static void vduse_domain_unmap_bounce_page(struct vduse_iova_domain *domain, + u64 iova, u64 size) +{ + struct vduse_bounce_map *map; + unsigned int index; + u64 last = iova + size - 1; + + while (iova < last) { + map = &domain->bounce_maps[iova >> PAGE_SHIFT]; + index = offset_in_page(iova) >> IOVA_ALLOC_ORDER; + map->orig_phys[index] = INVALID_PHYS_ADDR; + iova += IOVA_ALLOC_SIZE; + } +} + +static void do_bounce(phys_addr_t orig, void *addr, size_t size, + enum dma_data_direction dir) +{ + unsigned long pfn = PFN_DOWN(orig); + + if (PageHighMem(pfn_to_page(pfn))) { + unsigned int offset = offset_in_page(orig); + char *buffer; + unsigned int sz = 0; + + while (size) { + sz = min_t(size_t, PAGE_SIZE - offset, size); + + buffer = kmap_atomic(pfn_to_page(pfn)); + if (dir == DMA_TO_DEVICE) + memcpy(addr, buffer + offset, sz); + else + memcpy(buffer + offset, addr, sz); + kunmap_atomic(buffer); + + size -= sz; + pfn++; + addr += sz; + offset = 0; + } + } else if (dir == DMA_TO_DEVICE) { + memcpy(addr, phys_to_virt(orig), size); + } else { + memcpy(phys_to_virt(orig), addr, size); + } +} + +static void vduse_domain_bounce(struct vduse_iova_domain *domain, + dma_addr_t iova, size_t size, + enum dma_data_direction dir) +{ + struct vduse_bounce_map *map; + unsigned int index, offset; + void *addr; + size_t sz; + + while (size) { + map = &domain->bounce_maps[iova >> PAGE_SHIFT]; + offset = offset_in_page(iova); + sz = min_t(size_t, IOVA_ALLOC_SIZE, size); + + if (map->bounce_page && + map->orig_phys[index] != INVALID_PHYS_ADDR) { + addr = page_address(map->bounce_page) + offset; + index = offset >> IOVA_ALLOC_ORDER; + do_bounce(map->orig_phys[index], addr, sz, dir); + } + size -= sz; + iova += sz; + } +} + +static struct page * +vduse_domain_get_mapping_page(struct vduse_iova_domain *domain, u64 iova) +{ + u64 start = iova & PAGE_MASK; + u64 last = start + PAGE_SIZE - 1; + struct vhost_iotlb_map *map; + struct page *page = NULL; + + spin_lock(&domain->iotlb_lock); + map = vhost_iotlb_itree_first(domain->iotlb, start, last); + if (!map) + goto out; + + page = pfn_to_page((map->addr + iova - map->start) >> PAGE_SHIFT); + get_page(page); +out: + spin_unlock(&domain->iotlb_lock); + + return page; +} + +static struct page * +vduse_domain_alloc_bounce_page(struct vduse_iova_domain *domain, u64 iova) +{ + u64 start = iova & PAGE_MASK; + struct page *page = alloc_page(GFP_KERNEL); + struct vduse_bounce_map *map; + + if (!page) + return NULL; + + spin_lock(&domain->iotlb_lock); + map = &domain->bounce_maps[iova >> PAGE_SHIFT]; + if (map->bounce_page) { + __free_page(page); + goto out; + } + map->bounce_page = page; + + /* paired with vduse_domain_map_page() */ + smp_mb(); + + vduse_domain_bounce(domain, start, PAGE_SIZE, DMA_TO_DEVICE); +out: + get_page(map->bounce_page); + spin_unlock(&domain->iotlb_lock); + + return map->bounce_page; +} + +static void +vduse_domain_free_bounce_pages(struct vduse_iova_domain *domain) +{ + struct vduse_bounce_map *map; + unsigned long i, pfn, bounce_pfns; + + bounce_pfns = domain->bounce_size >> PAGE_SHIFT; + + for (pfn = 0; pfn < bounce_pfns; pfn++) { + map = &domain->bounce_maps[pfn]; + for (i = 0; i < IOVA_MAPS_PER_PAGE; i++) { + if (WARN_ON(map->orig_phys[i] != INVALID_PHYS_ADDR)) + continue; + } + if (!map->bounce_page) + continue; + + __free_page(map->bounce_page); + map->bounce_page = NULL; + } +} + +void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain) +{ + if (!domain->bounce_map) + return; + + spin_lock(&domain->iotlb_lock); + if (!domain->bounce_map) + goto unlock; + + vduse_iotlb_del_range(domain, 0, domain->bounce_size - 1); + domain->bounce_map = 0; + vduse_domain_free_bounce_pages(domain); +unlock: + spin_unlock(&domain->iotlb_lock); +} + +static int vduse_domain_init_bounce_map(struct vduse_iova_domain *domain) +{ + int ret; + + if (domain->bounce_map) + return 0; + + spin_lock(&domain->iotlb_lock); + if (domain->bounce_map) + goto unlock; + + ret = vduse_iotlb_add_range(domain, 0, domain->bounce_size - 1, + 0, VHOST_MAP_RW, domain->file, 0); + if (!ret) + domain->bounce_map = 1; +unlock: + spin_unlock(&domain->iotlb_lock); + return ret; +} + +static dma_addr_t +vduse_domain_alloc_iova(struct iova_domain *iovad, + unsigned long size, unsigned long limit) +{ + unsigned long shift = iova_shift(iovad); + unsigned long iova_len = iova_align(iovad, size) >> shift; + unsigned long iova_pfn; + + if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1))) + iova_len = roundup_pow_of_two(iova_len); + iova_pfn = alloc_iova_fast(iovad, iova_len, limit >> shift, true); + + return iova_pfn << shift; +} + +static void vduse_domain_free_iova(struct iova_domain *iovad, + dma_addr_t iova, size_t size) +{ + unsigned long shift = iova_shift(iovad); + unsigned long iova_len = iova_align(iovad, size) >> shift; + + free_iova_fast(iovad, iova >> shift, iova_len); +} + +dma_addr_t vduse_domain_map_page(struct vduse_iova_domain *domain, + struct page *page, unsigned long offset, + size_t size, enum dma_data_direction dir, + unsigned long attrs) +{ + struct iova_domain *iovad = &domain->stream_iovad; + unsigned long limit = domain->bounce_size - 1; + phys_addr_t pa = page_to_phys(page) + offset; + dma_addr_t iova = vduse_domain_alloc_iova(iovad, size, limit); + + if (!iova) + return DMA_MAPPING_ERROR; + + if (vduse_domain_init_bounce_map(domain)) { + vduse_domain_free_iova(iovad, iova, size); + return DMA_MAPPING_ERROR; + } + + vduse_domain_map_bounce_page(domain, (u64)iova, (u64)size, pa); + + /* paired with vduse_domain_alloc_bounce_page() */ + smp_mb(); + + if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL) + vduse_domain_bounce(domain, iova, size, DMA_TO_DEVICE); + + return iova; +} + +void vduse_domain_unmap_page(struct vduse_iova_domain *domain, + dma_addr_t dma_addr, size_t size, + enum dma_data_direction dir, unsigned long attrs) +{ + struct iova_domain *iovad = &domain->stream_iovad; + + if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL) + vduse_domain_bounce(domain, dma_addr, size, DMA_FROM_DEVICE); + + vduse_domain_unmap_bounce_page(domain, (u64)dma_addr, (u64)size); + vduse_domain_free_iova(iovad, dma_addr, size); +} + +void *vduse_domain_alloc_coherent(struct vduse_iova_domain *domain, + size_t size, dma_addr_t *dma_addr, + gfp_t flag, unsigned long attrs) +{ + struct iova_domain *iovad = &domain->consistent_iovad; + unsigned long limit = domain->iova_limit; + dma_addr_t iova = vduse_domain_alloc_iova(iovad, size, limit); + void *orig = alloc_pages_exact(size, flag); + + if (!iova || !orig) + goto err; + + spin_lock(&domain->iotlb_lock); + if (vduse_iotlb_add_range(domain, (u64)iova, (u64)iova + size - 1, + virt_to_phys(orig), VHOST_MAP_RW, + domain->file, (u64)iova)) { + spin_unlock(&domain->iotlb_lock); + goto err; + } + spin_unlock(&domain->iotlb_lock); + + *dma_addr = iova; + + return orig; +err: + *dma_addr = DMA_MAPPING_ERROR; + if (orig) + free_pages_exact(orig, size); + if (iova) + vduse_domain_free_iova(iovad, iova, size); + + return NULL; +} + +void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size, + void *vaddr, dma_addr_t dma_addr, + unsigned long attrs) +{ + struct iova_domain *iovad = &domain->consistent_iovad; + struct vhost_iotlb_map *map; + struct vdpa_map_file *map_file; + phys_addr_t pa; + + spin_lock(&domain->iotlb_lock); + map = vhost_iotlb_itree_first(domain->iotlb, (u64)dma_addr, + (u64)dma_addr + size - 1); + if (WARN_ON(!map)) { + spin_unlock(&domain->iotlb_lock); + return; + } + map_file = (struct vdpa_map_file *)map->opaque; + fput(map_file->file); + kfree(map_file); + pa = map->addr; + vhost_iotlb_map_free(domain->iotlb, map); + spin_unlock(&domain->iotlb_lock); + + vduse_domain_free_iova(iovad, dma_addr, size); + free_pages_exact(phys_to_virt(pa), size); +} + +static vm_fault_t vduse_domain_mmap_fault(struct vm_fault *vmf) +{ + struct vduse_iova_domain *domain = vmf->vma->vm_private_data; + unsigned long iova = vmf->pgoff << PAGE_SHIFT; + struct page *page; + + if (!domain) + return VM_FAULT_SIGBUS; + + if (iova < domain->bounce_size) + page = vduse_domain_alloc_bounce_page(domain, iova); + else + page = vduse_domain_get_mapping_page(domain, iova); + + if (!page) + return VM_FAULT_SIGBUS; + + vmf->page = page; + + return 0; +} + +static const struct vm_operations_struct vduse_domain_mmap_ops = { + .fault = vduse_domain_mmap_fault, +}; + +static int vduse_domain_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct vduse_iova_domain *domain = file->private_data; + + vma->vm_flags |= VM_DONTDUMP | VM_DONTEXPAND; + vma->vm_private_data = domain; + vma->vm_ops = &vduse_domain_mmap_ops; + + return 0; +} + +static int vduse_domain_release(struct inode *inode, struct file *file) +{ + struct vduse_iova_domain *domain = file->private_data; + + vduse_domain_reset_bounce_map(domain); + put_iova_domain(&domain->stream_iovad); + put_iova_domain(&domain->consistent_iovad); + vhost_iotlb_free(domain->iotlb); + vfree(domain->bounce_maps); + kfree(domain); + + return 0; +} + +static const struct file_operations vduse_domain_fops = { + .mmap = vduse_domain_mmap, + .release = vduse_domain_release, +}; + +void vduse_domain_destroy(struct vduse_iova_domain *domain) +{ + fput(domain->file); +} + +struct vduse_iova_domain * +vduse_domain_create(unsigned long iova_limit, size_t bounce_size) +{ + struct vduse_iova_domain *domain; + struct file *file; + struct vduse_bounce_map *map; + unsigned long i, pfn, bounce_pfns; + + bounce_pfns = PAGE_ALIGN(bounce_size) >> PAGE_SHIFT; + if (iova_limit <= bounce_size) + return NULL; + + domain = kzalloc(sizeof(*domain), GFP_KERNEL); + if (!domain) + return NULL; + + domain->iotlb = vhost_iotlb_alloc(0, 0); + if (!domain->iotlb) + goto err_iotlb; + + domain->iova_limit = iova_limit; + domain->bounce_size = PAGE_ALIGN(bounce_size); + domain->bounce_maps = vzalloc(bounce_pfns * + sizeof(struct vduse_bounce_map)); + if (!domain->bounce_maps) + goto err_map; + + for (pfn = 0; pfn < bounce_pfns; pfn++) { + map = &domain->bounce_maps[pfn]; + for (i = 0; i < IOVA_MAPS_PER_PAGE; i++) + map->orig_phys[i] = INVALID_PHYS_ADDR; + } + file = anon_inode_getfile("[vduse-domain]", &vduse_domain_fops, + domain, O_RDWR); + if (IS_ERR(file)) + goto err_file; + + domain->file = file; + spin_lock_init(&domain->iotlb_lock); + init_iova_domain(&domain->stream_iovad, + IOVA_ALLOC_SIZE, IOVA_START_PFN); + init_iova_domain(&domain->consistent_iovad, + PAGE_SIZE, bounce_pfns); + + return domain; +err_file: + vfree(domain->bounce_maps); +err_map: + vhost_iotlb_free(domain->iotlb); +err_iotlb: + kfree(domain); + return NULL; +} + +int vduse_domain_init(void) +{ + return iova_cache_get(); +} + +void vduse_domain_exit(void) +{ + iova_cache_put(); +} diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h new file mode 100644 index 000000000000..faeeedfaa786 --- /dev/null +++ b/drivers/vdpa/vdpa_user/iova_domain.h @@ -0,0 +1,75 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * MMU-based IOMMU implementation + * + * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: Xie Yongji + * + */ + +#ifndef _VDUSE_IOVA_DOMAIN_H +#define _VDUSE_IOVA_DOMAIN_H + +#include +#include +#include + +#define IOVA_START_PFN 1 + +#define IOVA_ALLOC_ORDER 12 +#define IOVA_ALLOC_SIZE (1 << IOVA_ALLOC_ORDER) + +#define IOVA_MAPS_PER_PAGE (1 << (PAGE_SHIFT - IOVA_ALLOC_ORDER)) + +#define INVALID_PHYS_ADDR (~(phys_addr_t)0) + +struct vduse_bounce_map { + struct page *bounce_page; + u64 orig_phys[IOVA_MAPS_PER_PAGE]; +}; + +struct vduse_iova_domain { + struct iova_domain stream_iovad; + struct iova_domain consistent_iovad; + struct vduse_bounce_map *bounce_maps; + size_t bounce_size; + unsigned long iova_limit; + int bounce_map; + struct vhost_iotlb *iotlb; + spinlock_t iotlb_lock; + struct file *file; +}; + +int vduse_domain_set_map(struct vduse_iova_domain *domain, + struct vhost_iotlb *iotlb); + +dma_addr_t vduse_domain_map_page(struct vduse_iova_domain *domain, + struct page *page, unsigned long offset, + size_t size, enum dma_data_direction dir, + unsigned long attrs); + +void vduse_domain_unmap_page(struct vduse_iova_domain *domain, + dma_addr_t dma_addr, size_t size, + enum dma_data_direction dir, unsigned long attrs); + +void *vduse_domain_alloc_coherent(struct vduse_iova_domain *domain, + size_t size, dma_addr_t *dma_addr, + gfp_t flag, unsigned long attrs); + +void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size, + void *vaddr, dma_addr_t dma_addr, + unsigned long attrs); + +void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain); + +void vduse_domain_destroy(struct vduse_iova_domain *domain); + +struct vduse_iova_domain *vduse_domain_create(unsigned long iova_limit, + size_t bounce_size); + +int vduse_domain_init(void); + +void vduse_domain_exit(void); + +#endif /* _VDUSE_IOVA_DOMAIN_H */ From patchwork Mon Mar 15 05:37:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12138365 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.0 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 04006C4361A for ; Mon, 15 Mar 2021 05:39:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E5F4964E1F for ; Mon, 15 Mar 2021 05:39:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230308AbhCOFiv (ORCPT ); Mon, 15 Mar 2021 01:38:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49566 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230176AbhCOFiY (ORCPT ); Mon, 15 Mar 2021 01:38:24 -0400 Received: from mail-pl1-x62f.google.com (mail-pl1-x62f.google.com [IPv6:2607:f8b0:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 15A01C061762 for ; Sun, 14 Mar 2021 22:38:24 -0700 (PDT) Received: by mail-pl1-x62f.google.com with SMTP id z5so14738004plg.3 for ; Sun, 14 Mar 2021 22:38:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=EWuoiCThygvJIGEuOtgCvGMC6LvtHvsWOzsVip7oxMc=; b=wt1ZkLgetjwQJC/2ajAeUWwA3umj7El9HlKMsHMYrCcMX19pLyUv3sxd0/cXajqRC/ f1ss/tOO+2Z7j65l69fpFX2GevvAZdAK2lHB0D5G1JZ79m/ISuvtk9CPUZOUGRL+CdHt 9mRqC8xYkc+YxGnmYQwgOWzZbonku8j6wgtP3EqaXWmgYGT9Ibnft5k3lnEZ06N/gIHK Koc9hAn+aMmF1kZ6DkndjWQqWd8MI65AG0fsKkaW8A5Dyyugy7ImuRit66iBjC+8lvRk 8XJBLX562v3UaAgpb/vvPBFcrvVhtWW8Kg4qD0ICNR4NJ3YFdSfdcqVVonAPoLdwYS2m Vc9Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=EWuoiCThygvJIGEuOtgCvGMC6LvtHvsWOzsVip7oxMc=; b=A+2cyP13pmAZD3/y32/gT14Ahq9R+ksFsrkDDymw1WseRym0s0AHkssVIHpE/IoqvC V9NXdldLeYMNANsNFDHC8Ka7aTScBtl9MbY0ZAGz5kOs2mQcFGscWKbqHCpO29w6I1sF /zb4J7b2w+5jmdtz7cZk2tX9iTfXymYiS7pKhgYnXiubVBuvsg0qob/rgg6hIi1QIear gBbIBc2nsm4cGjWPCcBYywZb/LZ/jFyO6JFKM//lNQJTEDY7X8TiVP1bNdTuz6Ov3f2C UH37yvWhTIu3l4QKQr7kdWxcJtCBBXAQvsGWlgBHWCffUdJZ+t3eUdqOQwXYAL9WMi3e M+aQ== X-Gm-Message-State: AOAM531GkvN1OfRheb+KvBuC+/3omidaVZZA++PN4V3MokIHCOPd8iVk 4p2TaPR+/4SR01qsOPy+YKij X-Google-Smtp-Source: ABdhPJw1YeQmwoeN1r+ftsNft8k9qICm1oCYysKJ0tzVrl2WNxSzzqwcRiXLrAow6rmXcCNgP7fVvw== X-Received: by 2002:a17:90b:804:: with SMTP id bk4mr10905565pjb.25.1615786703485; Sun, 14 Mar 2021 22:38:23 -0700 (PDT) Received: from localhost ([139.177.225.227]) by smtp.gmail.com with ESMTPSA id t5sm11568411pgl.89.2021.03.14.22.38.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Mar 2021 22:38:23 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, bob.liu@oracle.com, hch@infradead.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v5 09/11] vduse: Introduce VDUSE - vDPA Device in Userspace Date: Mon, 15 Mar 2021 13:37:19 +0800 Message-Id: <20210315053721.189-10-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210315053721.189-1-xieyongji@bytedance.com> References: <20210315053721.189-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This VDUSE driver enables implementing vDPA devices in userspace. Both control path and data path of vDPA devices will be able to be handled in userspace. In the control path, the VDUSE driver will make use of message mechnism to forward the config operation from vdpa bus driver to userspace. Userspace can use read()/write() to receive/reply those control messages. In the data path, userspace can use mmap() to access vDPA device's iova regions obtained through VDUSE_IOTLB_GET_ENTRY ioctl. Besides, userspace can use ioctl() to inject interrupt and use the eventfd mechanism to receive virtqueue kicks. Signed-off-by: Xie Yongji --- Documentation/userspace-api/ioctl/ioctl-number.rst | 1 + drivers/vdpa/Kconfig | 10 + drivers/vdpa/Makefile | 1 + drivers/vdpa/vdpa_user/Makefile | 5 + drivers/vdpa/vdpa_user/vduse_dev.c | 1281 ++++++++++++++++++++ include/uapi/linux/vduse.h | 153 +++ 6 files changed, 1451 insertions(+) create mode 100644 drivers/vdpa/vdpa_user/Makefile create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c create mode 100644 include/uapi/linux/vduse.h diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst index a4c75a28c839..71722e6f8f23 100644 --- a/Documentation/userspace-api/ioctl/ioctl-number.rst +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst @@ -300,6 +300,7 @@ Code Seq# Include File Comments 'z' 10-4F drivers/s390/crypto/zcrypt_api.h conflict! '|' 00-7F linux/media.h 0x80 00-1F linux/fb.h +0x81 00-1F linux/vduse.h 0x89 00-06 arch/x86/include/asm/sockios.h 0x89 0B-DF linux/sockios.h 0x89 E0-EF linux/sockios.h SIOCPROTOPRIVATE range diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig index a245809c99d0..77a1da522c21 100644 --- a/drivers/vdpa/Kconfig +++ b/drivers/vdpa/Kconfig @@ -25,6 +25,16 @@ config VDPA_SIM_NET help vDPA networking device simulator which loops TX traffic back to RX. +config VDPA_USER + tristate "VDUSE (vDPA Device in Userspace) support" + depends on EVENTFD && MMU && HAS_DMA + select DMA_OPS + select VHOST_IOTLB + select IOMMU_IOVA + help + With VDUSE it is possible to emulate a vDPA Device + in a userspace program. + config IFCVF tristate "Intel IFC VF vDPA driver" depends on PCI_MSI diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile index 67fe7f3d6943..f02ebed33f19 100644 --- a/drivers/vdpa/Makefile +++ b/drivers/vdpa/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_VDPA) += vdpa.o obj-$(CONFIG_VDPA_SIM) += vdpa_sim/ +obj-$(CONFIG_VDPA_USER) += vdpa_user/ obj-$(CONFIG_IFCVF) += ifcvf/ obj-$(CONFIG_MLX5_VDPA) += mlx5/ obj-$(CONFIG_VP_VDPA) += virtio_pci/ diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile new file mode 100644 index 000000000000..260e0b26af99 --- /dev/null +++ b/drivers/vdpa/vdpa_user/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 + +vduse-y := vduse_dev.o iova_domain.o + +obj-$(CONFIG_VDPA_USER) += vduse.o diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c new file mode 100644 index 000000000000..07d0ae92d470 --- /dev/null +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -0,0 +1,1281 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * VDUSE: vDPA Device in Userspace + * + * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: Xie Yongji + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "iova_domain.h" + +#define DRV_VERSION "1.0" +#define DRV_AUTHOR "Yongji Xie " +#define DRV_DESC "vDPA Device in Userspace" +#define DRV_LICENSE "GPL v2" + +#define VDUSE_DEV_MAX (1U << MINORBITS) + +struct vduse_virtqueue { + u16 index; + bool ready; + spinlock_t kick_lock; + spinlock_t irq_lock; + struct eventfd_ctx *kickfd; + struct vdpa_callback cb; + struct work_struct inject; +}; + +struct vduse_dev; + +struct vduse_vdpa { + struct vdpa_device vdpa; + struct vduse_dev *dev; +}; + +struct vduse_dev { + struct vduse_vdpa *vdev; + struct device dev; + struct cdev cdev; + struct vduse_virtqueue *vqs; + struct vduse_iova_domain *domain; + spinlock_t msg_lock; + atomic64_t msg_unique; + wait_queue_head_t waitq; + struct list_head send_list; + struct list_head recv_list; + struct list_head list; + bool connected; + int minor; + u16 vq_size_max; + u16 vq_num; + u32 vq_align; + u32 device_id; + u32 vendor_id; +}; + +struct vduse_dev_msg { + struct vduse_dev_request req; + struct vduse_dev_response resp; + struct list_head list; + wait_queue_head_t waitq; + bool completed; +}; + +static unsigned long max_bounce_size = (64 * 1024 * 1024); +module_param(max_bounce_size, ulong, 0444); +MODULE_PARM_DESC(max_bounce_size, "Maximum bounce buffer size. (default: 64M)"); + +static unsigned long max_iova_size = (128 * 1024 * 1024); +module_param(max_iova_size, ulong, 0444); +MODULE_PARM_DESC(max_iova_size, "Maximum iova space size (default: 128M)"); + +static DEFINE_MUTEX(vduse_lock); +static LIST_HEAD(vduse_devs); +static DEFINE_IDA(vduse_ida); + +static dev_t vduse_major; +static struct class *vduse_class; +static struct workqueue_struct *vduse_irq_wq; + +static inline struct vduse_dev *vdpa_to_vduse(struct vdpa_device *vdpa) +{ + struct vduse_vdpa *vdev = container_of(vdpa, struct vduse_vdpa, vdpa); + + return vdev->dev; +} + +static inline struct vduse_dev *dev_to_vduse(struct device *dev) +{ + struct vdpa_device *vdpa = dev_to_vdpa(dev); + + return vdpa_to_vduse(vdpa); +} + +static struct vduse_dev_msg *vduse_find_msg(struct list_head *head, + uint32_t request_id) +{ + struct vduse_dev_msg *tmp, *msg = NULL; + + list_for_each_entry(tmp, head, list) { + if (tmp->req.request_id == request_id) { + msg = tmp; + list_del(&tmp->list); + break; + } + } + + return msg; +} + +static struct vduse_dev_msg *vduse_dequeue_msg(struct list_head *head) +{ + struct vduse_dev_msg *msg = NULL; + + if (!list_empty(head)) { + msg = list_first_entry(head, struct vduse_dev_msg, list); + list_del(&msg->list); + } + + return msg; +} + +static void vduse_enqueue_msg(struct list_head *head, + struct vduse_dev_msg *msg) +{ + list_add_tail(&msg->list, head); +} + +static int vduse_dev_msg_sync(struct vduse_dev *dev, + struct vduse_dev_msg *msg) +{ + init_waitqueue_head(&msg->waitq); + spin_lock(&dev->msg_lock); + vduse_enqueue_msg(&dev->send_list, msg); + wake_up(&dev->waitq); + spin_unlock(&dev->msg_lock); + wait_event_interruptible(msg->waitq, msg->completed); + spin_lock(&dev->msg_lock); + if (!msg->completed) + list_del(&msg->list); + spin_unlock(&dev->msg_lock); + + return (msg->resp.result == VDUSE_REQUEST_OK) ? 0 : -1; +} + +static u64 vduse_dev_get_features(struct vduse_dev *dev) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_GET_FEATURES; + msg.req.request_id = atomic64_fetch_inc(&dev->msg_unique); + + return vduse_dev_msg_sync(dev, &msg) ? 0 : msg.resp.f.features; +} + +static int vduse_dev_set_features(struct vduse_dev *dev, u64 features) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_SET_FEATURES; + msg.req.request_id = atomic64_fetch_inc(&dev->msg_unique); + msg.req.f.features = features; + + return vduse_dev_msg_sync(dev, &msg); +} + +static u8 vduse_dev_get_status(struct vduse_dev *dev) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_GET_STATUS; + msg.req.request_id = atomic64_fetch_inc(&dev->msg_unique); + + return vduse_dev_msg_sync(dev, &msg) ? 0 : msg.resp.s.status; +} + +static void vduse_dev_set_status(struct vduse_dev *dev, u8 status) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_SET_STATUS; + msg.req.request_id = atomic64_fetch_inc(&dev->msg_unique); + msg.req.s.status = status; + + vduse_dev_msg_sync(dev, &msg); +} + +static void vduse_dev_get_config(struct vduse_dev *dev, unsigned int offset, + void *buf, unsigned int len) +{ + struct vduse_dev_msg msg = { 0 }; + unsigned int sz; + + while (len) { + sz = min_t(unsigned int, len, sizeof(msg.req.config.data)); + msg.req.type = VDUSE_GET_CONFIG; + msg.req.request_id = atomic64_fetch_inc(&dev->msg_unique); + msg.req.config.offset = offset; + msg.req.config.len = sz; + vduse_dev_msg_sync(dev, &msg); + memcpy(buf, msg.resp.config.data, sz); + buf += sz; + offset += sz; + len -= sz; + } +} + +static void vduse_dev_set_config(struct vduse_dev *dev, unsigned int offset, + const void *buf, unsigned int len) +{ + struct vduse_dev_msg msg = { 0 }; + unsigned int sz; + + while (len) { + sz = min_t(unsigned int, len, sizeof(msg.req.config.data)); + msg.req.type = VDUSE_SET_CONFIG; + msg.req.request_id = atomic64_fetch_inc(&dev->msg_unique); + msg.req.config.offset = offset; + msg.req.config.len = sz; + memcpy(msg.req.config.data, buf, sz); + vduse_dev_msg_sync(dev, &msg); + buf += sz; + offset += sz; + len -= sz; + } +} + +static void vduse_dev_set_vq_num(struct vduse_dev *dev, + struct vduse_virtqueue *vq, u32 num) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_SET_VQ_NUM; + msg.req.request_id = atomic64_fetch_inc(&dev->msg_unique); + msg.req.vq_num.index = vq->index; + msg.req.vq_num.num = num; + + vduse_dev_msg_sync(dev, &msg); +} + +static int vduse_dev_set_vq_addr(struct vduse_dev *dev, + struct vduse_virtqueue *vq, u64 desc_addr, + u64 driver_addr, u64 device_addr) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_SET_VQ_ADDR; + msg.req.request_id = atomic64_fetch_inc(&dev->msg_unique); + msg.req.vq_addr.index = vq->index; + msg.req.vq_addr.desc_addr = desc_addr; + msg.req.vq_addr.driver_addr = driver_addr; + msg.req.vq_addr.device_addr = device_addr; + + return vduse_dev_msg_sync(dev, &msg); +} + +static void vduse_dev_set_vq_ready(struct vduse_dev *dev, + struct vduse_virtqueue *vq, bool ready) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_SET_VQ_READY; + msg.req.request_id = atomic64_fetch_inc(&dev->msg_unique); + msg.req.vq_ready.index = vq->index; + msg.req.vq_ready.ready = ready; + + vduse_dev_msg_sync(dev, &msg); +} + +static bool vduse_dev_get_vq_ready(struct vduse_dev *dev, + struct vduse_virtqueue *vq) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_GET_VQ_READY; + msg.req.request_id = atomic64_fetch_inc(&dev->msg_unique); + msg.req.vq_ready.index = vq->index; + + return vduse_dev_msg_sync(dev, &msg) ? false : msg.resp.vq_ready.ready; +} + +static int vduse_dev_get_vq_state(struct vduse_dev *dev, + struct vduse_virtqueue *vq, + struct vdpa_vq_state *state) +{ + struct vduse_dev_msg msg = { 0 }; + int ret; + + msg.req.type = VDUSE_GET_VQ_STATE; + msg.req.request_id = atomic64_fetch_inc(&dev->msg_unique); + msg.req.vq_state.index = vq->index; + + ret = vduse_dev_msg_sync(dev, &msg); + if (!ret) + state->avail_index = msg.resp.vq_state.avail_idx; + + return ret; +} + +static int vduse_dev_set_vq_state(struct vduse_dev *dev, + struct vduse_virtqueue *vq, + const struct vdpa_vq_state *state) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_SET_VQ_STATE; + msg.req.request_id = atomic64_fetch_inc(&dev->msg_unique); + msg.req.vq_state.index = vq->index; + msg.req.vq_state.avail_idx = state->avail_index; + + return vduse_dev_msg_sync(dev, &msg); +} + +static int vduse_dev_update_iotlb(struct vduse_dev *dev, + u64 start, u64 last) +{ + struct vduse_dev_msg *msg; + + if (last < start) + return -EINVAL; + + msg = kzalloc(sizeof(*msg), GFP_ATOMIC); + msg->req.type = VDUSE_UPDATE_IOTLB; + msg->req.request_id = atomic64_fetch_inc(&dev->msg_unique); + msg->req.iova.start = start; + msg->req.iova.last = last; + + return vduse_dev_msg_sync(dev, msg); +} + +static ssize_t vduse_dev_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + struct file *file = iocb->ki_filp; + struct vduse_dev *dev = file->private_data; + struct vduse_dev_msg *msg; + int size = sizeof(struct vduse_dev_request); + ssize_t ret = 0; + + if (iov_iter_count(to) < size) + return 0; + + spin_lock(&dev->msg_lock); + while (1) { + msg = vduse_dequeue_msg(&dev->send_list); + if (msg) + break; + + ret = -EAGAIN; + if (file->f_flags & O_NONBLOCK) + goto unlock; + + spin_unlock(&dev->msg_lock); + ret = wait_event_interruptible_exclusive(dev->waitq, + !list_empty(&dev->send_list)); + if (ret) + return ret; + + spin_lock(&dev->msg_lock); + } + spin_unlock(&dev->msg_lock); + ret = copy_to_iter(&msg->req, size, to); + spin_lock(&dev->msg_lock); + if (ret != size) { + ret = -EFAULT; + vduse_enqueue_msg(&dev->send_list, msg); + goto unlock; + } + vduse_enqueue_msg(&dev->recv_list, msg); +unlock: + spin_unlock(&dev->msg_lock); + + return ret; +} + +static ssize_t vduse_dev_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + struct file *file = iocb->ki_filp; + struct vduse_dev *dev = file->private_data; + struct vduse_dev_response resp; + struct vduse_dev_msg *msg; + size_t ret; + + ret = copy_from_iter(&resp, sizeof(resp), from); + if (ret != sizeof(resp)) + return -EINVAL; + + spin_lock(&dev->msg_lock); + msg = vduse_find_msg(&dev->recv_list, resp.request_id); + if (!msg) { + ret = -EINVAL; + goto unlock; + } + + memcpy(&msg->resp, &resp, sizeof(resp)); + msg->completed = 1; + wake_up(&msg->waitq); +unlock: + spin_unlock(&dev->msg_lock); + + return ret; +} + +static __poll_t vduse_dev_poll(struct file *file, poll_table *wait) +{ + struct vduse_dev *dev = file->private_data; + __poll_t mask = 0; + + poll_wait(file, &dev->waitq, wait); + + if (!list_empty(&dev->send_list)) + mask |= EPOLLIN | EPOLLRDNORM; + + return mask; +} + +static void vduse_dev_reset(struct vduse_dev *dev) +{ + int i; + + vduse_domain_reset_bounce_map(dev->domain); + vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX); + + for (i = 0; i < dev->vq_num; i++) { + struct vduse_virtqueue *vq = &dev->vqs[i]; + + spin_lock(&vq->irq_lock); + vq->ready = false; + vq->cb.callback = NULL; + vq->cb.private = NULL; + spin_unlock(&vq->irq_lock); + } +} + +static int vduse_vdpa_set_vq_address(struct vdpa_device *vdpa, u16 idx, + u64 desc_area, u64 driver_area, + u64 device_area) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + return vduse_dev_set_vq_addr(dev, vq, desc_area, + driver_area, device_area); +} + +static void vduse_vdpa_kick_vq(struct vdpa_device *vdpa, u16 idx) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + spin_lock(&vq->kick_lock); + if (vq->ready && vq->kickfd) + eventfd_signal(vq->kickfd, 1); + spin_unlock(&vq->kick_lock); +} + +static void vduse_vdpa_set_vq_cb(struct vdpa_device *vdpa, u16 idx, + struct vdpa_callback *cb) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + spin_lock(&vq->irq_lock); + vq->cb.callback = cb->callback; + vq->cb.private = cb->private; + spin_unlock(&vq->irq_lock); +} + +static void vduse_vdpa_set_vq_num(struct vdpa_device *vdpa, u16 idx, u32 num) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vduse_dev_set_vq_num(dev, vq, num); +} + +static void vduse_vdpa_set_vq_ready(struct vdpa_device *vdpa, + u16 idx, bool ready) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vduse_dev_set_vq_ready(dev, vq, ready); + vq->ready = ready; +} + +static bool vduse_vdpa_get_vq_ready(struct vdpa_device *vdpa, u16 idx) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vq->ready = vduse_dev_get_vq_ready(dev, vq); + + return vq->ready; +} + +static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx, + const struct vdpa_vq_state *state) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + return vduse_dev_set_vq_state(dev, vq, state); +} + +static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx, + struct vdpa_vq_state *state) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + return vduse_dev_get_vq_state(dev, vq, state); +} + +static u32 vduse_vdpa_get_vq_align(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->vq_align; +} + +static u64 vduse_vdpa_get_features(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return vduse_dev_get_features(dev); +} + +static int vduse_vdpa_set_features(struct vdpa_device *vdpa, u64 features) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) + return -EINVAL; + + return vduse_dev_set_features(dev, features); +} + +static void vduse_vdpa_set_config_cb(struct vdpa_device *vdpa, + struct vdpa_callback *cb) +{ + /* We don't support config interrupt */ +} + +static u16 vduse_vdpa_get_vq_num_max(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->vq_size_max; +} + +static u32 vduse_vdpa_get_device_id(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->device_id; +} + +static u32 vduse_vdpa_get_vendor_id(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->vendor_id; +} + +static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return vduse_dev_get_status(dev); +} + +static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + if (status == 0) + vduse_dev_reset(dev); + + vduse_dev_set_status(dev, status); +} + +static void vduse_vdpa_get_config(struct vdpa_device *vdpa, unsigned int offset, + void *buf, unsigned int len) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + vduse_dev_get_config(dev, offset, buf, len); +} + +static void vduse_vdpa_set_config(struct vdpa_device *vdpa, unsigned int offset, + const void *buf, unsigned int len) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + vduse_dev_set_config(dev, offset, buf, len); +} + +static int vduse_vdpa_set_map(struct vdpa_device *vdpa, + struct vhost_iotlb *iotlb) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + int ret; + + ret = vduse_domain_set_map(dev->domain, iotlb); + vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX); + + return ret; +} + +static void vduse_vdpa_free(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + WARN_ON(!list_empty(&dev->send_list)); + WARN_ON(!list_empty(&dev->recv_list)); + dev->vdev = NULL; +} + +static const struct vdpa_config_ops vduse_vdpa_config_ops = { + .set_vq_address = vduse_vdpa_set_vq_address, + .kick_vq = vduse_vdpa_kick_vq, + .set_vq_cb = vduse_vdpa_set_vq_cb, + .set_vq_num = vduse_vdpa_set_vq_num, + .set_vq_ready = vduse_vdpa_set_vq_ready, + .get_vq_ready = vduse_vdpa_get_vq_ready, + .set_vq_state = vduse_vdpa_set_vq_state, + .get_vq_state = vduse_vdpa_get_vq_state, + .get_vq_align = vduse_vdpa_get_vq_align, + .get_features = vduse_vdpa_get_features, + .set_features = vduse_vdpa_set_features, + .set_config_cb = vduse_vdpa_set_config_cb, + .get_vq_num_max = vduse_vdpa_get_vq_num_max, + .get_device_id = vduse_vdpa_get_device_id, + .get_vendor_id = vduse_vdpa_get_vendor_id, + .get_status = vduse_vdpa_get_status, + .set_status = vduse_vdpa_set_status, + .get_config = vduse_vdpa_get_config, + .set_config = vduse_vdpa_set_config, + .set_map = vduse_vdpa_set_map, + .free = vduse_vdpa_free, +}; + +static dma_addr_t vduse_dev_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, + enum dma_data_direction dir, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + + return vduse_domain_map_page(domain, page, offset, size, dir, attrs); +} + +static void vduse_dev_unmap_page(struct device *dev, dma_addr_t dma_addr, + size_t size, enum dma_data_direction dir, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + + return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs); +} + +static void *vduse_dev_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_addr, gfp_t flag, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + unsigned long iova; + void *addr; + + *dma_addr = DMA_MAPPING_ERROR; + addr = vduse_domain_alloc_coherent(domain, size, + (dma_addr_t *)&iova, flag, attrs); + if (!addr) + return NULL; + + *dma_addr = (dma_addr_t)iova; + vduse_dev_update_iotlb(vdev, iova, iova + size - 1); + + return addr; +} + +static void vduse_dev_free_coherent(struct device *dev, size_t size, + void *vaddr, dma_addr_t dma_addr, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + unsigned long start = (unsigned long)dma_addr; + unsigned long last = start + size - 1; + + vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs); + vduse_dev_update_iotlb(vdev, start, last); +} + +static const struct dma_map_ops vduse_dev_dma_ops = { + .map_page = vduse_dev_map_page, + .unmap_page = vduse_dev_unmap_page, + .alloc = vduse_dev_alloc_coherent, + .free = vduse_dev_free_coherent, +}; + +static unsigned int perm_to_file_flags(u8 perm) +{ + unsigned int flags = 0; + + switch (perm) { + case VDUSE_ACCESS_WO: + flags |= O_WRONLY; + break; + case VDUSE_ACCESS_RO: + flags |= O_RDONLY; + break; + case VDUSE_ACCESS_RW: + flags |= O_RDWR; + break; + default: + WARN(1, "invalidate vhost IOTLB permission\n"); + break; + } + + return flags; +} + +static int vduse_kickfd_setup(struct vduse_dev *dev, + struct vduse_vq_eventfd *eventfd) +{ + struct eventfd_ctx *ctx = NULL; + struct vduse_virtqueue *vq; + + if (eventfd->index >= dev->vq_num) + return -EINVAL; + + vq = &dev->vqs[eventfd->index]; + if (eventfd->fd > 0) { + ctx = eventfd_ctx_fdget(eventfd->fd); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + } else if (eventfd->fd != VDUSE_EVENTFD_DEASSIGN) + return 0; + + spin_lock(&vq->kick_lock); + if (vq->kickfd) + eventfd_ctx_put(vq->kickfd); + vq->kickfd = ctx; + spin_unlock(&vq->kick_lock); + + return 0; +} + +static void vduse_vq_irq_inject(struct work_struct *work) +{ + struct vduse_virtqueue *vq = container_of(work, + struct vduse_virtqueue, inject); + + spin_lock_irq(&vq->irq_lock); + if (vq->ready && vq->cb.callback) + vq->cb.callback(vq->cb.private); + spin_unlock_irq(&vq->irq_lock); +} + +static long vduse_dev_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) +{ + struct vduse_dev *dev = file->private_data; + void __user *argp = (void __user *)arg; + int ret; + + switch (cmd) { + case VDUSE_IOTLB_GET_ENTRY: { + struct vduse_iotlb_entry entry; + struct vhost_iotlb_map *map; + struct vdpa_map_file *map_file; + struct vduse_iova_domain *domain = dev->domain; + struct file *f = NULL; + + ret = -EFAULT; + if (copy_from_user(&entry, argp, sizeof(entry))) + break; + + spin_lock(&domain->iotlb_lock); + map = vhost_iotlb_itree_first(domain->iotlb, + entry.start, entry.start + 1); + if (map) { + map_file = (struct vdpa_map_file *)map->opaque; + f = get_file(map_file->file); + entry.offset = map_file->offset; + entry.start = map->start; + entry.last = map->last; + entry.perm = map->perm; + } + spin_unlock(&domain->iotlb_lock); + ret = -EINVAL; + if (!f) + break; + + ret = -EFAULT; + if (copy_to_user(argp, &entry, sizeof(entry))) { + fput(f); + break; + } + ret = receive_fd_user(f, argp, perm_to_file_flags(entry.perm)); + fput(f); + break; + } + case VDUSE_VQ_SETUP_KICKFD: { + struct vduse_vq_eventfd eventfd; + + ret = -EFAULT; + if (copy_from_user(&eventfd, argp, sizeof(eventfd))) + break; + + ret = vduse_kickfd_setup(dev, &eventfd); + break; + } + case VDUSE_INJECT_VQ_IRQ: + ret = -EINVAL; + if (arg >= dev->vq_num) + break; + + ret = 0; + queue_work(vduse_irq_wq, &dev->vqs[arg].inject); + break; + default: + ret = -ENOIOCTLCMD; + break; + } + + return ret; +} + +static int vduse_dev_release(struct inode *inode, struct file *file) +{ + struct vduse_dev *dev = file->private_data; + struct vduse_dev_msg *msg; + int i; + + for (i = 0; i < dev->vq_num; i++) { + struct vduse_virtqueue *vq = &dev->vqs[i]; + + spin_lock(&vq->kick_lock); + if (vq->kickfd) + eventfd_ctx_put(vq->kickfd); + vq->kickfd = NULL; + spin_unlock(&vq->kick_lock); + } + + spin_lock(&dev->msg_lock); + while ((msg = vduse_dequeue_msg(&dev->recv_list))) + vduse_enqueue_msg(&dev->send_list, msg); + spin_unlock(&dev->msg_lock); + + dev->connected = false; + + return 0; +} + +static int vduse_dev_open(struct inode *inode, struct file *file) +{ + struct vduse_dev *dev = container_of(inode->i_cdev, + struct vduse_dev, cdev); + int ret = -EBUSY; + + mutex_lock(&vduse_lock); + if (dev->connected) + goto unlock; + + ret = 0; + dev->connected = true; + file->private_data = dev; +unlock: + mutex_unlock(&vduse_lock); + + return ret; +} + +static const struct file_operations vduse_dev_fops = { + .owner = THIS_MODULE, + .open = vduse_dev_open, + .release = vduse_dev_release, + .read_iter = vduse_dev_read_iter, + .write_iter = vduse_dev_write_iter, + .poll = vduse_dev_poll, + .unlocked_ioctl = vduse_dev_ioctl, + .compat_ioctl = compat_ptr_ioctl, + .llseek = noop_llseek, +}; + +static struct vduse_dev *vduse_dev_create(void) +{ + struct vduse_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL); + + if (!dev) + return NULL; + + spin_lock_init(&dev->msg_lock); + INIT_LIST_HEAD(&dev->send_list); + INIT_LIST_HEAD(&dev->recv_list); + atomic64_set(&dev->msg_unique, 0); + + init_waitqueue_head(&dev->waitq); + + return dev; +} + +static void vduse_dev_destroy(struct vduse_dev *dev) +{ + kfree(dev); +} + +static struct vduse_dev *vduse_find_dev(const char *name) +{ + struct vduse_dev *tmp, *dev = NULL; + + list_for_each_entry(tmp, &vduse_devs, list) { + if (!strcmp(dev_name(&tmp->dev), name)) { + dev = tmp; + break; + } + } + return dev; +} + +static int vduse_destroy_dev(char *name) +{ + struct vduse_dev *dev = vduse_find_dev(name); + + if (!dev) + return -EINVAL; + + if (dev->vdev || dev->connected) + return -EBUSY; + + dev->connected = true; + list_del(&dev->list); + cdev_device_del(&dev->cdev, &dev->dev); + put_device(&dev->dev); + + return 0; +} + +static void vduse_release_dev(struct device *device) +{ + struct vduse_dev *dev = + container_of(device, struct vduse_dev, dev); + + ida_simple_remove(&vduse_ida, dev->minor); + kfree(dev->vqs); + vduse_domain_destroy(dev->domain); + vduse_dev_destroy(dev); + module_put(THIS_MODULE); +} + +static int vduse_create_dev(struct vduse_dev_config *config) +{ + int i, ret = -ENOMEM; + struct vduse_dev *dev; + + if (config->bounce_size > max_bounce_size) + return -EINVAL; + + if (config->bounce_size > max_iova_size) + return -EINVAL; + + if (vduse_find_dev(config->name)) + return -EEXIST; + + dev = vduse_dev_create(); + if (!dev) + return -ENOMEM; + + dev->device_id = config->device_id; + dev->vendor_id = config->vendor_id; + dev->domain = vduse_domain_create(max_iova_size - 1, + config->bounce_size); + if (!dev->domain) + goto err_domain; + + dev->vq_align = config->vq_align; + dev->vq_size_max = config->vq_size_max; + dev->vq_num = config->vq_num; + dev->vqs = kcalloc(dev->vq_num, sizeof(*dev->vqs), GFP_KERNEL); + if (!dev->vqs) + goto err_vqs; + + for (i = 0; i < dev->vq_num; i++) { + dev->vqs[i].index = i; + INIT_WORK(&dev->vqs[i].inject, vduse_vq_irq_inject); + spin_lock_init(&dev->vqs[i].kick_lock); + spin_lock_init(&dev->vqs[i].irq_lock); + } + + ret = ida_simple_get(&vduse_ida, 0, VDUSE_DEV_MAX, GFP_KERNEL); + if (ret < 0) + goto err_ida; + + dev->minor = ret; + device_initialize(&dev->dev); + dev->dev.release = vduse_release_dev; + dev->dev.class = vduse_class; + dev->dev.devt = MKDEV(MAJOR(vduse_major), dev->minor); + ret = dev_set_name(&dev->dev, "%s", config->name); + if (ret) + goto err_name; + + cdev_init(&dev->cdev, &vduse_dev_fops); + dev->cdev.owner = THIS_MODULE; + + ret = cdev_device_add(&dev->cdev, &dev->dev); + if (ret) { + put_device(&dev->dev); + return ret; + } + list_add(&dev->list, &vduse_devs); + __module_get(THIS_MODULE); + + return 0; +err_name: + ida_simple_remove(&vduse_ida, dev->minor); +err_ida: + kfree(dev->vqs); +err_vqs: + vduse_domain_destroy(dev->domain); +err_domain: + vduse_dev_destroy(dev); + return ret; +} + +static long vduse_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) +{ + int ret; + void __user *argp = (void __user *)arg; + + mutex_lock(&vduse_lock); + switch (cmd) { + case VDUSE_GET_API_VERSION: + ret = VDUSE_API_VERSION; + break; + case VDUSE_CREATE_DEV: { + struct vduse_dev_config config; + + ret = -EFAULT; + if (copy_from_user(&config, argp, sizeof(config))) + break; + + ret = vduse_create_dev(&config); + break; + } + case VDUSE_DESTROY_DEV: { + char name[VDUSE_NAME_MAX]; + + ret = -EFAULT; + if (copy_from_user(name, argp, VDUSE_NAME_MAX)) + break; + + ret = vduse_destroy_dev(name); + break; + } + default: + ret = -EINVAL; + break; + } + mutex_unlock(&vduse_lock); + + return ret; +} + +static const struct file_operations vduse_fops = { + .owner = THIS_MODULE, + .unlocked_ioctl = vduse_ioctl, + .compat_ioctl = compat_ptr_ioctl, + .llseek = noop_llseek, +}; + +static char *vduse_devnode(struct device *dev, umode_t *mode) +{ + return kasprintf(GFP_KERNEL, "vduse/%s", dev_name(dev)); +} + +static struct miscdevice vduse_misc = { + .fops = &vduse_fops, + .minor = MISC_DYNAMIC_MINOR, + .name = "vduse", + .nodename = "vduse/control", +}; + +static void vduse_mgmtdev_release(struct device *dev) +{ +} + +static struct device vduse_mgmtdev = { + .init_name = "vduse", + .release = vduse_mgmtdev_release, +}; + +static struct vdpa_mgmt_dev mgmt_dev; + +static int vduse_dev_add_vdpa(struct vduse_dev *dev, const char *name) +{ + struct vduse_vdpa *vdev = dev->vdev; + int ret; + + if (vdev) + return -EEXIST; + + vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, &dev->dev, + &vduse_vdpa_config_ops, name, true); + if (!vdev) + return -ENOMEM; + + vdev->dev = dev; + vdev->vdpa.dev.dma_mask = &vdev->vdpa.dev.coherent_dma_mask; + ret = dma_set_mask_and_coherent(&vdev->vdpa.dev, DMA_BIT_MASK(64)); + if (ret) + goto err; + + set_dma_ops(&vdev->vdpa.dev, &vduse_dev_dma_ops); + vdev->vdpa.dma_dev = &vdev->vdpa.dev; + vdev->vdpa.mdev = &mgmt_dev; + + ret = _vdpa_register_device(&vdev->vdpa, dev->vq_num); + if (ret) + goto err; + + dev->vdev = vdev; + + return 0; +err: + put_device(&vdev->vdpa.dev); + return ret; +} + +static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name) +{ + struct vduse_dev *dev; + int ret = -EINVAL; + + mutex_lock(&vduse_lock); + dev = vduse_find_dev(name); + if (!dev) + goto unlock; + + ret = vduse_dev_add_vdpa(dev, name); +unlock: + mutex_unlock(&vduse_lock); + + return ret; +} + +static void vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev) +{ + _vdpa_unregister_device(dev); +} + +static const struct vdpa_mgmtdev_ops vdpa_dev_mgmtdev_ops = { + .dev_add = vdpa_dev_add, + .dev_del = vdpa_dev_del, +}; + +static struct virtio_device_id id_table[] = { + { VIRTIO_DEV_ANY_ID, VIRTIO_DEV_ANY_ID }, + { 0 }, +}; + +static struct vdpa_mgmt_dev mgmt_dev = { + .device = &vduse_mgmtdev, + .id_table = id_table, + .ops = &vdpa_dev_mgmtdev_ops, +}; + +static int vduse_mgmtdev_init(void) +{ + int ret; + + ret = device_register(&vduse_mgmtdev); + if (ret) + return ret; + + ret = vdpa_mgmtdev_register(&mgmt_dev); + if (ret) + goto err; + + return 0; +err: + device_unregister(&vduse_mgmtdev); + return ret; +} + +static void vduse_mgmtdev_exit(void) +{ + vdpa_mgmtdev_unregister(&mgmt_dev); + device_unregister(&vduse_mgmtdev); +} + +static int vduse_init(void) +{ + int ret; + + if (max_bounce_size >= max_iova_size) + return -EINVAL; + + ret = misc_register(&vduse_misc); + if (ret) + return ret; + + vduse_class = class_create(THIS_MODULE, "vduse"); + if (IS_ERR(vduse_class)) { + ret = PTR_ERR(vduse_class); + goto err_class; + } + vduse_class->devnode = vduse_devnode; + + ret = alloc_chrdev_region(&vduse_major, 0, VDUSE_DEV_MAX, "vduse"); + if (ret) + goto err_chardev; + + vduse_irq_wq = alloc_workqueue("vduse-irq", + WQ_HIGHPRI | WQ_SYSFS | WQ_UNBOUND, 0); + if (!vduse_irq_wq) + goto err_wq; + + ret = vduse_domain_init(); + if (ret) + goto err_domain; + + ret = vduse_mgmtdev_init(); + if (ret) + goto err_mgmtdev; + + return 0; +err_mgmtdev: + vduse_domain_exit(); +err_domain: + destroy_workqueue(vduse_irq_wq); +err_wq: + unregister_chrdev_region(vduse_major, VDUSE_DEV_MAX); +err_chardev: + class_destroy(vduse_class); +err_class: + misc_deregister(&vduse_misc); + return ret; +} +module_init(vduse_init); + +static void vduse_exit(void) +{ + misc_deregister(&vduse_misc); + class_destroy(vduse_class); + unregister_chrdev_region(vduse_major, VDUSE_DEV_MAX); + destroy_workqueue(vduse_irq_wq); + vduse_domain_exit(); + vduse_mgmtdev_exit(); +} +module_exit(vduse_exit); + +MODULE_VERSION(DRV_VERSION); +MODULE_LICENSE(DRV_LICENSE); +MODULE_AUTHOR(DRV_AUTHOR); +MODULE_DESCRIPTION(DRV_DESC); diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h new file mode 100644 index 000000000000..37f7d7059aa8 --- /dev/null +++ b/include/uapi/linux/vduse.h @@ -0,0 +1,153 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _UAPI_VDUSE_H_ +#define _UAPI_VDUSE_H_ + +#include + +#define VDUSE_API_VERSION 0 + +#define VDUSE_CONFIG_DATA_LEN 256 +#define VDUSE_NAME_MAX 256 + +/* the control messages definition for read/write */ + +enum vduse_req_type { + VDUSE_SET_VQ_NUM, + VDUSE_SET_VQ_ADDR, + VDUSE_SET_VQ_READY, + VDUSE_GET_VQ_READY, + VDUSE_SET_VQ_STATE, + VDUSE_GET_VQ_STATE, + VDUSE_SET_FEATURES, + VDUSE_GET_FEATURES, + VDUSE_SET_STATUS, + VDUSE_GET_STATUS, + VDUSE_SET_CONFIG, + VDUSE_GET_CONFIG, + VDUSE_UPDATE_IOTLB, +}; + +struct vduse_vq_num { + __u32 index; + __u32 num; +}; + +struct vduse_vq_addr { + __u32 index; + __u64 desc_addr; + __u64 driver_addr; + __u64 device_addr; +}; + +struct vduse_vq_ready { + __u32 index; + __u8 ready; +}; + +struct vduse_vq_state { + __u32 index; + __u16 avail_idx; +}; + +struct vduse_dev_config_data { + __u32 offset; + __u32 len; + __u8 data[VDUSE_CONFIG_DATA_LEN]; +}; + +struct vduse_iova_range { + __u64 start; + __u64 last; +}; + +struct vduse_features { + __u64 features; +}; + +struct vduse_status { + __u8 status; +}; + +struct vduse_dev_request { + __u32 type; /* request type */ + __u32 request_id; /* request id */ + __u32 reserved[2]; /* for feature use */ + union { + struct vduse_vq_num vq_num; /* virtqueue num */ + struct vduse_vq_addr vq_addr; /* virtqueue address */ + struct vduse_vq_ready vq_ready; /* virtqueue ready status */ + struct vduse_vq_state vq_state; /* virtqueue state */ + struct vduse_dev_config_data config; /* virtio device config space */ + struct vduse_iova_range iova; /* iova range for updating */ + struct vduse_features f; /* virtio features */ + struct vduse_status s; /* device status */ + __u32 padding[16]; /* padding */ + }; +}; + +struct vduse_dev_response { + __u32 request_id; /* corresponding request id */ +#define VDUSE_REQUEST_OK 0x00 +#define VDUSE_REQUEST_FAILED 0x01 + __u32 result; /* the result of request */ + __u32 reserved[2]; /* for feature use */ + union { + struct vduse_vq_ready vq_ready; /* virtqueue ready status */ + struct vduse_vq_state vq_state; /* virtqueue state */ + struct vduse_dev_config_data config; /* virtio device config space */ + struct vduse_features f; /* virtio features */ + struct vduse_status s; /* device status */ + __u32 padding[16]; /* padding */ + }; +}; + +/* ioctls */ + +struct vduse_dev_config { + char name[VDUSE_NAME_MAX]; /* vduse device name */ + __u32 vendor_id; /* virtio vendor id */ + __u32 device_id; /* virtio device id */ + __u64 bounce_size; /* bounce buffer size for iommu */ + __u16 vq_num; /* the number of virtqueues */ + __u16 vq_size_max; /* the max size of virtqueue */ + __u32 vq_align; /* the allocation alignment of virtqueue's metadata */ +}; + +struct vduse_iotlb_entry { + int fd; +#define VDUSE_ACCESS_RO 0x1 +#define VDUSE_ACCESS_WO 0x2 +#define VDUSE_ACCESS_RW 0x3 + __u8 perm; /* access permission of this range */ + __u64 offset; /* the mmap offset on fd */ + __u64 start; /* start of the IOVA range */ + __u64 last; /* last of the IOVA range */ +}; + +struct vduse_vq_eventfd { + __u32 index; /* virtqueue index */ +#define VDUSE_EVENTFD_DEASSIGN -1 + int fd; /* eventfd, -1 means de-assigning the eventfd */ +}; + +#define VDUSE_BASE 0x81 + +/* Get the version of VDUSE API. This is used for future extension */ +#define VDUSE_GET_API_VERSION _IO(VDUSE_BASE, 0x00) + +/* Create a vduse device which is represented by a char device (/dev/vduse/) */ +#define VDUSE_CREATE_DEV _IOW(VDUSE_BASE, 0x01, struct vduse_dev_config) + +/* Destroy a vduse device. Make sure there are no references to the char device */ +#define VDUSE_DESTROY_DEV _IOW(VDUSE_BASE, 0x02, char[VDUSE_NAME_MAX]) + +/* Get a mmap'able iova region */ +#define VDUSE_IOTLB_GET_ENTRY _IOWR(VDUSE_BASE, 0x03, struct vduse_iotlb_entry) + +/* Setup an eventfd to receive kick for virtqueue */ +#define VDUSE_VQ_SETUP_KICKFD _IOW(VDUSE_BASE, 0x04, struct vduse_vq_eventfd) + +/* Inject an interrupt for specific virtqueue */ +#define VDUSE_INJECT_VQ_IRQ _IO(VDUSE_BASE, 0x05) + +#endif /* _UAPI_VDUSE_H_ */ From patchwork Mon Mar 15 05:37:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12138361 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.0 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5F31FC4160E for ; Mon, 15 Mar 2021 05:39:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 424FC64E37 for ; Mon, 15 Mar 2021 05:39:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230331AbhCOFix (ORCPT ); Mon, 15 Mar 2021 01:38:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49580 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230180AbhCOFi1 (ORCPT ); Mon, 15 Mar 2021 01:38:27 -0400 Received: from mail-pj1-x1032.google.com (mail-pj1-x1032.google.com [IPv6:2607:f8b0:4864:20::1032]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 51EB4C061574 for ; Sun, 14 Mar 2021 22:38:27 -0700 (PDT) Received: by mail-pj1-x1032.google.com with SMTP id j6-20020a17090adc86b02900cbfe6f2c96so14119241pjv.1 for ; Sun, 14 Mar 2021 22:38:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=P/3zALspLSxFVgxa420xFghjxZDebD6dVQ7y7utAibE=; b=ACAc2f1GmAazLjOqMvvxvkxWPsJsk5l8GH5Gs5zEQgoozUM3itrYVk4Wx3X+k9Ha0Z Zfcd/wBYz+Hy+fKyl1Ypi/qAgRzNhBN0MtSUomXEYb7fqnnGSRY0Z5jckrthOZqGMS0c CEYTgBfHllx/+8Vw04THD5n6xmPdqNFvqvqu8+Dod4nWcae1yx3i9i6w7mhApfdjoSF0 JZVB6JZxwQPA3Fd4YMIBTI2lo/z997gsskQEQPZd3ZVKdsv6muV5n/HpueE9uqa3ua8f yN1ChaYUuXTuQ7h0bq1Eo56PAzOpOLaEkTSqApmb1RH62M9YDfUGk4HE/9dx0xjiYDym 5n4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=P/3zALspLSxFVgxa420xFghjxZDebD6dVQ7y7utAibE=; b=Fq36ngR3l5qbTG2G//1bnfhzr7XSzQVflFO9jW/JoLkO/WGpQESJXkdq0jFKpbZUL0 keNl4ktMQAkDbajfeOHq/uKA5XRr1JJWpwIfISxkd1ISXpHPzo9SHSR+ltufx4hJup/y wE2ciJ+gdA2A0GKXz4BFii3F6QHa0dKPFDbCtwkIqzAWUieIHzRY1lwy+qRUIdCJqOY/ g5WiU99GgOAn0woiCg5esnIstTJ6UYaObTQlj6YV9ipvP674MHXoyLEWHQXfxzeZJ8Dm Whh9bMs/mIL1zuGxivAsAjPj1J4ljkWGbnJLiIQ4MWXPGkjlfVCqEIjsTgLYEguDptuI +CGw== X-Gm-Message-State: AOAM532XYh8RacWhR7UDnYyhwBKAaeo7tIqPZ0DlJLjTTZEo34/V9TYw ZZxVgjx1jAlIUX5pd07WoG3m X-Google-Smtp-Source: ABdhPJy7OEuy5awQEQoSJcJXbxqYr6jjC5aj/lbgmiXvRp3orJRy+soCtvCRBg08Gq4rmHmLxlGXTg== X-Received: by 2002:a17:90b:16cd:: with SMTP id iy13mr4607320pjb.46.1615786706994; Sun, 14 Mar 2021 22:38:26 -0700 (PDT) Received: from localhost ([139.177.225.227]) by smtp.gmail.com with ESMTPSA id s1sm11783866pfe.151.2021.03.14.22.38.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Mar 2021 22:38:26 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, bob.liu@oracle.com, hch@infradead.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v5 10/11] vduse: Add config interrupt support Date: Mon, 15 Mar 2021 13:37:20 +0800 Message-Id: <20210315053721.189-11-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210315053721.189-1-xieyongji@bytedance.com> References: <20210315053721.189-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch introduces a new ioctl VDUSE_INJECT_CONFIG_IRQ to support injecting config interrupt. Signed-off-by: Xie Yongji --- drivers/vdpa/vdpa_user/vduse_dev.c | 24 +++++++++++++++++++++++- include/uapi/linux/vduse.h | 3 +++ 2 files changed, 26 insertions(+), 1 deletion(-) diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c index 07d0ae92d470..cc12b58bdc09 100644 --- a/drivers/vdpa/vdpa_user/vduse_dev.c +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -64,6 +64,8 @@ struct vduse_dev { struct list_head send_list; struct list_head recv_list; struct list_head list; + struct vdpa_callback config_cb; + spinlock_t irq_lock; bool connected; int minor; u16 vq_size_max; @@ -439,6 +441,11 @@ static void vduse_dev_reset(struct vduse_dev *dev) vduse_domain_reset_bounce_map(dev->domain); vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX); + spin_lock(&dev->irq_lock); + dev->config_cb.callback = NULL; + dev->config_cb.private = NULL; + spin_unlock(&dev->irq_lock); + for (i = 0; i < dev->vq_num; i++) { struct vduse_virtqueue *vq = &dev->vqs[i]; @@ -557,7 +564,12 @@ static int vduse_vdpa_set_features(struct vdpa_device *vdpa, u64 features) static void vduse_vdpa_set_config_cb(struct vdpa_device *vdpa, struct vdpa_callback *cb) { - /* We don't support config interrupt */ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + spin_lock(&dev->irq_lock); + dev->config_cb.callback = cb->callback; + dev->config_cb.private = cb->private; + spin_unlock(&dev->irq_lock); } static u16 vduse_vdpa_get_vq_num_max(struct vdpa_device *vdpa) @@ -842,6 +854,15 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd, ret = 0; queue_work(vduse_irq_wq, &dev->vqs[arg].inject); break; + case VDUSE_INJECT_CONFIG_IRQ: + ret = -EINVAL; + spin_lock_irq(&dev->irq_lock); + if (dev->config_cb.callback) { + dev->config_cb.callback(dev->config_cb.private); + ret = 0; + } + spin_unlock_irq(&dev->irq_lock); + break; default: ret = -ENOIOCTLCMD; break; @@ -918,6 +939,7 @@ static struct vduse_dev *vduse_dev_create(void) INIT_LIST_HEAD(&dev->send_list); INIT_LIST_HEAD(&dev->recv_list); atomic64_set(&dev->msg_unique, 0); + spin_lock_init(&dev->irq_lock); init_waitqueue_head(&dev->waitq); diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h index 37f7d7059aa8..337e766f5622 100644 --- a/include/uapi/linux/vduse.h +++ b/include/uapi/linux/vduse.h @@ -150,4 +150,7 @@ struct vduse_vq_eventfd { /* Inject an interrupt for specific virtqueue */ #define VDUSE_INJECT_VQ_IRQ _IO(VDUSE_BASE, 0x05) +/* Inject a config interrupt */ +#define VDUSE_INJECT_CONFIG_IRQ _IO(VDUSE_BASE, 0x06) + #endif /* _UAPI_VDUSE_H_ */ From patchwork Mon Mar 15 05:37:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12138363 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7774AC2BA12 for ; Mon, 15 Mar 2021 05:39:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6695A61574 for ; Mon, 15 Mar 2021 05:39:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230359AbhCOFi4 (ORCPT ); Mon, 15 Mar 2021 01:38:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49598 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230183AbhCOFib (ORCPT ); Mon, 15 Mar 2021 01:38:31 -0400 Received: from mail-pj1-x1029.google.com (mail-pj1-x1029.google.com [IPv6:2607:f8b0:4864:20::1029]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E6CFCC06175F for ; Sun, 14 Mar 2021 22:38:30 -0700 (PDT) Received: by mail-pj1-x1029.google.com with SMTP id s21so8159167pjq.1 for ; Sun, 14 Mar 2021 22:38:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=lVG7vtvMbRMpvTShU3eF9/pmfbsuVpTNULn/mM9+dMw=; b=X4Katv61hXzWe/h3bbDQWPAQO+MiETrtLH9iH2sLpZRpl7F2JHyimf/FUJ29UjQADI /LZaO1zGI/9WVRVY+cmFhFmjEnvs2n8N+kRGoWjxIKubAFLqDfndDyq0otr0/G6wkgqO MUEqC3mUUFl1/WGlZfZZjPcsYBQWaGRaKdrB9GtVT6JwDT/7dvtAdM80/OnJza+1X3R+ a5KWUW95sUdB0zU9dOEwluTrVZeA+j13WV62gcbi7yM/Jf5+nma5ZcpAyc/Ig9G8FCwP dIfE/ozUv8SlBm35FchW29E/rQbzfLZPqejj8CooWPgaMzNe4h2es5r4z1Yx+O/nFhXq J9nQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=lVG7vtvMbRMpvTShU3eF9/pmfbsuVpTNULn/mM9+dMw=; b=m/V6AW1kgvR8GWapn8108pzHnb7zAzxNMLcp7Mk0ch2DNv929mv2drXsdoY+9+Zcpn qEqxWQes7Vh7pp5YCBnVeTzsJm2TMd45pfXvExel5vow6tyHiG0n5x6KcuRIVl6zzvcT 3wxei7y13DCcyjC6mDRehl/VgqEexC/B/zckpC0qiOCwNfzlYuqmt1DCmg9K8PRFlDw4 dsJoVuVUrgVE4yA75PYFW/4X1Zt3EPYEZJcGm072fUdOq0tHf009309k6TTBfRSYjPvn vcWcW6sJvIgHJqG3o6HtASCtaMF029jXfl9Qxiqm1/eRTV03ga+W4/LdqAvKFJ03Xfm9 90dg== X-Gm-Message-State: AOAM531TkMx/IiJQYWLsxBx0TjD6JSCKsYMTVk3G5QhtHCD2/yuOrXtc LwBq9t7K2UPMEMVIAzYRg4TW X-Google-Smtp-Source: ABdhPJy0ZNV143F4ylJty11vzLVRq3Al/lgreCs5sIT9ATazkmI77l1xyiIl9kk/TpuIoUh/8F9Sjg== X-Received: by 2002:a17:902:6bca:b029:e2:c5d6:973e with SMTP id m10-20020a1709026bcab02900e2c5d6973emr9918682plt.40.1615786710484; Sun, 14 Mar 2021 22:38:30 -0700 (PDT) Received: from localhost ([139.177.225.227]) by smtp.gmail.com with ESMTPSA id o18sm8538438pji.10.2021.03.14.22.38.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Mar 2021 22:38:30 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, bob.liu@oracle.com, hch@infradead.org, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v5 11/11] Documentation: Add documentation for VDUSE Date: Mon, 15 Mar 2021 13:37:21 +0800 Message-Id: <20210315053721.189-12-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210315053721.189-1-xieyongji@bytedance.com> References: <20210315053721.189-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org VDUSE (vDPA Device in Userspace) is a framework to support implementing software-emulated vDPA devices in userspace. This document is intended to clarify the VDUSE design and usage. Signed-off-by: Xie Yongji --- Documentation/userspace-api/index.rst | 1 + Documentation/userspace-api/vduse.rst | 209 ++++++++++++++++++++++++++++++++++ 2 files changed, 210 insertions(+) create mode 100644 Documentation/userspace-api/vduse.rst diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst index acd2cc2a538d..f63119130898 100644 --- a/Documentation/userspace-api/index.rst +++ b/Documentation/userspace-api/index.rst @@ -24,6 +24,7 @@ place where this information is gathered. ioctl/index iommu media/index + vduse .. only:: subproject and html diff --git a/Documentation/userspace-api/vduse.rst b/Documentation/userspace-api/vduse.rst new file mode 100644 index 000000000000..744a9d3452c1 --- /dev/null +++ b/Documentation/userspace-api/vduse.rst @@ -0,0 +1,209 @@ +================================== +VDUSE - "vDPA Device in Userspace" +================================== + +vDPA (virtio data path acceleration) device is a device that uses a +datapath which complies with the virtio specifications with vendor +specific control path. vDPA devices can be both physically located on +the hardware or emulated by software. VDUSE is a framework that makes it +possible to implement software-emulated vDPA devices in userspace. + +How VDUSE works +------------ +Each userspace vDPA device is created by the VDUSE_CREATE_DEV ioctl on +the character device (/dev/vduse/control). Then a device file with the +specified name (/dev/vduse/$NAME) will appear, which can be used to +implement the userspace vDPA device's control path and data path. + +To implement control path, a message-based communication protocol and some +types of control messages are introduced in the VDUSE framework: + +- VDUSE_SET_VQ_ADDR: Set the vring address of virtqueue. + +- VDUSE_SET_VQ_NUM: Set the size of virtqueue + +- VDUSE_SET_VQ_READY: Set ready status of virtqueue + +- VDUSE_GET_VQ_READY: Get ready status of virtqueue + +- VDUSE_SET_VQ_STATE: Set the state for virtqueue + +- VDUSE_GET_VQ_STATE: Get the state for virtqueue + +- VDUSE_SET_FEATURES: Set virtio features supported by the driver + +- VDUSE_GET_FEATURES: Get virtio features supported by the device + +- VDUSE_SET_STATUS: Set the device status + +- VDUSE_GET_STATUS: Get the device status + +- VDUSE_SET_CONFIG: Write to device specific configuration space + +- VDUSE_GET_CONFIG: Read from device specific configuration space + +- VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping in device IOTLB + +Those control messages are mostly based on the vdpa_config_ops in +include/linux/vdpa.h which defines a unified interface to control +different types of vdpa device. Userspace needs to read()/write() +on the VDUSE device file to receive/reply those control messages +from/to VDUSE kernel module as follows: + +.. code-block:: c + + static int vduse_message_handler(int dev_fd) + { + int len; + struct vduse_dev_request req; + struct vduse_dev_response resp; + + len = read(dev_fd, &req, sizeof(req)); + if (len != sizeof(req)) + return -1; + + resp.request_id = req.request_id; + + switch (req.type) { + + /* handle different types of message */ + + } + + len = write(dev_fd, &resp, sizeof(resp)); + if (len != sizeof(resp)) + return -1; + + return 0; + } + +In the data path, vDPA device's iova regions will be mapped into userspace +with the help of VDUSE_IOTLB_GET_ENTRY ioctl on the VDUSE device file: + +- VDUSE_IOTLB_GET_ENTRY: get a mmap'able iova region containing the specified iova. + Userspace can access this iova region by passing corresponding size, offset, perm + and fd to mmap(). For example: + +.. code-block:: c + + static int perm_to_prot(uint8_t perm) + { + int prot = 0; + + switch (perm) { + case VDUSE_ACCESS_WO: + prot |= PROT_WRITE; + break; + case VDUSE_ACCESS_RO: + prot |= PROT_READ; + break; + case VDUSE_ACCESS_RW: + prot |= PROT_READ | PROT_WRITE; + break; + } + + return prot; + } + + static void *iova_to_va(int dev_fd, uint64_t iova, uint64_t *len) + { + void *addr; + size_t size; + struct vduse_iotlb_entry entry; + + entry.start = iova; + if (ioctl(dev_fd, VDUSE_IOTLB_GET_ENTRY, &entry)) + return NULL; + + size = entry.last - entry.start + 1; + *len = entry.last - iova + 1; + addr = mmap(0, size, perm_to_prot(entry.perm), MAP_SHARED, + entry.fd, entry.offset); + + if (addr == MAP_FAILED) + return NULL; + + /* do something to cache this iova region */ + + return addr + iova - entry.start; + } + +Besides, the following ioctls on the VDUSE device file are provided to support +interrupt injection and setting up eventfd for virtqueue kicks: + +- VDUSE_VQ_SETUP_KICKFD: set the kickfd for virtqueue, this eventfd is used + by VDUSE kernel module to notify userspace to consume the vring. + +- VDUSE_INJECT_VQ_IRQ: inject an interrupt for specific virtqueue + +- VDUSE_INJECT_CONFIG_IRQ: inject a config interrupt + +Register VDUSE device on vDPA bus +--------------------------------- +In order to make the VDUSE device work, administrator needs to use the management +API (netlink) to register it on vDPA bus. Some sample codes are show below: + +.. code-block:: c + + static int netlink_add_vduse(const char *name, int device_id) + { + struct nl_sock *nlsock; + struct nl_msg *msg; + int famid; + + nlsock = nl_socket_alloc(); + if (!nlsock) + return -ENOMEM; + + if (genl_connect(nlsock)) + goto free_sock; + + famid = genl_ctrl_resolve(nlsock, VDPA_GENL_NAME); + if (famid < 0) + goto close_sock; + + msg = nlmsg_alloc(); + if (!msg) + goto close_sock; + + if (!genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, famid, 0, 0, + VDPA_CMD_DEV_NEW, 0)) + goto nla_put_failure; + + NLA_PUT_STRING(msg, VDPA_ATTR_DEV_NAME, name); + NLA_PUT_STRING(msg, VDPA_ATTR_MGMTDEV_DEV_NAME, "vduse"); + NLA_PUT_U32(msg, VDPA_ATTR_DEV_ID, device_id); + + if (nl_send_sync(nlsock, msg)) + goto close_sock; + + nl_close(nlsock); + nl_socket_free(nlsock); + + return 0; + nla_put_failure: + nlmsg_free(msg); + close_sock: + nl_close(nlsock); + free_sock: + nl_socket_free(nlsock); + return -1; + } + +MMU-based IOMMU Driver +---------------------- +VDUSE framework implements an MMU-based on-chip IOMMU driver to support +mapping the kernel DMA buffer into the userspace iova region dynamically. +This is mainly designed for virtio-vdpa case (kernel virtio drivers). + +The basic idea behind this driver is treating MMU (VA->PA) as IOMMU (IOVA->PA). +The driver will set up MMU mapping instead of IOMMU mapping for the DMA transfer +so that the userspace process is able to use its virtual address to access +the DMA buffer in kernel. + +And to avoid security issue, a bounce-buffering mechanism is introduced to +prevent userspace accessing the original buffer directly which may contain other +kernel data. During the mapping, unmapping, the driver will copy the data from +the original buffer to the bounce buffer and back, depending on the direction of +the transfer. And the bounce-buffer addresses will be mapped into the user address +space instead of the original one.