From patchwork Mon May 17 09:55:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12261425 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 035F0C433ED for ; Mon, 17 May 2021 09:56:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D43A8611CA for ; Mon, 17 May 2021 09:56:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236149AbhEQJ5S (ORCPT ); Mon, 17 May 2021 05:57:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35146 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236089AbhEQJ5Q (ORCPT ); Mon, 17 May 2021 05:57:16 -0400 Received: from mail-pl1-x630.google.com (mail-pl1-x630.google.com [IPv6:2607:f8b0:4864:20::630]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AEA1FC06175F for ; Mon, 17 May 2021 02:56:00 -0700 (PDT) Received: by mail-pl1-x630.google.com with SMTP id v13so2830470ple.9 for ; Mon, 17 May 2021 02:56:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Ygd6Y3pMMIE3RUljThmAuE4r52Kqy5h8O2JQV7ru1VE=; b=CfGZ0PIMRYu4HZPQTqyHPr1YCzbRVLXIMfeN4edm7scYV+U9MYChU0jqbjBaSYww8W x34cC9/BMlqozDzEDG8ux9bhvujMR9dQUSals7Y5Q6Z1Ic9ux5H7kh6skw81G6vCSEuj qvAj7zfktNXLCd1rUIgxI87C26tOnzC7+pVGuD1g/0Lv5qn8oiSuf1+7+eWR6D+6V+Us mf1DA4vTKCekvwxQldEXP7DGoACUCXG3tr2GgElhChFw367kHCgwfpQSfxiMYchniS3x 3QikwhZOKWcBciqgNHjjH9ktT79H4PFjZ9sUPr0lm6w4kGIx2J2u98pIhDIqi7lIhPpV gsYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Ygd6Y3pMMIE3RUljThmAuE4r52Kqy5h8O2JQV7ru1VE=; b=hcrDARpqcQCHpayGg1MLVLjNxTjghAyjFHPJGMgRGFk76PRhBK9HTub6RhI0zIx9K2 vFeDD41yZUxhPfQSAyZgsnyvvmKnt9puqFNa7+ecTfU/liayRZmAj1HjRtn/INfyo/F7 Ig+tahrvNpWKRPZD12BX04aH6n17FpndlJElXh4KXF4OJL7Kxgu9m2BD1ZLpx0WwBOWo C28qYch0+UDfl2uLGYDIxCXcxfU9mEOQgHwMKp/Sas+Fnn1L9I8K7R2zmUQMUYPWSy3Q aFnpohWwyW62dz1pd/eOmuaA1jY/gfMjH3TQoGYKZDPsZo3TOjOFWo/drUWE1bH92e2R L5Cg== X-Gm-Message-State: AOAM531F3xESWNZMDAM+llwCscX0QrSWzXPDiqthYfG3VhSzaL5UjGv5 LPOPJm3BLx8qrT0SWUQ8DnQz X-Google-Smtp-Source: ABdhPJxcu8VbEVbQ6UidxIbU5FzeWTHg5acjJx2pdfr2qW4l4EQrVjPUuXEDDezt/Znh1VwtXb7+/w== X-Received: by 2002:a17:90a:6c23:: with SMTP id x32mr30960965pjj.228.1621245360342; Mon, 17 May 2021 02:56:00 -0700 (PDT) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id q24sm10233373pjp.6.2021.05.17.02.55.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 May 2021 02:56:00 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 01/12] iova: Export alloc_iova_fast() Date: Mon, 17 May 2021 17:55:02 +0800 Message-Id: <20210517095513.850-2-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210517095513.850-1-xieyongji@bytedance.com> References: <20210517095513.850-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Export alloc_iova_fast() so that some modules can use it to improve iova allocation efficiency. Signed-off-by: Xie Yongji --- drivers/iommu/iova.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index e6e2fa85271c..317eef64ffef 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -450,6 +450,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long size, return new_iova->pfn_lo; } +EXPORT_SYMBOL_GPL(alloc_iova_fast); /** * free_iova_fast - free iova pfn range into rcache From patchwork Mon May 17 09:55:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12261427 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E98BCC43460 for ; Mon, 17 May 2021 09:56:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CDABA61005 for ; Mon, 17 May 2021 09:56:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236260AbhEQJ5c (ORCPT ); Mon, 17 May 2021 05:57:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35156 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236215AbhEQJ5V (ORCPT ); Mon, 17 May 2021 05:57:21 -0400 Received: from mail-pg1-x52c.google.com (mail-pg1-x52c.google.com [IPv6:2607:f8b0:4864:20::52c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 57E6BC06138B for ; Mon, 17 May 2021 02:56:04 -0700 (PDT) Received: by mail-pg1-x52c.google.com with SMTP id m190so4295629pga.2 for ; Mon, 17 May 2021 02:56:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=b2jCEx8mi3vXgkJ9kkatzIPCmNDeaAe/ogyF3BNTwbw=; b=BIAa64R4y9k3Eg3WHj2S666iigdAlamttMFasFKl0/os5LVtgYGUotdFoSEHugW/Eu bPRzaxAW284e8L8yo2K3DDL0/T8tA/l8wgLVMmAmo6+IhFLW/GYToPaLLl355N+97G5L XexcA5aHvM9w2PGt9fWa2k/dVGxtVldmNsYPnqawOun42Yk2znLlV60hSNjDq7IYxrSR /MfnGeT4tFvQVkSyH11nQ65tEhzqJ/jZPaGxLuLbESIf2R6H7QaYs485JdPrsXqE7dOf I4DwWEpiBqfQ6lw4Z+Sbeqdk137WXiyVzbQGXwJPtZ+TTYkfwjFV4U35eCpK8JW0Dmto ErvQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=b2jCEx8mi3vXgkJ9kkatzIPCmNDeaAe/ogyF3BNTwbw=; b=ONctkrXPjaTklLYYe3nJ/fvwVuSnYEGRmr16CFUOoAaiC06+l6BeRRzs0/28gr2+Te Ij4p5JzR+6xnOWlO9f+lml7rzY1Xh4FemZ16K7muK4qcpy11yvAaTT2EYYlKXnRKYIb4 7R9WWcZeb1TWTZSSzN5te8yhOma8C+ZAjuwU++6HnybbgnTGHS84LGHgsjrPMC8r21+O w41g4oCwtNJaRZn0PTUwYDaoD4e+qbKdHgg2Q23vn17vLmIHs/oJEH8+Cv1GEOy0yInH GTCLGO5udwkshxhPPI1MXX1Om/gOfow8sTZwoc9l6AabN2lcfAfdD1c9aCp0ceFUE4sV WrnQ== X-Gm-Message-State: AOAM533Bxk4Pe+BiU92xShXPIr7PyhiovTipHeXjsuGADiPNm4tgBp2/ u7rwVddg3Cv3y1n0ah+expVc X-Google-Smtp-Source: ABdhPJxMHtpizSq6PPRichE5BBR0ifuwPYTxOEXD/Wu/CUyxhwu/aKM7AbIbvC3ahRVJo9npK3SCVw== X-Received: by 2002:a63:d45:: with SMTP id 5mr1321436pgn.72.1621245363924; Mon, 17 May 2021 02:56:03 -0700 (PDT) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id gz18sm8486949pjb.19.2021.05.17.02.56.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 May 2021 02:56:03 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 02/12] file: Export receive_fd() to modules Date: Mon, 17 May 2021 17:55:03 +0800 Message-Id: <20210517095513.850-3-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210517095513.850-1-xieyongji@bytedance.com> References: <20210517095513.850-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Export receive_fd() so that some modules can use it to pass file descriptor between processes without missing any security stuffs. Signed-off-by: Xie Yongji --- fs/file.c | 6 ++++++ include/linux/file.h | 7 +++---- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/fs/file.c b/fs/file.c index f633348029a5..ef4da2eaf25b 100644 --- a/fs/file.c +++ b/fs/file.c @@ -1135,6 +1135,12 @@ int __receive_fd(int fd, struct file *file, int __user *ufd, unsigned int o_flag return new_fd; } +int receive_fd(struct file *file, unsigned int o_flags) +{ + return __receive_fd(-1, file, NULL, o_flags); +} +EXPORT_SYMBOL_GPL(receive_fd); + static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags) { int err = -EBADF; diff --git a/include/linux/file.h b/include/linux/file.h index 225982792fa2..4667f9567d3e 100644 --- a/include/linux/file.h +++ b/include/linux/file.h @@ -94,6 +94,9 @@ extern void fd_install(unsigned int fd, struct file *file); extern int __receive_fd(int fd, struct file *file, int __user *ufd, unsigned int o_flags); + +extern int receive_fd(struct file *file, unsigned int o_flags); + static inline int receive_fd_user(struct file *file, int __user *ufd, unsigned int o_flags) { @@ -101,10 +104,6 @@ static inline int receive_fd_user(struct file *file, int __user *ufd, return -EFAULT; return __receive_fd(-1, file, ufd, o_flags); } -static inline int receive_fd(struct file *file, unsigned int o_flags) -{ - return __receive_fd(-1, file, NULL, o_flags); -} static inline int receive_fd_replace(int fd, struct file *file, unsigned int o_flags) { return __receive_fd(fd, file, NULL, o_flags); From patchwork Mon May 17 09:55:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12261429 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1786EC433ED for ; Mon, 17 May 2021 09:56:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E53C8611CA for ; Mon, 17 May 2021 09:56:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236303AbhEQJ5n (ORCPT ); Mon, 17 May 2021 05:57:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35170 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236140AbhEQJ5Z (ORCPT ); Mon, 17 May 2021 05:57:25 -0400 Received: from mail-pg1-x536.google.com (mail-pg1-x536.google.com [IPv6:2607:f8b0:4864:20::536]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C525BC061344 for ; Mon, 17 May 2021 02:56:08 -0700 (PDT) Received: by mail-pg1-x536.google.com with SMTP id q15so4266785pgg.12 for ; Mon, 17 May 2021 02:56:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=b7apvPbQfTG1FYg2uNsocmvMG/QdC7c4RuzNmrGKxtc=; b=lXRHbHSIsb1KADw2tX2zKstf7SBlVEY4DtJnDMFT3EHKIWMje067JnmjgrkNReUxPn I0z2aIAyJGaL9m4k2OpOD29iaKpNcW4xfR7d14dIuNrPK0hQOvA8kl9rxSPc6f69Mm6p R0F2LeWEZA0PPqOKTYoBW3+9nQSkYxFKNz5WUEn2djE6598CZ00wKdFzTggwdy8opqtd +LicmUYFz5R2JqJ3r3lQ/vDv0gKToNlIKrKEdVnqYs5TGR2EJsU3XPEKupH3aVQAL9FE 3exCZ90aL2Cb7tTVwr9v8iYtnhbog2Qo9fEaKaJOYYh61niFj77C/LAndUBtJYYe86bM nS5w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=b7apvPbQfTG1FYg2uNsocmvMG/QdC7c4RuzNmrGKxtc=; b=hdut8cE8FXfhZIMqJvOFrOIvq8wwfnz1j5z0KvYR8HKeJ/0FS8C4XX4T6IvYwkEw2S FlCLgEJ2bt6ZZqQaggnyd+q4AhigVDrtz9PKH+X2JjoY4Z6JxlIYmsH30pDGFQXJlRyY MMYgWdF5SeirNp6g/9skBThGw1cflri68QElasVhSpPrPPVkk9ZWSIvdG4yzkycuEdKA /gmlTKbBC+zGTr3Wkyu4syE+lN109MWrgfgOfysczxadPfDe6PXhcNu+ujdQNBriZWm1 AI8NADUEKmB+hMjOJkrkkNoX7tUYdDYux6q3okzISJhxf/6QxuQFrJePKfqGiJMfj8ag 7n7A== X-Gm-Message-State: AOAM5306/Z6m7I+lZ0PGwPoWQq9xWvxBFA3PQt7ZhyOTbMrbwG6f2Kyk aNoAk7eGX7AJR+e0uU8zWNEP X-Google-Smtp-Source: ABdhPJwO1N2BTDcGIAdEBE1GwKwtKYl5hxzWv7M8ro+LthKkIXN/GYpMnfpVi6rAWoY6mIGZm1TWzA== X-Received: by 2002:a63:741e:: with SMTP id p30mr61016414pgc.68.1621245368362; Mon, 17 May 2021 02:56:08 -0700 (PDT) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id 24sm10063099pgz.77.2021.05.17.02.56.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 May 2021 02:56:07 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 03/12] eventfd: Increase the recursion depth of eventfd_signal() Date: Mon, 17 May 2021 17:55:04 +0800 Message-Id: <20210517095513.850-4-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210517095513.850-1-xieyongji@bytedance.com> References: <20210517095513.850-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Increase the recursion depth of eventfd_signal() to 1. This is the maximum recursion depth we have found so far, which can be triggered with the following call chain: kvm_io_bus_write [kvm] --> ioeventfd_write [kvm] --> eventfd_signal [eventfd] --> vhost_poll_wakeup [vhost] --> vduse_vdpa_kick_vq [vduse] --> eventfd_signal [eventfd] Signed-off-by: Xie Yongji Acked-by: Jason Wang --- fs/eventfd.c | 2 +- include/linux/eventfd.h | 5 ++++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/eventfd.c b/fs/eventfd.c index e265b6dd4f34..cc7cd1dbedd3 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -71,7 +71,7 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n) * it returns true, the eventfd_signal() call should be deferred to a * safe context. */ - if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count))) + if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count) > EFD_WAKE_DEPTH)) return 0; spin_lock_irqsave(&ctx->wqh.lock, flags); diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h index fa0a524baed0..886d99cd38ef 100644 --- a/include/linux/eventfd.h +++ b/include/linux/eventfd.h @@ -29,6 +29,9 @@ #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK) #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE) +/* Maximum recursion depth */ +#define EFD_WAKE_DEPTH 1 + struct eventfd_ctx; struct file; @@ -47,7 +50,7 @@ DECLARE_PER_CPU(int, eventfd_wake_count); static inline bool eventfd_signal_count(void) { - return this_cpu_read(eventfd_wake_count); + return this_cpu_read(eventfd_wake_count) > EFD_WAKE_DEPTH; } #else /* CONFIG_EVENTFD */ From patchwork Mon May 17 09:55:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12261431 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C2DFDC43462 for ; Mon, 17 May 2021 09:56:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A9B2A61005 for ; Mon, 17 May 2021 09:56:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236306AbhEQJ5q (ORCPT ); Mon, 17 May 2021 05:57:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35158 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236247AbhEQJ52 (ORCPT ); Mon, 17 May 2021 05:57:28 -0400 Received: from mail-pg1-x52e.google.com (mail-pg1-x52e.google.com [IPv6:2607:f8b0:4864:20::52e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 96301C0613ED for ; Mon, 17 May 2021 02:56:12 -0700 (PDT) Received: by mail-pg1-x52e.google.com with SMTP id k15so4266127pgb.10 for ; Mon, 17 May 2021 02:56:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=tx8QMBGpZJX1ew8np0gTCpYyfE19GueKJf7/gIkdvaI=; b=xuePQKnI02Lahx+4Y+JNH3sW8SQG8Ak9dQIdwLXgLchUJt1JVNjIL6rVMKCJfgNgzb Vb/qRjAR9wDRbRbqECJXSfbAViZOzuMI57xitwX91penLlPfZgA7sMJ43Smd4aM5TW08 W+LwxP2K0D1yO52eg/OLk1l5ch7fYVaqmjdJJ9Kw9hOGMhzR4ee650DrK2bmhH+3HJR2 fdA778/MCyyZBHTQGJIxo1Xt9bXg79iwAcMQHgDf2TBv1ll7WWWCv3PfLTDFLYaqBzQK 9VOL17IcPdxDmUjnB6bN0Nng4ug8GZbN4WTz46Yng2r58wITMeBLORKVGF+sp54AnFA+ R36g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=tx8QMBGpZJX1ew8np0gTCpYyfE19GueKJf7/gIkdvaI=; b=FSL8jezgNGFx4zoMNp7dnhShof84pdDPpqwPlo7O+DRBkjPjG/pVZ8Ke2/Dcot51oq oejhyCZs6UguYyYo3cKm72ET/Zgz7Uy/kFeZEiwHS3Htm26lxxEC4TORMsieIadIdpjA T1UgMTyqvLUSvusbpkRMNmjkk6KKHKfTWn0LAT3itcH5qKi+XrdTevXWwkIOaowRulZf 178K4IJLMF3agh4HOT+mKwiN2i4Gej2KUSxSbftH2oSxUvBBzTMRYeh3lSpwGi3Zi6WJ Pve1SPON7elEA1jTf83ttjqDbnShQBlXFJ1NI7bJdZ1wpUvftAG1V7dOsdHAJyR9dhsf 9TKQ== X-Gm-Message-State: AOAM530tS8LAS/bKQBpb37dcFfm/9TeiZE04oiBbj4zYvLOT/Uciky1v RqLIVMLQcnjKr95L+NxVVsJ4 X-Google-Smtp-Source: ABdhPJz4jJ0Z7VvRRYZhi1VrL3hoPcZzoDQgK9lGdzLpuPpHEzBKkdKEplH7qnfQz5cB3LlwWpUAjA== X-Received: by 2002:a05:6a00:787:b029:2dd:15b6:515a with SMTP id g7-20020a056a000787b02902dd15b6515amr1691311pfu.26.1621245372172; Mon, 17 May 2021 02:56:12 -0700 (PDT) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id d9sm14749696pjx.41.2021.05.17.02.56.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 May 2021 02:56:11 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 04/12] virtio-blk: Add validation for block size in config space Date: Mon, 17 May 2021 17:55:05 +0800 Message-Id: <20210517095513.850-5-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210517095513.850-1-xieyongji@bytedance.com> References: <20210517095513.850-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This ensures that we will not use an invalid block size in config space (might come from an untrusted device). Signed-off-by: Xie Yongji --- drivers/block/virtio_blk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index ebb4d3fe803f..c848aa36d49b 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -826,7 +826,7 @@ static int virtblk_probe(struct virtio_device *vdev) err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE, struct virtio_blk_config, blk_size, &blk_size); - if (!err) + if (!err && blk_size > 0 && blk_size <= max_size) blk_queue_logical_block_size(q, blk_size); else blk_size = queue_logical_block_size(q); From patchwork Mon May 17 09:55:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12261433 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43F8FC43460 for ; Mon, 17 May 2021 09:56:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2814A61005 for ; Mon, 17 May 2021 09:56:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236353AbhEQJ5y (ORCPT ); Mon, 17 May 2021 05:57:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35234 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236272AbhEQJ5d (ORCPT ); Mon, 17 May 2021 05:57:33 -0400 Received: from mail-pj1-x102c.google.com (mail-pj1-x102c.google.com [IPv6:2607:f8b0:4864:20::102c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 629A7C06138C for ; Mon, 17 May 2021 02:56:16 -0700 (PDT) Received: by mail-pj1-x102c.google.com with SMTP id ep16-20020a17090ae650b029015d00f578a8so3401252pjb.2 for ; Mon, 17 May 2021 02:56:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=z7/sgPn1niIyplx/MCCBp/NxUC1m49ShQY4FddeicFg=; b=VtwX03C0ISBBYjMp9XrXXlg71h/dLD+Iw0BunstvVMTQaRODx4ebdUoHqalwuhZLrr c6dwQDJogEeD8IEYI5BVoSc848OKx+OElBQio9tTZdwuqKQ0c7pwHtONHQSD07vEDGtY HN+viIpqfyoeqR4A27UylxzodLAKfRlI/f+XLIm363PsM6oqelH6yU9zb92L1sJwUSq7 abxZhuME3e0rEDw0VLQtTiSVtO+eqRv2k1gs0aSRfId3eS75sHzQWNeLdSh2x86GyLUO mNu7bb5U26Dzx7s112VDNFcaoK/enWL+0vWA8YXtLtU2B5IfIOdx+soCEqxWkBtw/wkI SfFw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=z7/sgPn1niIyplx/MCCBp/NxUC1m49ShQY4FddeicFg=; b=tFW39mF6N4WrUIwuxQomIaSQXAPstaoyp3kwuCAB7MmBrhH/mXOmx8nLDfnP6Q/RNp QBP3IoVCftcEqi9cyhk7IJ+fv4d9MXfIdlNJxjHYkOY7fPbDdZK05K+KDh7n1aLrRTb8 ZRZCR2tDwoxXBGXDRL7aN4RF+XtaHvbYXRJSsOR+heMC5qHt24mfbpbzH6i2Du29WePE cpc5ITbeminluW/dzQM9RILzuo6oue0UNGiBNtCsEBD9Qa0JRTdqc2WIBGFWo0AWBCKM 7pQ/0t3dnk0Me5vlptxut3SyF1arJCpXcb8foNHaZfW9iCCylJ2ePClzVPZNc0YMzaWA nN0g== X-Gm-Message-State: AOAM532JWo3xXC6yBGzSGFH4sSkjHJAHApeCNRvhWsxcf2O/nr9IAA70 PeQLBCs/H58Pv3h4mkx797WL X-Google-Smtp-Source: ABdhPJxaWd2Kn5/peSswOCwRKMHxtXfg5xbq6uIlNzE6im+gipuz3QFRX9Bwo9Y9e12+QeRiGNwVIQ== X-Received: by 2002:a17:90b:1992:: with SMTP id mv18mr9044020pjb.137.1621245376019; Mon, 17 May 2021 02:56:16 -0700 (PDT) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id s184sm10234381pgc.29.2021.05.17.02.56.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 May 2021 02:56:15 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 05/12] virtio_scsi: Add validation for residual bytes from response Date: Mon, 17 May 2021 17:55:06 +0800 Message-Id: <20210517095513.850-6-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210517095513.850-1-xieyongji@bytedance.com> References: <20210517095513.850-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This ensures that the residual bytes in response (might come from an untrusted device) will not exceed the data buffer length. Signed-off-by: Xie Yongji Acked-by: Jason Wang --- drivers/scsi/virtio_scsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c index efcaf0674c8d..ad7d8cecec32 100644 --- a/drivers/scsi/virtio_scsi.c +++ b/drivers/scsi/virtio_scsi.c @@ -97,7 +97,7 @@ static inline struct Scsi_Host *virtio_scsi_host(struct virtio_device *vdev) static void virtscsi_compute_resid(struct scsi_cmnd *sc, u32 resid) { if (resid) - scsi_set_resid(sc, resid); + scsi_set_resid(sc, min(resid, scsi_bufflen(sc))); } /* From patchwork Mon May 17 09:55:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12261435 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5292CC43460 for ; Mon, 17 May 2021 09:56:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 355F66121F for ; Mon, 17 May 2021 09:56:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236376AbhEQJ6E (ORCPT ); Mon, 17 May 2021 05:58:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35260 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236106AbhEQJ5g (ORCPT ); Mon, 17 May 2021 05:57:36 -0400 Received: from mail-pj1-x102b.google.com (mail-pj1-x102b.google.com [IPv6:2607:f8b0:4864:20::102b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7AD76C061763 for ; Mon, 17 May 2021 02:56:20 -0700 (PDT) Received: by mail-pj1-x102b.google.com with SMTP id j6-20020a17090adc86b02900cbfe6f2c96so3400311pjv.1 for ; Mon, 17 May 2021 02:56:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=WWAi/i4+n9XG5D57vjCca2tNocYxaQJOYj5KH2bJ384=; b=1gw758w4orgpYuGg4qlVZ1ufUErcwuFPa6rGxGKNeCcUdBT2DBFUOZgjjjHaru9QgZ 24Uyd+UWCUuLzaw9E42011Bl3nrphZj8tHi+w2dt1N95LK5Wsa8rVPZEyA3wd0tCL/YZ Q0Gb9G8Hrf+Ebn4e//D1+XOMJcJhst7YPhAINXNngxWyGUjrRpi8ZZGq+c6nRSHs9p2E RgSSWnemUMQge32qOTb5HFKnkujL0C/9X+8IUxectIdGr0wzBCe6yu4VdFtNi0Gh/irn lTbzi2UyOj7hY0zngADRRghjHYcPSHE7wF5oJ3y+4sJ65xbDRl6aOnUyYKIut3Yf9iWP tuUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=WWAi/i4+n9XG5D57vjCca2tNocYxaQJOYj5KH2bJ384=; b=YlkrRJMSxnYYqiTRDaHKiocnASgPE6JYdK4EbQgx75KTpirMj6aOWz0Ayp4jjvjuZT 1p+ZkE38XymcwxggtKcEES2ngYJCIld2arI1m6+/LaEoBWe3eCnO6IouintbHW0evs/2 CVBiXQzGiN7iIlstqc80och4lzPSVi7JZY8gTF5AwHg0E+bJb8bEB2cxsiI+egSZ9HTq 0eaT27HVE8WpCq3i8/D+fByGlT3+m7Zx5zjKE+i6eEXCUzlCB0GVqprQmdTSX9qHJMJF CCFUZH7e+m3ujDEqxNfG29ajzDwFrYYGUxgyOHui2le/+IFA/57766QiTjspAb9nGRyV OBpw== X-Gm-Message-State: AOAM531DVJa9zAs5aUbm58Tf6tZO/ViV863oEtioL0UCCcGoWnuwhXoV 2wYDkyfBpG0McDBfibFAsvr6 X-Google-Smtp-Source: ABdhPJysOrc8SUm7dy3uINE+H+uKlr2ajo8HgNjhnimd4/k3NX9xtMHO7OR34WrKJCrjN10kcGUv+Q== X-Received: by 2002:a17:90a:4216:: with SMTP id o22mr25979551pjg.3.1621245380091; Mon, 17 May 2021 02:56:20 -0700 (PDT) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id i5sm9478706pfd.159.2021.05.17.02.56.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 May 2021 02:56:19 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 06/12] vhost-iotlb: Add an opaque pointer for vhost IOTLB Date: Mon, 17 May 2021 17:55:07 +0800 Message-Id: <20210517095513.850-7-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210517095513.850-1-xieyongji@bytedance.com> References: <20210517095513.850-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add an opaque pointer for vhost IOTLB. And introduce vhost_iotlb_add_range_ctx() to accept it. Suggested-by: Jason Wang Signed-off-by: Xie Yongji Acked-by: Jason Wang --- drivers/vhost/iotlb.c | 20 ++++++++++++++++---- include/linux/vhost_iotlb.h | 3 +++ 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c index 0fd3f87e913c..5c99e1112cbb 100644 --- a/drivers/vhost/iotlb.c +++ b/drivers/vhost/iotlb.c @@ -36,19 +36,21 @@ void vhost_iotlb_map_free(struct vhost_iotlb *iotlb, EXPORT_SYMBOL_GPL(vhost_iotlb_map_free); /** - * vhost_iotlb_add_range - add a new range to vhost IOTLB + * vhost_iotlb_add_range_ctx - add a new range to vhost IOTLB * @iotlb: the IOTLB * @start: start of the IOVA range * @last: last of IOVA range * @addr: the address that is mapped to @start * @perm: access permission of this range + * @opaque: the opaque pointer for the new mapping * * Returns an error last is smaller than start or memory allocation * fails */ -int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, - u64 start, u64 last, - u64 addr, unsigned int perm) +int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb, + u64 start, u64 last, + u64 addr, unsigned int perm, + void *opaque) { struct vhost_iotlb_map *map; @@ -71,6 +73,7 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, map->last = last; map->addr = addr; map->perm = perm; + map->opaque = opaque; iotlb->nmaps++; vhost_iotlb_itree_insert(map, &iotlb->root); @@ -80,6 +83,15 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, return 0; } +EXPORT_SYMBOL_GPL(vhost_iotlb_add_range_ctx); + +int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, + u64 start, u64 last, + u64 addr, unsigned int perm) +{ + return vhost_iotlb_add_range_ctx(iotlb, start, last, + addr, perm, NULL); +} EXPORT_SYMBOL_GPL(vhost_iotlb_add_range); /** diff --git a/include/linux/vhost_iotlb.h b/include/linux/vhost_iotlb.h index 6b09b786a762..2d0e2f52f938 100644 --- a/include/linux/vhost_iotlb.h +++ b/include/linux/vhost_iotlb.h @@ -17,6 +17,7 @@ struct vhost_iotlb_map { u32 perm; u32 flags_padding; u64 __subtree_last; + void *opaque; }; #define VHOST_IOTLB_FLAG_RETIRE 0x1 @@ -29,6 +30,8 @@ struct vhost_iotlb { unsigned int flags; }; +int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb, u64 start, u64 last, + u64 addr, unsigned int perm, void *opaque); int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, u64 start, u64 last, u64 addr, unsigned int perm); void vhost_iotlb_del_range(struct vhost_iotlb *iotlb, u64 start, u64 last); From patchwork Mon May 17 09:55:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12261437 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F04B0C433B4 for ; Mon, 17 May 2021 09:56:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D575A611CD for ; Mon, 17 May 2021 09:56:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236390AbhEQJ6N (ORCPT ); Mon, 17 May 2021 05:58:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236230AbhEQJ5r (ORCPT ); Mon, 17 May 2021 05:57:47 -0400 Received: from mail-pl1-x630.google.com (mail-pl1-x630.google.com [IPv6:2607:f8b0:4864:20::630]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9CB3BC06134A for ; Mon, 17 May 2021 02:56:24 -0700 (PDT) Received: by mail-pl1-x630.google.com with SMTP id 69so2841040plc.5 for ; Mon, 17 May 2021 02:56:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=9oY/2Hu7faZQbXhD/6Y+q2bi+BTQyO9MF7tf5wnhwEM=; b=Zr7zQjnhmh2FAhUa9rmPnf5ODWeo9mfG0g8KYDfYo1vHHSW+0b1J74HuKjmzIOOBU4 MqfuWNJ1bTzRiEZ1V3TwZrUhqAfJBEMFzZPA9DSFrCTvBIZkGSRjWb2q/e0BBjLqSppa 5Zce5PPRF5ac7ZtU+itBZvGpFs/wprOb0sNTwDzl36go+fm7cAhNe/wxVf2JhKtXj56r yiGFWZz6FKe2rZui0JOCNL2+Galoh1OmD2yJcFOftpyopKehjJAOUCgIYVlNApckbMAJ uphq7nGhHp59OmNObnFyBmuPhvecdcz3QJcwIy3Jmcl/rfSSA9IjJwoBC78tMXumW1v9 Qa/w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=9oY/2Hu7faZQbXhD/6Y+q2bi+BTQyO9MF7tf5wnhwEM=; b=RZNqLUDvF2YNWkgEN59rlHtWBRNZcT5/0gR5nQk8aCc7BJ2Ou0Fr5gR/VhN/uZ/WeX BSV/O3gtbiY0l7GYZ4DRKh6VBr0ik6lNW37vLUHWx84cx/ucCTvJZRhqzYIKEAmBvLGh 2GNdLsAPU4s8y///jcK/Acl+8pnmk+JSlgbGVh+DQB+ZM/bJEcQVLGsBs8Tmiq6jZxTZ 68Hsj0z5ZTZjZv5QiY9kEDiMdSr7fQq+SZrWhVyWPTq6s1JBVeUN0UewlPSwIOnWLF1/ JKk5DoluA1p4awtpoNiuJdSTRIJO7lrOJQlXplMWBY7VHfBchK+UpQ1BR+QgZyMHv/wR Uz/g== X-Gm-Message-State: AOAM532VpDcpI0o1ctIBVeUUBF836T00ydzHA6Gg5/j1iputgFKE93kY H4VW9luW/R9sPdEDJkxKOe6x X-Google-Smtp-Source: ABdhPJyr0oLXUpe02GaMYvuu9U2bSdT6Fnn+ZvfM/JYWqVCyuitHZoVVty5uBBvPClQSteFkfpVjOQ== X-Received: by 2002:a17:902:904a:b029:ef:9058:7314 with SMTP id w10-20020a170902904ab02900ef90587314mr25640485plz.12.1621245384212; Mon, 17 May 2021 02:56:24 -0700 (PDT) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id m6sm6616904pfc.133.2021.05.17.02.56.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 May 2021 02:56:23 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 07/12] vdpa: Add an opaque pointer for vdpa_config_ops.dma_map() Date: Mon, 17 May 2021 17:55:08 +0800 Message-Id: <20210517095513.850-8-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210517095513.850-1-xieyongji@bytedance.com> References: <20210517095513.850-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add an opaque pointer for DMA mapping. Suggested-by: Jason Wang Signed-off-by: Xie Yongji Acked-by: Jason Wang --- drivers/vdpa/vdpa_sim/vdpa_sim.c | 6 +++--- drivers/vhost/vdpa.c | 2 +- include/linux/vdpa.h | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index 98f793bc9376..efd0cb3d964d 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -542,14 +542,14 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, } static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 iova, u64 size, - u64 pa, u32 perm) + u64 pa, u32 perm, void *opaque) { struct vdpasim *vdpasim = vdpa_to_sim(vdpa); int ret; spin_lock(&vdpasim->iommu_lock); - ret = vhost_iotlb_add_range(vdpasim->iommu, iova, iova + size - 1, pa, - perm); + ret = vhost_iotlb_add_range_ctx(vdpasim->iommu, iova, iova + size - 1, + pa, perm, opaque); spin_unlock(&vdpasim->iommu_lock); return ret; diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index a1496c596120..1dd8e07554f8 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -565,7 +565,7 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, return r; if (ops->dma_map) { - r = ops->dma_map(vdpa, iova, size, pa, perm); + r = ops->dma_map(vdpa, iova, size, pa, perm, NULL); } else if (ops->set_map) { if (!v->in_batch) r = ops->set_map(vdpa, dev->iotlb); diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index f311d227aa1b..281f768cb597 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -245,7 +245,7 @@ struct vdpa_config_ops { /* DMA ops */ int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb); int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size, - u64 pa, u32 perm); + u64 pa, u32 perm, void *opaque); int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size); /* Free device resources */ From patchwork Mon May 17 09:55:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12261439 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12239C43462 for ; Mon, 17 May 2021 09:57:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EF47861005 for ; Mon, 17 May 2021 09:57:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236424AbhEQJ6a (ORCPT ); Mon, 17 May 2021 05:58:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35260 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233962AbhEQJ6C (ORCPT ); Mon, 17 May 2021 05:58:02 -0400 Received: from mail-pg1-x52f.google.com (mail-pg1-x52f.google.com [IPv6:2607:f8b0:4864:20::52f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 58C70C061355 for ; Mon, 17 May 2021 02:56:28 -0700 (PDT) Received: by mail-pg1-x52f.google.com with SMTP id m190so4296280pga.2 for ; Mon, 17 May 2021 02:56:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=m7UPVFphWKlLnjm3KyM/5Ztd/5bHILYcjpqPK+EYiek=; b=RqSNe4hMvGp9UfE8S1qzkSieY6l3c93RCMADFsjRIZsBtyh99/cj2FPeVRp7Ujbzrh IatlXMNLYYdNPseqmd/DvjECoj34MQvEHFQ3TKvgrJxCCeQDh9xgE7z5BmHIIYXzQatF SYfJm2jbzK3fgG6fyq/X67rUF9iaAHcsy6YpySJD02Qn0u8uwAggbwTgDh5BSijYEK+Q H45s+Lvwdi9pGSMCoXUvj1CvugMAyuhOMYv1DOpTlyiV0MG1jb/yZi2xbVIQp7e1Zzot W5BphNOPH24k/0p6K425HrIeFhHD4FxFAbFDQTlbKonwfPl89KP9iAcB3TQAF762/lHl 7i2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=m7UPVFphWKlLnjm3KyM/5Ztd/5bHILYcjpqPK+EYiek=; b=ZkzZ1yE0QD1PgW5zxsGlbQheoYxdjhXJxJ3kL0yrhYMqdFueAOT6Njn1BEof2RhETN Myfv8/S5FnZjUhOt9+bRDCxuhoBGQzyoax4+kdiAcgUW/o90WA+LSdL4V7/XMJcxL3Ik es819ij8gcT4ryVUpLOr+xmMDRsEe7YLjAO4w1/xz4BDaxtIi5q0ValvujbouPWIRM5K RVpzELHkkYRO2ZIEjL0h3iImN0K0weLgNpSXFmZe0IMH3e72M6hXgoeusRej0iv9mfxO akqm/ILHEA/A75hxl8N2pR353vUmMe4HrOVnRdXkBf+F2cKOjbfPAqdeSAaFsG7rTjTJ 9gHw== X-Gm-Message-State: AOAM531sS2AFfy9F9egkYB22ctO84W4NGVDaqngmBPvYXIlqmt4xbzw1 I7VB6DgwLxfYNszenzMB5VaM X-Google-Smtp-Source: ABdhPJwi1jtHgEPXsKetT04pmQxRSuMCRZDYLTHI9A4lskDUMumxaMX4VrgnB1EH6fiq4OHcD/S+bg== X-Received: by 2002:a05:6a00:22c1:b029:2dc:edbe:5e9 with SMTP id f1-20020a056a0022c1b02902dcedbe05e9mr2036267pfj.51.1621245387862; Mon, 17 May 2021 02:56:27 -0700 (PDT) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id o186sm4930765pfg.170.2021.05.17.02.56.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 May 2021 02:56:27 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 08/12] vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap() Date: Mon, 17 May 2021 17:55:09 +0800 Message-Id: <20210517095513.850-9-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210517095513.850-1-xieyongji@bytedance.com> References: <20210517095513.850-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org The upcoming patch is going to support VA mapping/unmapping. So let's factor out the logic of PA mapping/unmapping firstly to make the code more readable. Suggested-by: Jason Wang Signed-off-by: Xie Yongji Acked-by: Jason Wang --- drivers/vhost/vdpa.c | 53 +++++++++++++++++++++++++++++++++------------------- 1 file changed, 34 insertions(+), 19 deletions(-) diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 1dd8e07554f8..3c773654bf40 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -498,7 +498,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep, return r; } -static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last) +static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v, u64 start, u64 last) { struct vhost_dev *dev = &v->vdev; struct vhost_iotlb *iotlb = dev->iotlb; @@ -520,6 +520,11 @@ static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last) } } +static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last) +{ + return vhost_vdpa_pa_unmap(v, start, last); +} + static void vhost_vdpa_iotlb_free(struct vhost_vdpa *v) { struct vhost_dev *dev = &v->vdev; @@ -600,37 +605,28 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size) } } -static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, - struct vhost_iotlb_msg *msg) +static int vhost_vdpa_pa_map(struct vhost_vdpa *v, + u64 iova, u64 size, u64 uaddr, u32 perm) { struct vhost_dev *dev = &v->vdev; - struct vhost_iotlb *iotlb = dev->iotlb; struct page **page_list; unsigned long list_size = PAGE_SIZE / sizeof(struct page *); unsigned int gup_flags = FOLL_LONGTERM; unsigned long npages, cur_base, map_pfn, last_pfn = 0; unsigned long lock_limit, sz2pin, nchunks, i; - u64 iova = msg->iova; + u64 start = iova; long pinned; int ret = 0; - if (msg->iova < v->range.first || - msg->iova + msg->size - 1 > v->range.last) - return -EINVAL; - - if (vhost_iotlb_itree_first(iotlb, msg->iova, - msg->iova + msg->size - 1)) - return -EEXIST; - /* Limit the use of memory for bookkeeping */ page_list = (struct page **) __get_free_page(GFP_KERNEL); if (!page_list) return -ENOMEM; - if (msg->perm & VHOST_ACCESS_WO) + if (perm & VHOST_ACCESS_WO) gup_flags |= FOLL_WRITE; - npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT; + npages = PAGE_ALIGN(size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT; if (!npages) { ret = -EINVAL; goto free; @@ -644,7 +640,7 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, goto unlock; } - cur_base = msg->uaddr & PAGE_MASK; + cur_base = uaddr & PAGE_MASK; iova &= PAGE_MASK; nchunks = 0; @@ -675,7 +671,7 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT; ret = vhost_vdpa_map(v, iova, csize, map_pfn << PAGE_SHIFT, - msg->perm); + perm); if (ret) { /* * Unpin the pages that are left unmapped @@ -704,7 +700,7 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, /* Pin the rest chunk */ ret = vhost_vdpa_map(v, iova, (last_pfn - map_pfn + 1) << PAGE_SHIFT, - map_pfn << PAGE_SHIFT, msg->perm); + map_pfn << PAGE_SHIFT, perm); out: if (ret) { if (nchunks) { @@ -723,13 +719,32 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, for (pfn = map_pfn; pfn <= last_pfn; pfn++) unpin_user_page(pfn_to_page(pfn)); } - vhost_vdpa_unmap(v, msg->iova, msg->size); + vhost_vdpa_unmap(v, start, size); } unlock: mmap_read_unlock(dev->mm); free: free_page((unsigned long)page_list); return ret; + +} + +static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, + struct vhost_iotlb_msg *msg) +{ + struct vhost_dev *dev = &v->vdev; + struct vhost_iotlb *iotlb = dev->iotlb; + + if (msg->iova < v->range.first || + msg->iova + msg->size - 1 > v->range.last) + return -EINVAL; + + if (vhost_iotlb_itree_first(iotlb, msg->iova, + msg->iova + msg->size - 1)) + return -EEXIST; + + return vhost_vdpa_pa_map(v, msg->iova, msg->size, msg->uaddr, + msg->perm); } static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, From patchwork Mon May 17 09:55:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12261441 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F065BC433ED for ; Mon, 17 May 2021 09:57:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D47BA61029 for ; Mon, 17 May 2021 09:57:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236323AbhEQJ6q (ORCPT ); Mon, 17 May 2021 05:58:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236385AbhEQJ6N (ORCPT ); Mon, 17 May 2021 05:58:13 -0400 Received: from mail-pf1-x42e.google.com (mail-pf1-x42e.google.com [IPv6:2607:f8b0:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A55D7C061760 for ; Mon, 17 May 2021 02:56:32 -0700 (PDT) Received: by mail-pf1-x42e.google.com with SMTP id c17so4575330pfn.6 for ; Mon, 17 May 2021 02:56:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=MiTzCoiARAPRNIQZKCdAoWpd+Y7xHLWKPx88X9tHUnM=; b=NnzXAfPY0YhOdrb/igGoB82JqscE+d2Pa+2xgOq6FRozhr0VDXEsa/3R74UDV6qDvL NmWOpid02s3aFySNIvclYDSpLeKDjHgaCX/LJ6sMXMaTFTz06M3BaWnjOE9I++fEcP1y 7ZLQGtdYTX8mBaV8cSWlkro8L4Y/uvBMSPP10SwPstjxEOyqFJKNvNcapjroMEH3O1Si 6lt2UFFNfVbH4kB4sJZzqaEcfIa8ixz0Mmthcl4Xp1OQKEIwZTY7ZAGkoiH5eZ4CulFp VjRJPXlxdg/OOy8wvVNAsw8U7mDvQRd8+2IlusVlsRX8U8YgIgGIjk4uY6gOGSBkBD+S iddg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=MiTzCoiARAPRNIQZKCdAoWpd+Y7xHLWKPx88X9tHUnM=; b=TWhNHYuTodVNsF/aYkAJ2keJvZJwUjsc4LjBS1+AxgNDdujDf7aKMRW57cntac5tEe JlLdwLKyp+UrrnCxrHdt2kLXt+om/Lo+0mZ1PkSdTnuAbFVNmyOSZKo1p7ihg1fzHGTM ItC8JvdCYDfXe6+exxE0uOd3JieE/P9duzMBsFgXoiItUeuQl2qaDbG1X1gvH2OEwTL9 mrKlJJWg0on320TEGuN2ggNhRoL97RGmZ9rFfT/hOOI/26QJiPAfW0wSWcx95uEg7CHD pHXbO1kgRFqa17/lWFxLA+KLQ6CES0OCHdw79zd/61TmnL4te7kpAB0INISkLeWz1Hff kbXw== X-Gm-Message-State: AOAM530dHrbKHV08D/TV00tV4mRucLc0GmnGkRDp77+TVUtsQzkTmsrE IJOETm5HPitHPlHfzHBFOpY+ X-Google-Smtp-Source: ABdhPJwf/8KiAxmByhb3YRcFhmW7d2PIrOWfRucWh3OaV4e7GnS1UqxdzBy3y1vT4AYJtTbTW+MF1g== X-Received: by 2002:a63:6e87:: with SMTP id j129mr57052108pgc.45.1621245392180; Mon, 17 May 2021 02:56:32 -0700 (PDT) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id jz7sm1414768pjb.32.2021.05.17.02.56.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 May 2021 02:56:31 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 09/12] vdpa: Support transferring virtual addressing during DMA mapping Date: Mon, 17 May 2021 17:55:10 +0800 Message-Id: <20210517095513.850-10-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210517095513.850-1-xieyongji@bytedance.com> References: <20210517095513.850-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch introduces an attribute for vDPA device to indicate whether virtual address can be used. If vDPA device driver set it, vhost-vdpa bus driver will not pin user page and transfer userspace virtual address instead of physical address during DMA mapping. And corresponding vma->vm_file and offset will be also passed as an opaque pointer. Suggested-by: Jason Wang Signed-off-by: Xie Yongji Acked-by: Jason Wang --- drivers/vdpa/ifcvf/ifcvf_main.c | 2 +- drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +- drivers/vdpa/vdpa.c | 9 +++- drivers/vdpa/vdpa_sim/vdpa_sim.c | 2 +- drivers/vdpa/virtio_pci/vp_vdpa.c | 2 +- drivers/vhost/vdpa.c | 99 ++++++++++++++++++++++++++++++++++----- include/linux/vdpa.h | 19 ++++++-- 7 files changed, 116 insertions(+), 19 deletions(-) diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c index ab0ab5cf0f6e..daf9746e51e6 100644 --- a/drivers/vdpa/ifcvf/ifcvf_main.c +++ b/drivers/vdpa/ifcvf/ifcvf_main.c @@ -476,7 +476,7 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id) } adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa, - dev, &ifc_vdpa_ops, NULL); + dev, &ifc_vdpa_ops, NULL, false); if (adapter == NULL) { IFCVF_ERR(pdev, "Failed to allocate vDPA structure"); return -ENOMEM; diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 189e4385df40..8a4cb999152b 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -2005,7 +2005,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name) max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS); ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, &mlx5_vdpa_ops, - name); + name, false); if (IS_ERR(ndev)) return PTR_ERR(ndev); diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index bb3f1d1f0422..8f01d6a7ecc5 100644 --- a/drivers/vdpa/vdpa.c +++ b/drivers/vdpa/vdpa.c @@ -71,6 +71,7 @@ static void vdpa_release_dev(struct device *d) * @config: the bus operations that is supported by this device * @size: size of the parent structure that contains private data * @name: name of the vdpa device; optional. + * @use_va: indicate whether virtual address must be used by this device * * Driver should use vdpa_alloc_device() wrapper macro instead of * using this directly. @@ -80,7 +81,8 @@ static void vdpa_release_dev(struct device *d) */ struct vdpa_device *__vdpa_alloc_device(struct device *parent, const struct vdpa_config_ops *config, - size_t size, const char *name) + size_t size, const char *name, + bool use_va) { struct vdpa_device *vdev; int err = -EINVAL; @@ -91,6 +93,10 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent, if (!!config->dma_map != !!config->dma_unmap) goto err; + /* It should only work for the device that use on-chip IOMMU */ + if (use_va && !(config->dma_map || config->set_map)) + goto err; + err = -ENOMEM; vdev = kzalloc(size, GFP_KERNEL); if (!vdev) @@ -106,6 +112,7 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent, vdev->index = err; vdev->config = config; vdev->features_valid = false; + vdev->use_va = use_va; if (name) err = dev_set_name(&vdev->dev, "%s", name); diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index efd0cb3d964d..a43479cf57ea 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -250,7 +250,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr) ops = &vdpasim_config_ops; vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops, - dev_attr->name); + dev_attr->name, false); if (!vdpasim) goto err_alloc; diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c index c76ebb531212..f907f42e83bb 100644 --- a/drivers/vdpa/virtio_pci/vp_vdpa.c +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c @@ -399,7 +399,7 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id) return ret; vp_vdpa = vdpa_alloc_device(struct vp_vdpa, vdpa, - dev, &vp_vdpa_ops, NULL); + dev, &vp_vdpa_ops, NULL, false); if (vp_vdpa == NULL) { dev_err(dev, "vp_vdpa: Failed to allocate vDPA structure\n"); return -ENOMEM; diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 3c773654bf40..20a0a9f32331 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -520,8 +520,28 @@ static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v, u64 start, u64 last) } } +static void vhost_vdpa_va_unmap(struct vhost_vdpa *v, u64 start, u64 last) +{ + struct vhost_dev *dev = &v->vdev; + struct vhost_iotlb *iotlb = dev->iotlb; + struct vhost_iotlb_map *map; + struct vdpa_map_file *map_file; + + while ((map = vhost_iotlb_itree_first(iotlb, start, last)) != NULL) { + map_file = (struct vdpa_map_file *)map->opaque; + fput(map_file->file); + kfree(map_file); + vhost_iotlb_map_free(iotlb, map); + } +} + static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last) { + struct vdpa_device *vdpa = v->vdpa; + + if (vdpa->use_va) + return vhost_vdpa_va_unmap(v, start, last); + return vhost_vdpa_pa_unmap(v, start, last); } @@ -556,21 +576,21 @@ static int perm_to_iommu_flags(u32 perm) return flags | IOMMU_CACHE; } -static int vhost_vdpa_map(struct vhost_vdpa *v, - u64 iova, u64 size, u64 pa, u32 perm) +static int vhost_vdpa_map(struct vhost_vdpa *v, u64 iova, + u64 size, u64 pa, u32 perm, void *opaque) { struct vhost_dev *dev = &v->vdev; struct vdpa_device *vdpa = v->vdpa; const struct vdpa_config_ops *ops = vdpa->config; int r = 0; - r = vhost_iotlb_add_range(dev->iotlb, iova, iova + size - 1, - pa, perm); + r = vhost_iotlb_add_range_ctx(dev->iotlb, iova, iova + size - 1, + pa, perm, opaque); if (r) return r; if (ops->dma_map) { - r = ops->dma_map(vdpa, iova, size, pa, perm, NULL); + r = ops->dma_map(vdpa, iova, size, pa, perm, opaque); } else if (ops->set_map) { if (!v->in_batch) r = ops->set_map(vdpa, dev->iotlb); @@ -578,13 +598,15 @@ static int vhost_vdpa_map(struct vhost_vdpa *v, r = iommu_map(v->domain, iova, pa, size, perm_to_iommu_flags(perm)); } - - if (r) + if (r) { vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1); - else + return r; + } + + if (!vdpa->use_va) atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm); - return r; + return 0; } static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size) @@ -605,6 +627,56 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size) } } +static int vhost_vdpa_va_map(struct vhost_vdpa *v, + u64 iova, u64 size, u64 uaddr, u32 perm) +{ + struct vhost_dev *dev = &v->vdev; + u64 offset, map_size, map_iova = iova; + struct vdpa_map_file *map_file; + struct vm_area_struct *vma; + int ret; + + mmap_read_lock(dev->mm); + + while (size) { + vma = find_vma(dev->mm, uaddr); + if (!vma) { + ret = -EINVAL; + break; + } + map_size = min(size, vma->vm_end - uaddr); + if (!(vma->vm_file && (vma->vm_flags & VM_SHARED) && + !(vma->vm_flags & (VM_IO | VM_PFNMAP)))) + goto next; + + map_file = kzalloc(sizeof(*map_file), GFP_KERNEL); + if (!map_file) { + ret = -ENOMEM; + break; + } + offset = (vma->vm_pgoff << PAGE_SHIFT) + uaddr - vma->vm_start; + map_file->offset = offset; + map_file->file = get_file(vma->vm_file); + ret = vhost_vdpa_map(v, map_iova, map_size, uaddr, + perm, map_file); + if (ret) { + fput(map_file->file); + kfree(map_file); + break; + } +next: + size -= map_size; + uaddr += map_size; + map_iova += map_size; + } + if (ret) + vhost_vdpa_unmap(v, iova, map_iova - iova); + + mmap_read_unlock(dev->mm); + + return ret; +} + static int vhost_vdpa_pa_map(struct vhost_vdpa *v, u64 iova, u64 size, u64 uaddr, u32 perm) { @@ -671,7 +743,7 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v, csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT; ret = vhost_vdpa_map(v, iova, csize, map_pfn << PAGE_SHIFT, - perm); + perm, NULL); if (ret) { /* * Unpin the pages that are left unmapped @@ -700,7 +772,7 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v, /* Pin the rest chunk */ ret = vhost_vdpa_map(v, iova, (last_pfn - map_pfn + 1) << PAGE_SHIFT, - map_pfn << PAGE_SHIFT, perm); + map_pfn << PAGE_SHIFT, perm, NULL); out: if (ret) { if (nchunks) { @@ -733,6 +805,7 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, struct vhost_iotlb_msg *msg) { struct vhost_dev *dev = &v->vdev; + struct vdpa_device *vdpa = v->vdpa; struct vhost_iotlb *iotlb = dev->iotlb; if (msg->iova < v->range.first || @@ -743,6 +816,10 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, msg->iova + msg->size - 1)) return -EEXIST; + if (vdpa->use_va) + return vhost_vdpa_va_map(v, msg->iova, msg->size, + msg->uaddr, msg->perm); + return vhost_vdpa_pa_map(v, msg->iova, msg->size, msg->uaddr, msg->perm); } diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 281f768cb597..85124d197c55 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -44,6 +44,7 @@ struct vdpa_mgmt_dev; * @config: the configuration ops for this device. * @index: device index * @features_valid: were features initialized? for legacy guests + * @use_va: indicate whether virtual address must be used by this device * @nvqs: maximum number of supported virtqueues * @mdev: management device pointer; caller must setup when registering device as part * of dev_add() mgmtdev ops callback before invoking _vdpa_register_device(). @@ -54,6 +55,7 @@ struct vdpa_device { const struct vdpa_config_ops *config; unsigned int index; bool features_valid; + bool use_va; int nvqs; struct vdpa_mgmt_dev *mdev; }; @@ -69,6 +71,16 @@ struct vdpa_iova_range { }; /** + * Corresponding file area for device memory mapping + * @file: vma->vm_file for the mapping + * @offset: mapping offset in the vm_file + */ +struct vdpa_map_file { + struct file *file; + u64 offset; +}; + +/** * struct vdpa_config_ops - operations for configuring a vDPA device. * Note: vDPA device drivers are required to implement all of the * operations unless it is mentioned to be optional in the following @@ -254,14 +266,15 @@ struct vdpa_config_ops { struct vdpa_device *__vdpa_alloc_device(struct device *parent, const struct vdpa_config_ops *config, - size_t size, const char *name); + size_t size, const char *name, + bool use_va); -#define vdpa_alloc_device(dev_struct, member, parent, config, name) \ +#define vdpa_alloc_device(dev_struct, member, parent, config, name, use_va) \ container_of(__vdpa_alloc_device( \ parent, config, \ sizeof(dev_struct) + \ BUILD_BUG_ON_ZERO(offsetof( \ - dev_struct, member)), name), \ + dev_struct, member)), name, use_va), \ dev_struct, member) int vdpa_register_device(struct vdpa_device *vdev, int nvqs); From patchwork Mon May 17 09:55:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12261443 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EBBDCC433B4 for ; Mon, 17 May 2021 09:57:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D012861029 for ; Mon, 17 May 2021 09:57:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236206AbhEQJ66 (ORCPT ); Mon, 17 May 2021 05:58:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35234 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236266AbhEQJ6X (ORCPT ); Mon, 17 May 2021 05:58:23 -0400 Received: from mail-pf1-x429.google.com (mail-pf1-x429.google.com [IPv6:2607:f8b0:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E1154C06138A for ; Mon, 17 May 2021 02:56:36 -0700 (PDT) Received: by mail-pf1-x429.google.com with SMTP id x18so277564pfi.9 for ; Mon, 17 May 2021 02:56:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=hoAtLiRrlnpVO+dLi72CXlnN2jmtqImhZcu6dtX0W3g=; b=PaJhBXadgomCQdWGfB1BW2y4tYRpoQMxnd+NFKbnGwBPFJy0kXdSAkFQyLguDrKpzb shyqbgnHUFG8x2o+kKhvy68zLMMS92dKdRKb4W2Vm+6wHxjirLSsqyGKfF106fVYtA35 sHteREgSI1VlwzjbegasVMTNx0xImuAvTUk/NmzRV9IFWaGMDlX+bigpyh9q+1jZCdMJ sc7QHqG6A+CXZFaKBgTEqH6opAGk4aTqFwmhoMcxNarKFIGuMNE1qjnLGj1hiUYDi+8t D+JJk7diTtIjj+qepWteMQJnkMdncuXVhGU24et8HZuxkDYyh1uW/e+e9IH9n8OdFb8t rz0g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=hoAtLiRrlnpVO+dLi72CXlnN2jmtqImhZcu6dtX0W3g=; b=aQQaZtenioYVEmDQb/ctP6RvvxzbjvPD8+sqiJQXiLvh5Zi06nkbtpa4enSowPti6j Z45eY9v8filYid/gW75S+eq4mszI1ANRnsAJu4G/iZ/P0IK9hoQHX/FmfhwGXzMuR2mA GLiLU5WMXhPP8B8gkC7uY7pER7m5aYmWbZY5ByvNj79aqjNkNGXdDTzk6LwGs7ux8TEL F7VYxb8MzSkCjdE4LbZ9KfeWEU7oIHhGuR5vnsdE3I7c/uHdQnyNCqWcFF86JEFdroBU h/FePeZOo/6jJ6rg/JzaKSy26sMw9Dnx5g5BuMejZwB2L52qO0MzbqGc+bps18RJA9kw MqaA== X-Gm-Message-State: AOAM533xJ5nRkl2BUCVIBDebYYNxGBaTb9asw2SFw4tJw0znKS22hwcZ YABrw/sQo68ri1Zjr60A73zy X-Google-Smtp-Source: ABdhPJwG1Imyv2mTwQJr7/u81IGyTuXLr1894C/kISVUIDPCgLwo7wOyhcAN0mmhVYI8AqbKUT8dEQ== X-Received: by 2002:a05:6a00:2290:b029:2d5:451b:f16c with SMTP id f16-20020a056a002290b02902d5451bf16cmr16592506pfe.55.1621245396420; Mon, 17 May 2021 02:56:36 -0700 (PDT) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id 11sm4945388pfh.182.2021.05.17.02.56.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 May 2021 02:56:35 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 10/12] vduse: Implement an MMU-based IOMMU driver Date: Mon, 17 May 2021 17:55:11 +0800 Message-Id: <20210517095513.850-11-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210517095513.850-1-xieyongji@bytedance.com> References: <20210517095513.850-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This implements an MMU-based IOMMU driver to support mapping kernel dma buffer into userspace. The basic idea behind it is treating MMU (VA->PA) as IOMMU (IOVA->PA). The driver will set up MMU mapping instead of IOMMU mapping for the DMA transfer so that the userspace process is able to use its virtual address to access the dma buffer in kernel. And to avoid security issue, a bounce-buffering mechanism is introduced to prevent userspace accessing the original buffer directly. Signed-off-by: Xie Yongji Acked-by: Jason Wang --- drivers/vdpa/vdpa_user/iova_domain.c | 531 +++++++++++++++++++++++++++++++++++ drivers/vdpa/vdpa_user/iova_domain.h | 70 +++++ 2 files changed, 601 insertions(+) create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c new file mode 100644 index 000000000000..560b0516f6cd --- /dev/null +++ b/drivers/vdpa/vdpa_user/iova_domain.c @@ -0,0 +1,531 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * MMU-based IOMMU implementation + * + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: Xie Yongji + * + */ + +#include +#include +#include +#include +#include +#include + +#include "iova_domain.h" + +static int vduse_iotlb_add_range(struct vduse_iova_domain *domain, + u64 start, u64 last, + u64 addr, unsigned int perm, + struct file *file, u64 offset) +{ + struct vdpa_map_file *map_file; + int ret; + + map_file = kmalloc(sizeof(*map_file), GFP_ATOMIC); + if (!map_file) + return -ENOMEM; + + map_file->file = get_file(file); + map_file->offset = offset; + + ret = vhost_iotlb_add_range_ctx(domain->iotlb, start, last, + addr, perm, map_file); + if (ret) { + fput(map_file->file); + kfree(map_file); + return ret; + } + return 0; +} + +static void vduse_iotlb_del_range(struct vduse_iova_domain *domain, + u64 start, u64 last) +{ + struct vdpa_map_file *map_file; + struct vhost_iotlb_map *map; + + while ((map = vhost_iotlb_itree_first(domain->iotlb, start, last))) { + map_file = (struct vdpa_map_file *)map->opaque; + fput(map_file->file); + kfree(map_file); + vhost_iotlb_map_free(domain->iotlb, map); + } +} + +int vduse_domain_set_map(struct vduse_iova_domain *domain, + struct vhost_iotlb *iotlb) +{ + struct vdpa_map_file *map_file; + struct vhost_iotlb_map *map; + u64 start = 0ULL, last = ULLONG_MAX; + int ret; + + spin_lock(&domain->iotlb_lock); + vduse_iotlb_del_range(domain, start, last); + + for (map = vhost_iotlb_itree_first(iotlb, start, last); map; + map = vhost_iotlb_itree_next(map, start, last)) { + map_file = (struct vdpa_map_file *)map->opaque; + ret = vduse_iotlb_add_range(domain, map->start, map->last, + map->addr, map->perm, + map_file->file, + map_file->offset); + if (ret) + goto err; + } + spin_unlock(&domain->iotlb_lock); + + return 0; +err: + vduse_iotlb_del_range(domain, start, last); + spin_unlock(&domain->iotlb_lock); + return ret; +} + +static int vduse_domain_map_bounce_page(struct vduse_iova_domain *domain, + u64 iova, u64 size, u64 paddr) +{ + struct vduse_bounce_map *map; + u64 last = iova + size - 1; + + while (iova <= last) { + map = &domain->bounce_maps[iova >> PAGE_SHIFT]; + if (!map->bounce_page) { + map->bounce_page = alloc_page(GFP_ATOMIC); + if (!map->bounce_page) + return -ENOMEM; + } + map->orig_phys = paddr; + paddr += PAGE_SIZE; + iova += PAGE_SIZE; + } + return 0; +} + +static void vduse_domain_unmap_bounce_page(struct vduse_iova_domain *domain, + u64 iova, u64 size) +{ + struct vduse_bounce_map *map; + u64 last = iova + size - 1; + + while (iova <= last) { + map = &domain->bounce_maps[iova >> PAGE_SHIFT]; + map->orig_phys = INVALID_PHYS_ADDR; + iova += PAGE_SIZE; + } +} + +static void do_bounce(phys_addr_t orig, void *addr, size_t size, + enum dma_data_direction dir) +{ + unsigned long pfn = PFN_DOWN(orig); + unsigned int offset = offset_in_page(orig); + char *buffer; + unsigned int sz = 0; + + while (size) { + sz = min_t(size_t, PAGE_SIZE - offset, size); + + buffer = kmap_atomic(pfn_to_page(pfn)); + if (dir == DMA_TO_DEVICE) + memcpy(addr, buffer + offset, sz); + else + memcpy(buffer + offset, addr, sz); + kunmap_atomic(buffer); + + size -= sz; + pfn++; + addr += sz; + offset = 0; + } +} + +static void vduse_domain_bounce(struct vduse_iova_domain *domain, + dma_addr_t iova, size_t size, + enum dma_data_direction dir) +{ + struct vduse_bounce_map *map; + unsigned int offset; + void *addr; + size_t sz; + + if (iova >= domain->bounce_size) + return; + + while (size) { + map = &domain->bounce_maps[iova >> PAGE_SHIFT]; + offset = offset_in_page(iova); + sz = min_t(size_t, PAGE_SIZE - offset, size); + + if (WARN_ON(!map->bounce_page || + map->orig_phys == INVALID_PHYS_ADDR)) + return; + + addr = page_address(map->bounce_page) + offset; + do_bounce(map->orig_phys + offset, addr, sz, dir); + size -= sz; + iova += sz; + } +} + +static struct page * +vduse_domain_get_coherent_page(struct vduse_iova_domain *domain, u64 iova) +{ + u64 start = iova & PAGE_MASK; + u64 last = start + PAGE_SIZE - 1; + struct vhost_iotlb_map *map; + struct page *page = NULL; + + spin_lock(&domain->iotlb_lock); + map = vhost_iotlb_itree_first(domain->iotlb, start, last); + if (!map) + goto out; + + page = pfn_to_page((map->addr + iova - map->start) >> PAGE_SHIFT); + get_page(page); +out: + spin_unlock(&domain->iotlb_lock); + + return page; +} + +static struct page * +vduse_domain_get_bounce_page(struct vduse_iova_domain *domain, u64 iova) +{ + struct vduse_bounce_map *map; + struct page *page = NULL; + + spin_lock(&domain->iotlb_lock); + map = &domain->bounce_maps[iova >> PAGE_SHIFT]; + if (!map->bounce_page) + goto out; + + page = map->bounce_page; + get_page(page); +out: + spin_unlock(&domain->iotlb_lock); + + return page; +} + +static void +vduse_domain_free_bounce_pages(struct vduse_iova_domain *domain) +{ + struct vduse_bounce_map *map; + unsigned long pfn, bounce_pfns; + + spin_lock(&domain->iotlb_lock); + bounce_pfns = domain->bounce_size >> PAGE_SHIFT; + + for (pfn = 0; pfn < bounce_pfns; pfn++) { + map = &domain->bounce_maps[pfn]; + if (WARN_ON(map->orig_phys != INVALID_PHYS_ADDR)) + continue; + + if (!map->bounce_page) + continue; + + __free_page(map->bounce_page); + map->bounce_page = NULL; + } + spin_unlock(&domain->iotlb_lock); +} + +void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain) +{ + if (!domain->bounce_map) + return; + + spin_lock(&domain->iotlb_lock); + if (!domain->bounce_map) + goto unlock; + + vduse_iotlb_del_range(domain, 0, domain->bounce_size - 1); + domain->bounce_map = 0; +unlock: + spin_unlock(&domain->iotlb_lock); +} + +static int vduse_domain_init_bounce_map(struct vduse_iova_domain *domain) +{ + int ret = 0; + + if (domain->bounce_map) + return 0; + + spin_lock(&domain->iotlb_lock); + if (domain->bounce_map) + goto unlock; + + ret = vduse_iotlb_add_range(domain, 0, domain->bounce_size - 1, + 0, VHOST_MAP_RW, domain->file, 0); + if (ret) + goto unlock; + + domain->bounce_map = 1; +unlock: + spin_unlock(&domain->iotlb_lock); + return ret; +} + +static dma_addr_t +vduse_domain_alloc_iova(struct iova_domain *iovad, + unsigned long size, unsigned long limit) +{ + unsigned long shift = iova_shift(iovad); + unsigned long iova_len = iova_align(iovad, size) >> shift; + unsigned long iova_pfn; + + /* + * Freeing non-power-of-two-sized allocations back into the IOVA caches + * will come back to bite us badly, so we have to waste a bit of space + * rounding up anything cacheable to make sure that can't happen. The + * order of the unadjusted size will still match upon freeing. + */ + if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1))) + iova_len = roundup_pow_of_two(iova_len); + iova_pfn = alloc_iova_fast(iovad, iova_len, limit >> shift, true); + + return iova_pfn << shift; +} + +static void vduse_domain_free_iova(struct iova_domain *iovad, + dma_addr_t iova, size_t size) +{ + unsigned long shift = iova_shift(iovad); + unsigned long iova_len = iova_align(iovad, size) >> shift; + + free_iova_fast(iovad, iova >> shift, iova_len); +} + +dma_addr_t vduse_domain_map_page(struct vduse_iova_domain *domain, + struct page *page, unsigned long offset, + size_t size, enum dma_data_direction dir, + unsigned long attrs) +{ + struct iova_domain *iovad = &domain->stream_iovad; + unsigned long limit = domain->bounce_size - 1; + phys_addr_t pa = page_to_phys(page) + offset; + dma_addr_t iova = vduse_domain_alloc_iova(iovad, size, limit); + + if (!iova) + return DMA_MAPPING_ERROR; + + if (vduse_domain_init_bounce_map(domain)) + goto err; + + if (vduse_domain_map_bounce_page(domain, (u64)iova, (u64)size, pa)) + goto err; + + if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL) + vduse_domain_bounce(domain, iova, size, DMA_TO_DEVICE); + + return iova; +err: + vduse_domain_free_iova(iovad, iova, size); + return DMA_MAPPING_ERROR; +} + +void vduse_domain_unmap_page(struct vduse_iova_domain *domain, + dma_addr_t dma_addr, size_t size, + enum dma_data_direction dir, unsigned long attrs) +{ + struct iova_domain *iovad = &domain->stream_iovad; + + if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL) + vduse_domain_bounce(domain, dma_addr, size, DMA_FROM_DEVICE); + + vduse_domain_unmap_bounce_page(domain, (u64)dma_addr, (u64)size); + vduse_domain_free_iova(iovad, dma_addr, size); +} + +void *vduse_domain_alloc_coherent(struct vduse_iova_domain *domain, + size_t size, dma_addr_t *dma_addr, + gfp_t flag, unsigned long attrs) +{ + struct iova_domain *iovad = &domain->consistent_iovad; + unsigned long limit = domain->iova_limit; + dma_addr_t iova = vduse_domain_alloc_iova(iovad, size, limit); + void *orig = alloc_pages_exact(size, flag); + + if (!iova || !orig) + goto err; + + spin_lock(&domain->iotlb_lock); + if (vduse_iotlb_add_range(domain, (u64)iova, (u64)iova + size - 1, + virt_to_phys(orig), VHOST_MAP_RW, + domain->file, (u64)iova)) { + spin_unlock(&domain->iotlb_lock); + goto err; + } + spin_unlock(&domain->iotlb_lock); + + *dma_addr = iova; + + return orig; +err: + *dma_addr = DMA_MAPPING_ERROR; + if (orig) + free_pages_exact(orig, size); + if (iova) + vduse_domain_free_iova(iovad, iova, size); + + return NULL; +} + +void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size, + void *vaddr, dma_addr_t dma_addr, + unsigned long attrs) +{ + struct iova_domain *iovad = &domain->consistent_iovad; + struct vhost_iotlb_map *map; + struct vdpa_map_file *map_file; + phys_addr_t pa; + + spin_lock(&domain->iotlb_lock); + map = vhost_iotlb_itree_first(domain->iotlb, (u64)dma_addr, + (u64)dma_addr + size - 1); + if (WARN_ON(!map)) { + spin_unlock(&domain->iotlb_lock); + return; + } + map_file = (struct vdpa_map_file *)map->opaque; + fput(map_file->file); + kfree(map_file); + pa = map->addr; + vhost_iotlb_map_free(domain->iotlb, map); + spin_unlock(&domain->iotlb_lock); + + vduse_domain_free_iova(iovad, dma_addr, size); + free_pages_exact(phys_to_virt(pa), size); +} + +static vm_fault_t vduse_domain_mmap_fault(struct vm_fault *vmf) +{ + struct vduse_iova_domain *domain = vmf->vma->vm_private_data; + unsigned long iova = vmf->pgoff << PAGE_SHIFT; + struct page *page; + + if (!domain) + return VM_FAULT_SIGBUS; + + if (iova < domain->bounce_size) + page = vduse_domain_get_bounce_page(domain, iova); + else + page = vduse_domain_get_coherent_page(domain, iova); + + if (!page) + return VM_FAULT_SIGBUS; + + vmf->page = page; + + return 0; +} + +static const struct vm_operations_struct vduse_domain_mmap_ops = { + .fault = vduse_domain_mmap_fault, +}; + +static int vduse_domain_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct vduse_iova_domain *domain = file->private_data; + + vma->vm_flags |= VM_DONTDUMP | VM_DONTEXPAND; + vma->vm_private_data = domain; + vma->vm_ops = &vduse_domain_mmap_ops; + + return 0; +} + +static int vduse_domain_release(struct inode *inode, struct file *file) +{ + struct vduse_iova_domain *domain = file->private_data; + + vduse_domain_reset_bounce_map(domain); + vduse_domain_free_bounce_pages(domain); + put_iova_domain(&domain->stream_iovad); + put_iova_domain(&domain->consistent_iovad); + vhost_iotlb_free(domain->iotlb); + vfree(domain->bounce_maps); + kfree(domain); + + return 0; +} + +static const struct file_operations vduse_domain_fops = { + .owner = THIS_MODULE, + .mmap = vduse_domain_mmap, + .release = vduse_domain_release, +}; + +void vduse_domain_destroy(struct vduse_iova_domain *domain) +{ + fput(domain->file); +} + +struct vduse_iova_domain * +vduse_domain_create(unsigned long iova_limit, size_t bounce_size) +{ + struct vduse_iova_domain *domain; + struct file *file; + struct vduse_bounce_map *map; + unsigned long pfn, bounce_pfns; + + bounce_pfns = PAGE_ALIGN(bounce_size) >> PAGE_SHIFT; + if (iova_limit <= bounce_size) + return NULL; + + domain = kzalloc(sizeof(*domain), GFP_KERNEL); + if (!domain) + return NULL; + + domain->iotlb = vhost_iotlb_alloc(0, 0); + if (!domain->iotlb) + goto err_iotlb; + + domain->iova_limit = iova_limit; + domain->bounce_size = PAGE_ALIGN(bounce_size); + domain->bounce_maps = vzalloc(bounce_pfns * + sizeof(struct vduse_bounce_map)); + if (!domain->bounce_maps) + goto err_map; + + for (pfn = 0; pfn < bounce_pfns; pfn++) { + map = &domain->bounce_maps[pfn]; + map->orig_phys = INVALID_PHYS_ADDR; + } + file = anon_inode_getfile("[vduse-domain]", &vduse_domain_fops, + domain, O_RDWR); + if (IS_ERR(file)) + goto err_file; + + domain->file = file; + spin_lock_init(&domain->iotlb_lock); + init_iova_domain(&domain->stream_iovad, + PAGE_SIZE, IOVA_START_PFN); + init_iova_domain(&domain->consistent_iovad, + PAGE_SIZE, bounce_pfns); + + return domain; +err_file: + vfree(domain->bounce_maps); +err_map: + vhost_iotlb_free(domain->iotlb); +err_iotlb: + kfree(domain); + return NULL; +} + +int vduse_domain_init(void) +{ + return iova_cache_get(); +} + +void vduse_domain_exit(void) +{ + iova_cache_put(); +} diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h new file mode 100644 index 000000000000..31f822afa5f5 --- /dev/null +++ b/drivers/vdpa/vdpa_user/iova_domain.h @@ -0,0 +1,70 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * MMU-based IOMMU implementation + * + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: Xie Yongji + * + */ + +#ifndef _VDUSE_IOVA_DOMAIN_H +#define _VDUSE_IOVA_DOMAIN_H + +#include +#include +#include + +#define IOVA_START_PFN 1 + +#define INVALID_PHYS_ADDR (~(phys_addr_t)0) + +struct vduse_bounce_map { + struct page *bounce_page; + u64 orig_phys; +}; + +struct vduse_iova_domain { + struct iova_domain stream_iovad; + struct iova_domain consistent_iovad; + struct vduse_bounce_map *bounce_maps; + size_t bounce_size; + unsigned long iova_limit; + int bounce_map; + struct vhost_iotlb *iotlb; + spinlock_t iotlb_lock; + struct file *file; +}; + +int vduse_domain_set_map(struct vduse_iova_domain *domain, + struct vhost_iotlb *iotlb); + +dma_addr_t vduse_domain_map_page(struct vduse_iova_domain *domain, + struct page *page, unsigned long offset, + size_t size, enum dma_data_direction dir, + unsigned long attrs); + +void vduse_domain_unmap_page(struct vduse_iova_domain *domain, + dma_addr_t dma_addr, size_t size, + enum dma_data_direction dir, unsigned long attrs); + +void *vduse_domain_alloc_coherent(struct vduse_iova_domain *domain, + size_t size, dma_addr_t *dma_addr, + gfp_t flag, unsigned long attrs); + +void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size, + void *vaddr, dma_addr_t dma_addr, + unsigned long attrs); + +void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain); + +void vduse_domain_destroy(struct vduse_iova_domain *domain); + +struct vduse_iova_domain *vduse_domain_create(unsigned long iova_limit, + size_t bounce_size); + +int vduse_domain_init(void); + +void vduse_domain_exit(void); + +#endif /* _VDUSE_IOVA_DOMAIN_H */ From patchwork Mon May 17 09:55:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12261445 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0FC13C433ED for ; Mon, 17 May 2021 09:58:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D992261029 for ; Mon, 17 May 2021 09:58:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236467AbhEQJ7T (ORCPT ); Mon, 17 May 2021 05:59:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35238 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236440AbhEQJ6p (ORCPT ); Mon, 17 May 2021 05:58:45 -0400 Received: from mail-pl1-x62f.google.com (mail-pl1-x62f.google.com [IPv6:2607:f8b0:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E712DC06137D for ; Mon, 17 May 2021 02:56:41 -0700 (PDT) Received: by mail-pl1-x62f.google.com with SMTP id z4so648090plg.8 for ; Mon, 17 May 2021 02:56:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=uvQCM34udD9y/BEMyK2IUMPFWxihVXap3X5rClX92zw=; b=m3ZJirVbYxOFq58UP6NVretKxNu2fW5w5HNYkOP+1MFixrArSrVrA2RUFxFEIlmepk KejC4zinwBG76DuM+CSJXc+pYAknTAAUx2VEiKoGiU1S3yot/Le3d2OIMUp47kZzonu9 +gY9463WjphFYg6MS2QHRUH9iM+kiYglvAFgKAaDTpswIGVlcHqj1zXFCjY5OJk8DauX mg/yN0Xsc++olfMF3r5w3pI58zYKaxM/MiCDl1yhq9dLzBKjEeGkIIaCdtNtiSO2au2L mE/tSoCqHnEb4+sDL1kwIumxk5ckFR7a3Qwpk7Ss8fLbgY7bxidrccDsJ02atgVAXAww w4Hw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=uvQCM34udD9y/BEMyK2IUMPFWxihVXap3X5rClX92zw=; b=hKvIpJQ/RjjZhH8sQE/U+2QYUiVj/Puf2SrPjzmEQsPtrdwK08DKZ5lgf1DI/SB2yR unFjOEpRSNN5KyQHQYlmHNXFKjRgjaXJBSvqmU/7Mze6Z5ErKuuN7sEkREGyWh82Jwij Ar+dNT6nYd2vT6EQ5sjP/R1dYwMJgS5ZLy4vIHR67eNFhqXH6f8jGW1EFRcIndpQKiJ7 +0+xN2VGWLrZybH6XYHM82dGf/s2Sn5jUITQBdowSS78sy/4mn3bgcusBB8l3M8uhm2e V1YkfmfMY38fIyUEjfEJdxKbw1dpX7PuiKt9vXZF0i4AK4efs51SDJTIxB9LLzyRgjzQ EDgg== X-Gm-Message-State: AOAM5330YSOoB/MiZAPSrAzaokgdUH3Pv7OTAGV1zoXWzOkLXDG3dOZI FBQmBYAfGlX+TwqUMW1locdg X-Google-Smtp-Source: ABdhPJxrmztcgoOH36BKALLyLrLW2mTxhzyapG5fIplPuW0SZ+m9OizErfR3MLHSLcJT5389ltMtBA== X-Received: by 2002:a17:902:7c0d:b029:ed:61b5:337a with SMTP id x13-20020a1709027c0db02900ed61b5337amr59313380pll.20.1621245401245; Mon, 17 May 2021 02:56:41 -0700 (PDT) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id g21sm9959032pfh.81.2021.05.17.02.56.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 May 2021 02:56:40 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 11/12] vduse: Introduce VDUSE - vDPA Device in Userspace Date: Mon, 17 May 2021 17:55:12 +0800 Message-Id: <20210517095513.850-12-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210517095513.850-1-xieyongji@bytedance.com> References: <20210517095513.850-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This VDUSE driver enables implementing vDPA devices in userspace. Both control path and data path of vDPA devices will be able to be handled in userspace. In the control path, the VDUSE driver will make use of message mechnism to forward the config operation from vdpa bus driver to userspace. Userspace can use read()/write() to receive/reply those control messages. In the data path, VDUSE_IOTLB_GET_FD ioctl will be used to get the file descriptors referring to vDPA device's iova regions. Then userspace can use mmap() to access those iova regions. Besides, userspace can use ioctl() to inject interrupt and use the eventfd mechanism to receive virtqueue kicks. Signed-off-by: Xie Yongji --- Documentation/userspace-api/ioctl/ioctl-number.rst | 1 + drivers/vdpa/Kconfig | 10 + drivers/vdpa/Makefile | 1 + drivers/vdpa/vdpa_user/Makefile | 5 + drivers/vdpa/vdpa_user/vduse_dev.c | 1453 ++++++++++++++++++++ include/uapi/linux/vduse.h | 178 +++ 6 files changed, 1648 insertions(+) create mode 100644 drivers/vdpa/vdpa_user/Makefile create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c create mode 100644 include/uapi/linux/vduse.h diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst index 599bd4493944..db44cdabed35 100644 --- a/Documentation/userspace-api/ioctl/ioctl-number.rst +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst @@ -300,6 +300,7 @@ Code Seq# Include File Comments 'z' 10-4F drivers/s390/crypto/zcrypt_api.h conflict! '|' 00-7F linux/media.h 0x80 00-1F linux/fb.h +0x81 00-1F linux/vduse.h 0x89 00-06 arch/x86/include/asm/sockios.h 0x89 0B-DF linux/sockios.h 0x89 E0-EF linux/sockios.h SIOCPROTOPRIVATE range diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig index a503c1b2bfd9..6e23bce6433a 100644 --- a/drivers/vdpa/Kconfig +++ b/drivers/vdpa/Kconfig @@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK vDPA block device simulator which terminates IO request in a memory buffer. +config VDPA_USER + tristate "VDUSE (vDPA Device in Userspace) support" + depends on EVENTFD && MMU && HAS_DMA + select DMA_OPS + select VHOST_IOTLB + select IOMMU_IOVA + help + With VDUSE it is possible to emulate a vDPA Device + in a userspace program. + config IFCVF tristate "Intel IFC VF vDPA driver" depends on PCI_MSI diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile index 67fe7f3d6943..f02ebed33f19 100644 --- a/drivers/vdpa/Makefile +++ b/drivers/vdpa/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_VDPA) += vdpa.o obj-$(CONFIG_VDPA_SIM) += vdpa_sim/ +obj-$(CONFIG_VDPA_USER) += vdpa_user/ obj-$(CONFIG_IFCVF) += ifcvf/ obj-$(CONFIG_MLX5_VDPA) += mlx5/ obj-$(CONFIG_VP_VDPA) += virtio_pci/ diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile new file mode 100644 index 000000000000..260e0b26af99 --- /dev/null +++ b/drivers/vdpa/vdpa_user/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 + +vduse-y := vduse_dev.o iova_domain.o + +obj-$(CONFIG_VDPA_USER) += vduse.o diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c new file mode 100644 index 000000000000..9ac4cf100ccd --- /dev/null +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -0,0 +1,1453 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * VDUSE: vDPA Device in Userspace + * + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: Xie Yongji + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "iova_domain.h" + +#define DRV_VERSION "1.0" +#define DRV_AUTHOR "Yongji Xie " +#define DRV_DESC "vDPA Device in Userspace" +#define DRV_LICENSE "GPL v2" + +#define VDUSE_DEV_MAX (1U << MINORBITS) + +struct vduse_virtqueue { + u16 index; + bool ready; + spinlock_t kick_lock; + spinlock_t irq_lock; + struct eventfd_ctx *kickfd; + struct vdpa_callback cb; + struct work_struct inject; +}; + +struct vduse_dev; + +struct vduse_vdpa { + struct vdpa_device vdpa; + struct vduse_dev *dev; +}; + +struct vduse_dev { + struct vduse_vdpa *vdev; + struct device dev; + struct cdev cdev; + struct vduse_virtqueue *vqs; + struct vduse_iova_domain *domain; + char *name; + struct mutex lock; + spinlock_t msg_lock; + atomic64_t msg_unique; + wait_queue_head_t waitq; + struct list_head send_list; + struct list_head recv_list; + struct list_head list; + struct vdpa_callback config_cb; + struct work_struct inject; + spinlock_t irq_lock; + unsigned long api_version; + bool connected; + int minor; + u16 vq_size_max; + u32 vq_num; + u32 vq_align; + u32 config_size; + u32 device_id; + u32 vendor_id; +}; + +struct vduse_dev_msg { + struct vduse_dev_request req; + struct vduse_dev_response resp; + struct list_head list; + wait_queue_head_t waitq; + bool completed; +}; + +struct vduse_control { + unsigned long api_version; +}; + +static unsigned long max_bounce_size = (64 * 1024 * 1024); +module_param(max_bounce_size, ulong, 0444); +MODULE_PARM_DESC(max_bounce_size, "Maximum bounce buffer size. (default: 64M)"); + +static unsigned long max_iova_size = (128 * 1024 * 1024); +module_param(max_iova_size, ulong, 0444); +MODULE_PARM_DESC(max_iova_size, "Maximum iova space size (default: 128M)"); + +static bool allow_unsafe_device_emulation; +module_param(allow_unsafe_device_emulation, bool, 0444); +MODULE_PARM_DESC(allow_unsafe_device_emulation, "Allow emulating unsafe device." + " We must make sure the userspace device emulation process is trusted." + " Otherwise, don't enable this option. (default: false)"); + +static DEFINE_MUTEX(vduse_lock); +static LIST_HEAD(vduse_devs); +static DEFINE_IDA(vduse_ida); + +static dev_t vduse_major; +static struct class *vduse_class; +static struct workqueue_struct *vduse_irq_wq; + +static unsigned int allowed_device_id[] = { + VIRTIO_ID_NET, VIRTIO_ID_BLOCK, + VIRTIO_ID_SCSI, VIRTIO_ID_FS, +}; + +static inline struct vduse_dev *vdpa_to_vduse(struct vdpa_device *vdpa) +{ + struct vduse_vdpa *vdev = container_of(vdpa, struct vduse_vdpa, vdpa); + + return vdev->dev; +} + +static inline struct vduse_dev *dev_to_vduse(struct device *dev) +{ + struct vdpa_device *vdpa = dev_to_vdpa(dev); + + return vdpa_to_vduse(vdpa); +} + +static struct vduse_dev_msg *vduse_find_msg(struct list_head *head, + uint32_t request_id) +{ + struct vduse_dev_msg *tmp, *msg = NULL; + + list_for_each_entry(tmp, head, list) { + if (tmp->req.request_id == request_id) { + msg = tmp; + list_del(&tmp->list); + break; + } + } + + return msg; +} + +static struct vduse_dev_msg *vduse_dequeue_msg(struct list_head *head) +{ + struct vduse_dev_msg *msg = NULL; + + if (!list_empty(head)) { + msg = list_first_entry(head, struct vduse_dev_msg, list); + list_del(&msg->list); + } + + return msg; +} + +static void vduse_enqueue_msg(struct list_head *head, + struct vduse_dev_msg *msg) +{ + list_add_tail(&msg->list, head); +} + +static int vduse_dev_msg_sync(struct vduse_dev *dev, + struct vduse_dev_msg *msg) +{ + init_waitqueue_head(&msg->waitq); + spin_lock(&dev->msg_lock); + vduse_enqueue_msg(&dev->send_list, msg); + wake_up(&dev->waitq); + spin_unlock(&dev->msg_lock); + wait_event_killable(msg->waitq, msg->completed); + spin_lock(&dev->msg_lock); + if (!msg->completed) { + list_del(&msg->list); + msg->resp.result = VDUSE_REQUEST_FAILED; + } + spin_unlock(&dev->msg_lock); + + return (msg->resp.result == VDUSE_REQUEST_OK) ? 0 : -1; +} + +static u32 vduse_dev_get_request_id(struct vduse_dev *dev) +{ + return atomic64_fetch_inc(&dev->msg_unique); +} + +static u64 vduse_dev_get_features(struct vduse_dev *dev) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_GET_FEATURES; + msg.req.request_id = vduse_dev_get_request_id(dev); + + if (WARN_ON(vduse_dev_msg_sync(dev, &msg) != 0)) + return 0; + + return msg.resp.f.features; +} + +static int vduse_dev_set_features(struct vduse_dev *dev, u64 features) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_SET_FEATURES; + msg.req.request_id = vduse_dev_get_request_id(dev); + msg.req.f.features = features; + + return vduse_dev_msg_sync(dev, &msg); +} + +static u8 vduse_dev_get_status(struct vduse_dev *dev) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_GET_STATUS; + msg.req.request_id = vduse_dev_get_request_id(dev); + + if (WARN_ON(vduse_dev_msg_sync(dev, &msg) != 0)) + return 0; + + return msg.resp.s.status; +} + +static void vduse_dev_set_status(struct vduse_dev *dev, u8 status) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_SET_STATUS; + msg.req.request_id = vduse_dev_get_request_id(dev); + msg.req.s.status = status; + + WARN_ON(vduse_dev_msg_sync(dev, &msg) != 0); +} + +static void vduse_dev_get_config(struct vduse_dev *dev, unsigned int offset, + void *buf, unsigned int len) +{ + struct vduse_dev_msg msg = { 0 }; + unsigned int sz; + + while (len) { + sz = min_t(unsigned int, len, sizeof(msg.req.config.data)); + msg.req.type = VDUSE_GET_CONFIG; + msg.req.request_id = vduse_dev_get_request_id(dev); + msg.req.config.offset = offset; + msg.req.config.len = sz; + if (WARN_ON(vduse_dev_msg_sync(dev, &msg) != 0)) + break; + + memcpy(buf, msg.resp.config.data, sz); + buf += sz; + offset += sz; + len -= sz; + } +} + +static void vduse_dev_set_config(struct vduse_dev *dev, unsigned int offset, + const void *buf, unsigned int len) +{ + struct vduse_dev_msg msg = { 0 }; + unsigned int sz; + + while (len) { + sz = min_t(unsigned int, len, sizeof(msg.req.config.data)); + msg.req.type = VDUSE_SET_CONFIG; + msg.req.request_id = vduse_dev_get_request_id(dev); + msg.req.config.offset = offset; + msg.req.config.len = sz; + memcpy(msg.req.config.data, buf, sz); + if (WARN_ON(vduse_dev_msg_sync(dev, &msg) != 0)) + break; + + buf += sz; + offset += sz; + len -= sz; + } +} + +static void vduse_dev_set_vq_num(struct vduse_dev *dev, + struct vduse_virtqueue *vq, u32 num) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_SET_VQ_NUM; + msg.req.request_id = vduse_dev_get_request_id(dev); + msg.req.vq_num.index = vq->index; + msg.req.vq_num.num = num; + + WARN_ON(vduse_dev_msg_sync(dev, &msg) != 0); +} + +static int vduse_dev_set_vq_addr(struct vduse_dev *dev, + struct vduse_virtqueue *vq, u64 desc_addr, + u64 driver_addr, u64 device_addr) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_SET_VQ_ADDR; + msg.req.request_id = vduse_dev_get_request_id(dev); + msg.req.vq_addr.index = vq->index; + msg.req.vq_addr.desc_addr = desc_addr; + msg.req.vq_addr.driver_addr = driver_addr; + msg.req.vq_addr.device_addr = device_addr; + + return vduse_dev_msg_sync(dev, &msg); +} + +static void vduse_dev_set_vq_ready(struct vduse_dev *dev, + struct vduse_virtqueue *vq, bool ready) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_SET_VQ_READY; + msg.req.request_id = vduse_dev_get_request_id(dev); + msg.req.vq_ready.index = vq->index; + msg.req.vq_ready.ready = ready; + + WARN_ON(vduse_dev_msg_sync(dev, &msg) != 0); +} + +static bool vduse_dev_get_vq_ready(struct vduse_dev *dev, + struct vduse_virtqueue *vq) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_GET_VQ_READY; + msg.req.request_id = vduse_dev_get_request_id(dev); + msg.req.vq_ready.index = vq->index; + if (WARN_ON(vduse_dev_msg_sync(dev, &msg))) + return false; + + return msg.resp.vq_ready.ready; +} + +static int vduse_dev_get_vq_state(struct vduse_dev *dev, + struct vduse_virtqueue *vq, + struct vdpa_vq_state *state) +{ + struct vduse_dev_msg msg = { 0 }; + int ret; + + msg.req.type = VDUSE_GET_VQ_STATE; + msg.req.request_id = vduse_dev_get_request_id(dev); + msg.req.vq_state.index = vq->index; + + ret = vduse_dev_msg_sync(dev, &msg); + if (!ret) + state->avail_index = msg.resp.vq_state.avail_idx; + + return ret; +} + +static int vduse_dev_set_vq_state(struct vduse_dev *dev, + struct vduse_virtqueue *vq, + const struct vdpa_vq_state *state) +{ + struct vduse_dev_msg msg = { 0 }; + + msg.req.type = VDUSE_SET_VQ_STATE; + msg.req.request_id = vduse_dev_get_request_id(dev); + msg.req.vq_state.index = vq->index; + msg.req.vq_state.avail_idx = state->avail_index; + + return vduse_dev_msg_sync(dev, &msg); +} + +static int vduse_dev_update_iotlb(struct vduse_dev *dev, + u64 start, u64 last) +{ + struct vduse_dev_msg msg = { 0 }; + + if (last < start) + return -EINVAL; + + msg.req.type = VDUSE_UPDATE_IOTLB; + msg.req.request_id = vduse_dev_get_request_id(dev); + msg.req.iova.start = start; + msg.req.iova.last = last; + + return vduse_dev_msg_sync(dev, &msg); +} + +static ssize_t vduse_dev_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + struct file *file = iocb->ki_filp; + struct vduse_dev *dev = file->private_data; + struct vduse_dev_msg *msg; + int size = sizeof(struct vduse_dev_request); + ssize_t ret; + + if (iov_iter_count(to) < size) + return -EINVAL; + + spin_lock(&dev->msg_lock); + while (1) { + msg = vduse_dequeue_msg(&dev->send_list); + if (msg) + break; + + ret = -EAGAIN; + if (file->f_flags & O_NONBLOCK) + goto unlock; + + spin_unlock(&dev->msg_lock); + ret = wait_event_interruptible_exclusive(dev->waitq, + !list_empty(&dev->send_list)); + if (ret) + return ret; + + spin_lock(&dev->msg_lock); + } + spin_unlock(&dev->msg_lock); + ret = copy_to_iter(&msg->req, size, to); + spin_lock(&dev->msg_lock); + if (ret != size) { + ret = -EFAULT; + vduse_enqueue_msg(&dev->send_list, msg); + goto unlock; + } + vduse_enqueue_msg(&dev->recv_list, msg); +unlock: + spin_unlock(&dev->msg_lock); + + return ret; +} + +static ssize_t vduse_dev_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + struct file *file = iocb->ki_filp; + struct vduse_dev *dev = file->private_data; + struct vduse_dev_response resp; + struct vduse_dev_msg *msg; + size_t ret; + + ret = copy_from_iter(&resp, sizeof(resp), from); + if (ret != sizeof(resp)) + return -EINVAL; + + spin_lock(&dev->msg_lock); + msg = vduse_find_msg(&dev->recv_list, resp.request_id); + if (!msg) { + ret = -ENOENT; + goto unlock; + } + + memcpy(&msg->resp, &resp, sizeof(resp)); + msg->completed = 1; + wake_up(&msg->waitq); +unlock: + spin_unlock(&dev->msg_lock); + + return ret; +} + +static __poll_t vduse_dev_poll(struct file *file, poll_table *wait) +{ + struct vduse_dev *dev = file->private_data; + __poll_t mask = 0; + + poll_wait(file, &dev->waitq, wait); + + if (!list_empty(&dev->send_list)) + mask |= EPOLLIN | EPOLLRDNORM; + if (!list_empty(&dev->recv_list)) + mask |= EPOLLOUT | EPOLLWRNORM; + + return mask; +} + +static void vduse_dev_reset(struct vduse_dev *dev) +{ + int i; + struct vduse_iova_domain *domain = dev->domain; + + /* The coherent mappings are handled in vduse_dev_free_coherent() */ + if (domain->bounce_map) { + vduse_domain_reset_bounce_map(domain); + vduse_dev_update_iotlb(dev, 0ULL, domain->bounce_size - 1); + } + + spin_lock(&dev->irq_lock); + dev->config_cb.callback = NULL; + dev->config_cb.private = NULL; + spin_unlock(&dev->irq_lock); + + for (i = 0; i < dev->vq_num; i++) { + struct vduse_virtqueue *vq = &dev->vqs[i]; + + spin_lock(&vq->irq_lock); + vq->ready = false; + vq->cb.callback = NULL; + vq->cb.private = NULL; + spin_unlock(&vq->irq_lock); + } +} + +static int vduse_vdpa_set_vq_address(struct vdpa_device *vdpa, u16 idx, + u64 desc_area, u64 driver_area, + u64 device_area) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + return vduse_dev_set_vq_addr(dev, vq, desc_area, + driver_area, device_area); +} + +static void vduse_vdpa_kick_vq(struct vdpa_device *vdpa, u16 idx) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + spin_lock(&vq->kick_lock); + if (vq->ready && vq->kickfd) + eventfd_signal(vq->kickfd, 1); + spin_unlock(&vq->kick_lock); +} + +static void vduse_vdpa_set_vq_cb(struct vdpa_device *vdpa, u16 idx, + struct vdpa_callback *cb) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + spin_lock(&vq->irq_lock); + vq->cb.callback = cb->callback; + vq->cb.private = cb->private; + spin_unlock(&vq->irq_lock); +} + +static void vduse_vdpa_set_vq_num(struct vdpa_device *vdpa, u16 idx, u32 num) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vduse_dev_set_vq_num(dev, vq, num); +} + +static void vduse_vdpa_set_vq_ready(struct vdpa_device *vdpa, + u16 idx, bool ready) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vduse_dev_set_vq_ready(dev, vq, ready); + vq->ready = ready; +} + +static bool vduse_vdpa_get_vq_ready(struct vdpa_device *vdpa, u16 idx) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + vq->ready = vduse_dev_get_vq_ready(dev, vq); + + return vq->ready; +} + +static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx, + const struct vdpa_vq_state *state) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + return vduse_dev_set_vq_state(dev, vq, state); +} + +static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx, + struct vdpa_vq_state *state) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + struct vduse_virtqueue *vq = &dev->vqs[idx]; + + return vduse_dev_get_vq_state(dev, vq, state); +} + +static u32 vduse_vdpa_get_vq_align(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->vq_align; +} + +static u64 vduse_vdpa_get_features(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return vduse_dev_get_features(dev); +} + +static int vduse_vdpa_set_features(struct vdpa_device *vdpa, u64 features) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) + return -EINVAL; + + return vduse_dev_set_features(dev, features); +} + +static void vduse_vdpa_set_config_cb(struct vdpa_device *vdpa, + struct vdpa_callback *cb) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + spin_lock(&dev->irq_lock); + dev->config_cb.callback = cb->callback; + dev->config_cb.private = cb->private; + spin_unlock(&dev->irq_lock); +} + +static u16 vduse_vdpa_get_vq_num_max(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->vq_size_max; +} + +static u32 vduse_vdpa_get_device_id(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->device_id; +} + +static u32 vduse_vdpa_get_vendor_id(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->vendor_id; +} + +static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return vduse_dev_get_status(dev); +} + +static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + vduse_dev_set_status(dev, status); + + if (status == 0) + vduse_dev_reset(dev); +} + +static size_t vduse_vdpa_get_config_size(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + return dev->config_size; +} + +static void vduse_vdpa_get_config(struct vdpa_device *vdpa, unsigned int offset, + void *buf, unsigned int len) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + vduse_dev_get_config(dev, offset, buf, len); +} + +static void vduse_vdpa_set_config(struct vdpa_device *vdpa, unsigned int offset, + const void *buf, unsigned int len) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + vduse_dev_set_config(dev, offset, buf, len); +} + +static int vduse_vdpa_set_map(struct vdpa_device *vdpa, + struct vhost_iotlb *iotlb) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + int ret; + + ret = vduse_domain_set_map(dev->domain, iotlb); + vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX); + + return ret; +} + +static void vduse_vdpa_free(struct vdpa_device *vdpa) +{ + struct vduse_dev *dev = vdpa_to_vduse(vdpa); + + WARN_ON(!list_empty(&dev->send_list)); + WARN_ON(!list_empty(&dev->recv_list)); + dev->vdev = NULL; +} + +static const struct vdpa_config_ops vduse_vdpa_config_ops = { + .set_vq_address = vduse_vdpa_set_vq_address, + .kick_vq = vduse_vdpa_kick_vq, + .set_vq_cb = vduse_vdpa_set_vq_cb, + .set_vq_num = vduse_vdpa_set_vq_num, + .set_vq_ready = vduse_vdpa_set_vq_ready, + .get_vq_ready = vduse_vdpa_get_vq_ready, + .set_vq_state = vduse_vdpa_set_vq_state, + .get_vq_state = vduse_vdpa_get_vq_state, + .get_vq_align = vduse_vdpa_get_vq_align, + .get_features = vduse_vdpa_get_features, + .set_features = vduse_vdpa_set_features, + .set_config_cb = vduse_vdpa_set_config_cb, + .get_vq_num_max = vduse_vdpa_get_vq_num_max, + .get_device_id = vduse_vdpa_get_device_id, + .get_vendor_id = vduse_vdpa_get_vendor_id, + .get_status = vduse_vdpa_get_status, + .set_status = vduse_vdpa_set_status, + .get_config_size = vduse_vdpa_get_config_size, + .get_config = vduse_vdpa_get_config, + .set_config = vduse_vdpa_set_config, + .set_map = vduse_vdpa_set_map, + .free = vduse_vdpa_free, +}; + +static dma_addr_t vduse_dev_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, + enum dma_data_direction dir, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + + return vduse_domain_map_page(domain, page, offset, size, dir, attrs); +} + +static void vduse_dev_unmap_page(struct device *dev, dma_addr_t dma_addr, + size_t size, enum dma_data_direction dir, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + + return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs); +} + +static void *vduse_dev_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_addr, gfp_t flag, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + unsigned long iova; + void *addr; + + *dma_addr = DMA_MAPPING_ERROR; + addr = vduse_domain_alloc_coherent(domain, size, + (dma_addr_t *)&iova, flag, attrs); + if (!addr) + return NULL; + + *dma_addr = (dma_addr_t)iova; + vduse_dev_update_iotlb(vdev, iova, iova + size - 1); + + return addr; +} + +static void vduse_dev_free_coherent(struct device *dev, size_t size, + void *vaddr, dma_addr_t dma_addr, + unsigned long attrs) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + unsigned long start = (unsigned long)dma_addr; + unsigned long last = start + size - 1; + + vduse_domain_free_coherent(domain, size, vaddr, dma_addr, attrs); + vduse_dev_update_iotlb(vdev, start, last); +} + +static size_t vduse_dev_max_mapping_size(struct device *dev) +{ + struct vduse_dev *vdev = dev_to_vduse(dev); + struct vduse_iova_domain *domain = vdev->domain; + + return domain->bounce_size; +} + +static const struct dma_map_ops vduse_dev_dma_ops = { + .map_page = vduse_dev_map_page, + .unmap_page = vduse_dev_unmap_page, + .alloc = vduse_dev_alloc_coherent, + .free = vduse_dev_free_coherent, + .max_mapping_size = vduse_dev_max_mapping_size, +}; + +static unsigned int perm_to_file_flags(u8 perm) +{ + unsigned int flags = 0; + + switch (perm) { + case VDUSE_ACCESS_WO: + flags |= O_WRONLY; + break; + case VDUSE_ACCESS_RO: + flags |= O_RDONLY; + break; + case VDUSE_ACCESS_RW: + flags |= O_RDWR; + break; + default: + WARN(1, "invalidate vhost IOTLB permission\n"); + break; + } + + return flags; +} + +static int vduse_kickfd_setup(struct vduse_dev *dev, + struct vduse_vq_eventfd *eventfd) +{ + struct eventfd_ctx *ctx = NULL; + struct vduse_virtqueue *vq; + u32 index; + + if (eventfd->index >= dev->vq_num) + return -EINVAL; + + index = array_index_nospec(eventfd->index, dev->vq_num); + vq = &dev->vqs[index]; + if (eventfd->fd >= 0) { + ctx = eventfd_ctx_fdget(eventfd->fd); + if (IS_ERR(ctx)) + return PTR_ERR(ctx); + } else if (eventfd->fd != VDUSE_EVENTFD_DEASSIGN) + return 0; + + spin_lock(&vq->kick_lock); + if (vq->kickfd) + eventfd_ctx_put(vq->kickfd); + vq->kickfd = ctx; + spin_unlock(&vq->kick_lock); + + return 0; +} + +static void vduse_dev_irq_inject(struct work_struct *work) +{ + struct vduse_dev *dev = container_of(work, struct vduse_dev, inject); + + spin_lock_irq(&dev->irq_lock); + if (dev->config_cb.callback) + dev->config_cb.callback(dev->config_cb.private); + spin_unlock_irq(&dev->irq_lock); +} + +static void vduse_vq_irq_inject(struct work_struct *work) +{ + struct vduse_virtqueue *vq = container_of(work, + struct vduse_virtqueue, inject); + + spin_lock_irq(&vq->irq_lock); + if (vq->ready && vq->cb.callback) + vq->cb.callback(vq->cb.private); + spin_unlock_irq(&vq->irq_lock); +} + +static long vduse_dev_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) +{ + struct vduse_dev *dev = file->private_data; + void __user *argp = (void __user *)arg; + int ret; + + switch (cmd) { + case VDUSE_IOTLB_GET_FD: { + struct vduse_iotlb_entry entry; + struct vhost_iotlb_map *map; + struct vdpa_map_file *map_file; + struct vduse_iova_domain *domain = dev->domain; + struct file *f = NULL; + + ret = -EFAULT; + if (copy_from_user(&entry, argp, sizeof(entry))) + break; + + ret = -EINVAL; + if (entry.start > entry.last) + break; + + spin_lock(&domain->iotlb_lock); + map = vhost_iotlb_itree_first(domain->iotlb, + entry.start, entry.last); + if (map) { + map_file = (struct vdpa_map_file *)map->opaque; + f = get_file(map_file->file); + entry.offset = map_file->offset; + entry.start = map->start; + entry.last = map->last; + entry.perm = map->perm; + } + spin_unlock(&domain->iotlb_lock); + ret = -EINVAL; + if (!f) + break; + + ret = -EFAULT; + if (copy_to_user(argp, &entry, sizeof(entry))) { + fput(f); + break; + } + ret = receive_fd(f, perm_to_file_flags(entry.perm)); + fput(f); + break; + } + case VDUSE_VQ_SETUP_KICKFD: { + struct vduse_vq_eventfd eventfd; + + ret = -EFAULT; + if (copy_from_user(&eventfd, argp, sizeof(eventfd))) + break; + + ret = vduse_kickfd_setup(dev, &eventfd); + break; + } + case VDUSE_INJECT_VQ_IRQ: { + u32 vq_index; + + ret = -EFAULT; + if (copy_from_user(&vq_index, argp, sizeof(u32))) + break; + + ret = -EINVAL; + if (vq_index >= dev->vq_num) + break; + + ret = 0; + vq_index = array_index_nospec(vq_index, dev->vq_num); + queue_work(vduse_irq_wq, &dev->vqs[vq_index].inject); + break; + } + case VDUSE_INJECT_CONFIG_IRQ: + ret = 0; + queue_work(vduse_irq_wq, &dev->inject); + break; + default: + ret = -ENOIOCTLCMD; + break; + } + + return ret; +} + +static int vduse_dev_release(struct inode *inode, struct file *file) +{ + struct vduse_dev *dev = file->private_data; + struct vduse_dev_msg *msg; + int i; + + for (i = 0; i < dev->vq_num; i++) { + struct vduse_virtqueue *vq = &dev->vqs[i]; + + spin_lock(&vq->kick_lock); + if (vq->kickfd) + eventfd_ctx_put(vq->kickfd); + vq->kickfd = NULL; + spin_unlock(&vq->kick_lock); + } + + spin_lock(&dev->msg_lock); + /* Make sure the inflight messages can processed after reconncection */ + while ((msg = vduse_dequeue_msg(&dev->recv_list))) + vduse_enqueue_msg(&dev->send_list, msg); + spin_unlock(&dev->msg_lock); + + dev->connected = false; + + return 0; +} + +static int vduse_dev_open(struct inode *inode, struct file *file) +{ + struct vduse_dev *dev = container_of(inode->i_cdev, + struct vduse_dev, cdev); + int ret = -EBUSY; + + mutex_lock(&dev->lock); + if (dev->connected) + goto unlock; + + ret = 0; + dev->connected = true; + file->private_data = dev; +unlock: + mutex_unlock(&dev->lock); + + return ret; +} + +static const struct file_operations vduse_dev_fops = { + .owner = THIS_MODULE, + .open = vduse_dev_open, + .release = vduse_dev_release, + .read_iter = vduse_dev_read_iter, + .write_iter = vduse_dev_write_iter, + .poll = vduse_dev_poll, + .unlocked_ioctl = vduse_dev_ioctl, + .compat_ioctl = compat_ptr_ioctl, + .llseek = noop_llseek, +}; + +static struct vduse_dev *vduse_dev_create(void) +{ + struct vduse_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL); + + if (!dev) + return NULL; + + mutex_init(&dev->lock); + spin_lock_init(&dev->msg_lock); + INIT_LIST_HEAD(&dev->send_list); + INIT_LIST_HEAD(&dev->recv_list); + atomic64_set(&dev->msg_unique, 0); + spin_lock_init(&dev->irq_lock); + + INIT_WORK(&dev->inject, vduse_dev_irq_inject); + init_waitqueue_head(&dev->waitq); + + return dev; +} + +static void vduse_dev_destroy(struct vduse_dev *dev) +{ + kfree(dev); +} + +static struct vduse_dev *vduse_find_dev(const char *name) +{ + struct vduse_dev *tmp, *dev = NULL; + + list_for_each_entry(tmp, &vduse_devs, list) { + if (!strcmp(tmp->name, name)) { + dev = tmp; + break; + } + } + return dev; +} + +static int vduse_destroy_dev(char *name) +{ + struct vduse_dev *dev = vduse_find_dev(name); + + if (!dev) + return -EINVAL; + + mutex_lock(&dev->lock); + if (dev->vdev || dev->connected) { + mutex_unlock(&dev->lock); + return -EBUSY; + } + dev->connected = true; + mutex_unlock(&dev->lock); + + list_del(&dev->list); + cdev_device_del(&dev->cdev, &dev->dev); + put_device(&dev->dev); + module_put(THIS_MODULE); + + return 0; +} + +static void vduse_release_dev(struct device *device) +{ + struct vduse_dev *dev = + container_of(device, struct vduse_dev, dev); + + ida_simple_remove(&vduse_ida, dev->minor); + kfree(dev->vqs); + vduse_domain_destroy(dev->domain); + kfree(dev->name); + vduse_dev_destroy(dev); +} + +static bool device_is_allowed(unsigned int device_id) +{ + int i; + + if (allow_unsafe_device_emulation) + return true; + + for (i = 0; i < ARRAY_SIZE(allowed_device_id); i++) { + if (allowed_device_id[i] == device_id) + return true; + } + return false; +} + +static int vduse_create_dev(struct vduse_dev_config *config, + unsigned long api_version) +{ + int i, ret = -ENOMEM; + struct vduse_dev *dev; + + if (config->bounce_size > max_bounce_size) + return -EINVAL; + + if (config->vq_align > PAGE_SIZE) + return -EINVAL; + + if (!device_is_allowed(config->device_id)) + return -EINVAL; + + if (vduse_find_dev(config->name)) + return -EEXIST; + + dev = vduse_dev_create(); + if (!dev) + return -ENOMEM; + + dev->api_version = api_version; + dev->device_id = config->device_id; + dev->vendor_id = config->vendor_id; + dev->name = kstrdup(config->name, GFP_KERNEL); + if (!dev->name) + goto err_str; + + dev->domain = vduse_domain_create(max_iova_size - 1, + config->bounce_size); + if (!dev->domain) + goto err_domain; + + dev->config_size = config->config_size; + dev->vq_align = config->vq_align; + dev->vq_size_max = config->vq_size_max; + dev->vq_num = config->vq_num; + dev->vqs = kcalloc(dev->vq_num, sizeof(*dev->vqs), GFP_KERNEL); + if (!dev->vqs) + goto err_vqs; + + for (i = 0; i < dev->vq_num; i++) { + dev->vqs[i].index = i; + INIT_WORK(&dev->vqs[i].inject, vduse_vq_irq_inject); + spin_lock_init(&dev->vqs[i].kick_lock); + spin_lock_init(&dev->vqs[i].irq_lock); + } + + ret = ida_simple_get(&vduse_ida, 0, VDUSE_DEV_MAX, GFP_KERNEL); + if (ret < 0) + goto err_ida; + + dev->minor = ret; + device_initialize(&dev->dev); + dev->dev.release = vduse_release_dev; + dev->dev.class = vduse_class; + dev->dev.devt = MKDEV(MAJOR(vduse_major), dev->minor); + ret = dev_set_name(&dev->dev, "%s", config->name); + if (ret) + goto err; + + cdev_init(&dev->cdev, &vduse_dev_fops); + dev->cdev.owner = THIS_MODULE; + + ret = cdev_device_add(&dev->cdev, &dev->dev); + if (ret) + goto err; + + list_add(&dev->list, &vduse_devs); + __module_get(THIS_MODULE); + + return 0; +err_ida: + kfree(dev->vqs); +err_vqs: + vduse_domain_destroy(dev->domain); +err_domain: + kfree(dev->name); +err_str: + vduse_dev_destroy(dev); + return ret; +err: + put_device(&dev->dev); + return ret; +} + +static long vduse_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) +{ + int ret; + void __user *argp = (void __user *)arg; + struct vduse_control *control = file->private_data; + + mutex_lock(&vduse_lock); + switch (cmd) { + case VDUSE_GET_API_VERSION: + ret = control->api_version; + break; + case VDUSE_SET_API_VERSION: + ret = -EINVAL; + if (arg > VDUSE_API_VERSION) + break; + + ret = 0; + control->api_version = arg; + break; + case VDUSE_CREATE_DEV: { + struct vduse_dev_config config; + + ret = -EFAULT; + if (copy_from_user(&config, argp, sizeof(config))) + break; + + ret = vduse_create_dev(&config, control->api_version); + break; + } + case VDUSE_DESTROY_DEV: { + char name[VDUSE_NAME_MAX]; + + ret = -EFAULT; + if (copy_from_user(name, argp, VDUSE_NAME_MAX)) + break; + + ret = vduse_destroy_dev(name); + break; + } + default: + ret = -EINVAL; + break; + } + mutex_unlock(&vduse_lock); + + return ret; +} + +static int vduse_release(struct inode *inode, struct file *file) +{ + struct vduse_control *control = file->private_data; + + kfree(control); + return 0; +} + +static int vduse_open(struct inode *inode, struct file *file) +{ + struct vduse_control *control; + + control = kmalloc(sizeof(struct vduse_control), GFP_KERNEL); + if (!control) + return -ENOMEM; + + control->api_version = VDUSE_API_VERSION; + file->private_data = control; + + return 0; +} + +static const struct file_operations vduse_fops = { + .owner = THIS_MODULE, + .open = vduse_open, + .release = vduse_release, + .unlocked_ioctl = vduse_ioctl, + .compat_ioctl = compat_ptr_ioctl, + .llseek = noop_llseek, +}; + +static char *vduse_devnode(struct device *dev, umode_t *mode) +{ + return kasprintf(GFP_KERNEL, "vduse/%s", dev_name(dev)); +} + +static struct miscdevice vduse_misc = { + .fops = &vduse_fops, + .minor = MISC_DYNAMIC_MINOR, + .name = "vduse", + .nodename = "vduse/control", +}; + +static void vduse_mgmtdev_release(struct device *dev) +{ +} + +static struct device vduse_mgmtdev = { + .init_name = "vduse", + .release = vduse_mgmtdev_release, +}; + +static struct vdpa_mgmt_dev mgmt_dev; + +static int vduse_dev_init_vdpa(struct vduse_dev *dev, const char *name) +{ + struct vduse_vdpa *vdev; + int ret; + + if (dev->vdev) + return -EEXIST; + + vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, &dev->dev, + &vduse_vdpa_config_ops, name, true); + if (!vdev) + return -ENOMEM; + + dev->vdev = vdev; + vdev->dev = dev; + vdev->vdpa.dev.dma_mask = &vdev->vdpa.dev.coherent_dma_mask; + ret = dma_set_mask_and_coherent(&vdev->vdpa.dev, DMA_BIT_MASK(64)); + if (ret) { + put_device(&vdev->vdpa.dev); + return ret; + } + set_dma_ops(&vdev->vdpa.dev, &vduse_dev_dma_ops); + vdev->vdpa.dma_dev = &vdev->vdpa.dev; + vdev->vdpa.mdev = &mgmt_dev; + + return 0; +} + +static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name) +{ + struct vduse_dev *dev; + int ret; + + mutex_lock(&vduse_lock); + dev = vduse_find_dev(name); + if (!dev) { + mutex_unlock(&vduse_lock); + return -EINVAL; + } + ret = vduse_dev_init_vdpa(dev, name); + mutex_unlock(&vduse_lock); + if (ret) + return ret; + + ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num); + if (ret) { + put_device(&dev->vdev->vdpa.dev); + return ret; + } + + return 0; +} + +static void vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev) +{ + _vdpa_unregister_device(dev); +} + +static const struct vdpa_mgmtdev_ops vdpa_dev_mgmtdev_ops = { + .dev_add = vdpa_dev_add, + .dev_del = vdpa_dev_del, +}; + +static struct virtio_device_id id_table[] = { + { VIRTIO_DEV_ANY_ID, VIRTIO_DEV_ANY_ID }, + { 0 }, +}; + +static struct vdpa_mgmt_dev mgmt_dev = { + .device = &vduse_mgmtdev, + .id_table = id_table, + .ops = &vdpa_dev_mgmtdev_ops, +}; + +static int vduse_mgmtdev_init(void) +{ + int ret; + + ret = device_register(&vduse_mgmtdev); + if (ret) + return ret; + + ret = vdpa_mgmtdev_register(&mgmt_dev); + if (ret) + goto err; + + return 0; +err: + device_unregister(&vduse_mgmtdev); + return ret; +} + +static void vduse_mgmtdev_exit(void) +{ + vdpa_mgmtdev_unregister(&mgmt_dev); + device_unregister(&vduse_mgmtdev); +} + +static int vduse_init(void) +{ + int ret; + + if (max_bounce_size >= max_iova_size) + return -EINVAL; + + ret = misc_register(&vduse_misc); + if (ret) + return ret; + + vduse_class = class_create(THIS_MODULE, "vduse"); + if (IS_ERR(vduse_class)) { + ret = PTR_ERR(vduse_class); + goto err_class; + } + vduse_class->devnode = vduse_devnode; + + ret = alloc_chrdev_region(&vduse_major, 0, VDUSE_DEV_MAX, "vduse"); + if (ret) + goto err_chardev; + + vduse_irq_wq = alloc_workqueue("vduse-irq", + WQ_HIGHPRI | WQ_SYSFS | WQ_UNBOUND, 0); + if (!vduse_irq_wq) + goto err_wq; + + ret = vduse_domain_init(); + if (ret) + goto err_domain; + + ret = vduse_mgmtdev_init(); + if (ret) + goto err_mgmtdev; + + return 0; +err_mgmtdev: + vduse_domain_exit(); +err_domain: + destroy_workqueue(vduse_irq_wq); +err_wq: + unregister_chrdev_region(vduse_major, VDUSE_DEV_MAX); +err_chardev: + class_destroy(vduse_class); +err_class: + misc_deregister(&vduse_misc); + return ret; +} +module_init(vduse_init); + +static void vduse_exit(void) +{ + misc_deregister(&vduse_misc); + class_destroy(vduse_class); + unregister_chrdev_region(vduse_major, VDUSE_DEV_MAX); + destroy_workqueue(vduse_irq_wq); + vduse_domain_exit(); + vduse_mgmtdev_exit(); +} +module_exit(vduse_exit); + +MODULE_VERSION(DRV_VERSION); +MODULE_LICENSE(DRV_LICENSE); +MODULE_AUTHOR(DRV_AUTHOR); +MODULE_DESCRIPTION(DRV_DESC); diff --git a/include/uapi/linux/vduse.h b/include/uapi/linux/vduse.h new file mode 100644 index 000000000000..2484b1f7d2ef --- /dev/null +++ b/include/uapi/linux/vduse.h @@ -0,0 +1,178 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _UAPI_VDUSE_H_ +#define _UAPI_VDUSE_H_ + +#include + +#define VDUSE_API_VERSION 0 + +#define VDUSE_MAX_TRANSFER_LEN 256 +#define VDUSE_NAME_MAX 256 + +/* the control messages definition for read/write */ + +enum vduse_req_type { + /* Set the vring address of virtqueue. */ + VDUSE_SET_VQ_NUM, + /* Set the vring address of virtqueue. */ + VDUSE_SET_VQ_ADDR, + /* Set ready status of virtqueue */ + VDUSE_SET_VQ_READY, + /* Get ready status of virtqueue */ + VDUSE_GET_VQ_READY, + /* Set the state for virtqueue */ + VDUSE_SET_VQ_STATE, + /* Get the state for virtqueue */ + VDUSE_GET_VQ_STATE, + /* Set virtio features supported by the driver */ + VDUSE_SET_FEATURES, + /* Get virtio features supported by the device */ + VDUSE_GET_FEATURES, + /* Set the device status */ + VDUSE_SET_STATUS, + /* Get the device status */ + VDUSE_GET_STATUS, + /* Write to device specific configuration space */ + VDUSE_SET_CONFIG, + /* Read from device specific configuration space */ + VDUSE_GET_CONFIG, + /* Notify userspace to update the memory mapping in device IOTLB */ + VDUSE_UPDATE_IOTLB, +}; + +struct vduse_vq_num { + __u32 index; /* virtqueue index */ + __u32 num; /* the size of virtqueue */ +}; + +struct vduse_vq_addr { + __u32 index; /* virtqueue index */ + __u32 padding; /* padding */ + __u64 desc_addr; /* address of desc area */ + __u64 driver_addr; /* address of driver area */ + __u64 device_addr; /* address of device area */ +}; + +struct vduse_vq_ready { + __u32 index; /* virtqueue index */ + __u8 ready; /* ready status of virtqueue */ +}; + +struct vduse_vq_state { + __u32 index; /* virtqueue index */ + __u32 avail_idx; /* virtqueue state (last_avail_idx) */ +}; + +struct vduse_dev_config_data { + __u32 offset; /* offset from the beginning of config space */ + __u32 len; /* the length to read/write */ + __u8 data[VDUSE_MAX_TRANSFER_LEN]; /* data buffer used to read/write */ +}; + +struct vduse_iova_range { + __u64 start; /* start of the IOVA range */ + __u64 last; /* end of the IOVA range */ +}; + +struct vduse_features { + __u64 features; /* virtio features */ +}; + +struct vduse_status { + __u8 status; /* device status */ +}; + +struct vduse_dev_request { + __u32 type; /* request type */ + __u32 request_id; /* request id */ + __u32 reserved[2]; /* for future use */ + union { + struct vduse_vq_num vq_num; /* virtqueue num */ + struct vduse_vq_addr vq_addr; /* virtqueue address */ + struct vduse_vq_ready vq_ready; /* virtqueue ready status */ + struct vduse_vq_state vq_state; /* virtqueue state */ + struct vduse_dev_config_data config; /* virtio device config space */ + struct vduse_iova_range iova; /* iova range for updating */ + struct vduse_features f; /* virtio features */ + struct vduse_status s; /* device status */ + __u32 padding[128]; /* padding */ + }; +}; + +struct vduse_dev_response { + __u32 request_id; /* corresponding request id */ +#define VDUSE_REQUEST_OK 0x00 +#define VDUSE_REQUEST_FAILED 0x01 + __u32 result; /* the result of request */ + __u32 reserved[2]; /* for future use */ + union { + struct vduse_vq_ready vq_ready; /* virtqueue ready status */ + struct vduse_vq_state vq_state; /* virtqueue state */ + struct vduse_dev_config_data config; /* virtio device config space */ + struct vduse_features f; /* virtio features */ + struct vduse_status s; /* device status */ + __u32 padding[128]; /* padding */ + }; +}; + +/* ioctls */ + +struct vduse_dev_config { + char name[VDUSE_NAME_MAX]; /* vduse device name */ + __u32 vendor_id; /* virtio vendor id */ + __u32 device_id; /* virtio device id */ + __u64 bounce_size; /* bounce buffer size for iommu */ + __u16 vq_size_max; /* the max size of virtqueue */ + __u16 padding; /* padding */ + __u32 vq_num; /* the number of virtqueues */ + __u32 vq_align; /* the allocation alignment of virtqueue's metadata */ + __u32 config_size; /* the size of the configuration space */ + __u32 reserved[15]; /* for future use */ +}; + +struct vduse_iotlb_entry { + __u64 offset; /* the mmap offset on fd */ + __u64 start; /* start of the IOVA range */ + __u64 last; /* last of the IOVA range */ +#define VDUSE_ACCESS_RO 0x1 +#define VDUSE_ACCESS_WO 0x2 +#define VDUSE_ACCESS_RW 0x3 + __u8 perm; /* access permission of this range */ +}; + +struct vduse_vq_eventfd { + __u32 index; /* virtqueue index */ +#define VDUSE_EVENTFD_DEASSIGN -1 + int fd; /* eventfd, -1 means de-assigning the eventfd */ +}; + +#define VDUSE_BASE 0x81 + +/* Get the version of VDUSE API. This is used for future extension */ +#define VDUSE_GET_API_VERSION _IO(VDUSE_BASE, 0x00) + +/* Set the version of VDUSE API. */ +#define VDUSE_SET_API_VERSION _IO(VDUSE_BASE, 0x01) + +/* Create a vduse device which is represented by a char device (/dev/vduse/) */ +#define VDUSE_CREATE_DEV _IOW(VDUSE_BASE, 0x02, struct vduse_dev_config) + +/* Destroy a vduse device. Make sure there are no references to the char device */ +#define VDUSE_DESTROY_DEV _IOW(VDUSE_BASE, 0x03, char[VDUSE_NAME_MAX]) + +/* + * Get a file descriptor for the first overlapped iova region, + * -EINVAL means the iova region doesn't exist. + */ +#define VDUSE_IOTLB_GET_FD _IOWR(VDUSE_BASE, 0x04, struct vduse_iotlb_entry) + +/* Setup an eventfd to receive kick for virtqueue */ +#define VDUSE_VQ_SETUP_KICKFD _IOW(VDUSE_BASE, 0x05, struct vduse_vq_eventfd) + +/* Inject an interrupt for specific virtqueue */ +#define VDUSE_INJECT_VQ_IRQ _IOW(VDUSE_BASE, 0x06, __u32) + +/* Inject a config interrupt */ +#define VDUSE_INJECT_CONFIG_IRQ _IO(VDUSE_BASE, 0x07) + +#endif /* _UAPI_VDUSE_H_ */ From patchwork Mon May 17 09:55:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yongji Xie X-Patchwork-Id: 12261447 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B9D2C433B4 for ; Mon, 17 May 2021 09:58:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2AF7461029 for ; Mon, 17 May 2021 09:58:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236480AbhEQJ7k (ORCPT ); Mon, 17 May 2021 05:59:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35260 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236424AbhEQJ7D (ORCPT ); Mon, 17 May 2021 05:59:03 -0400 Received: from mail-pg1-x52c.google.com (mail-pg1-x52c.google.com [IPv6:2607:f8b0:4864:20::52c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9D194C0612EC for ; Mon, 17 May 2021 02:56:45 -0700 (PDT) Received: by mail-pg1-x52c.google.com with SMTP id q15so4267771pgg.12 for ; Mon, 17 May 2021 02:56:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=iU39BPoZqk2HqORX3UA9Ncb7R+Ai9Wci3A+87yTYnGc=; b=vJS0FLHo+jEjRSpl4KDvMAXIKV6coCiING2EQZA1u2Y7gKzBes8Am7G0tgOJgxQlrZ Bkdd4TMVPeYuCtXAnN7lHvgd+fGo5r7S18DNsqWRS/+zLqnQXdMn4+dWRtxgaHnmabi7 THQzQT3wQSsMuC7EE60tyCdc6m40mjKDwjpd/1WW7rO963r/zo6y3sBIX7e5E2Mp0D7e wV5rIuUrhiuPXFHAatRod8B2KEBFLN32CdPEoRTra2ZEhpJqs8EuDriMsfg43ibR1sFY E1p+zen4658VBftZ+03vFxrfs5JNVUwGNM8ZRnujdVaAFuKkf2tNZ8xNjzcx1TST/zJB o6mA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=iU39BPoZqk2HqORX3UA9Ncb7R+Ai9Wci3A+87yTYnGc=; b=Z5ECwA2CISutF2JfTrdzhfnyYEqn+Cw3BZXZEd7NkTkMjMnCg8/eatssDohDlV9X3N HaPOr5WA1QMz9p3w/WHhfFW6krOHkR/MU0+PYbkn9571j5P13nm+awx3jDaAeNJxjfGr raF4mKW0w4i3BsBX8oVj0ifnQLDblVl+pQWRgYR0K1ncElpaiUQLigG2g20+96JcWAyx LkzkdndBV4F81qnBleWYigW7Cl/okg//ClgPO+fQOiCMW5RUJ1Ph64+O2YAHMA6RWI47 z7e9VJlwThuGcxiQXMFf/UurFWRcuGlKCgJa5jUwDH4iLLjj7P8YZW2RYXuDu9SsFVdX HwxA== X-Gm-Message-State: AOAM531CTozwI7lCmCN3PueVvAhctHVYKaBq+vF6sZ+ISu88UvjtVuIY owGnwoIqe3MlgUuSQ7IROusu X-Google-Smtp-Source: ABdhPJwaUmuqxTT6QOW1/dIV2zi3/klVmP6fV6W/ZGKjYhDf6qJpzO5wUPp9BNC/Sl2so2TDfCiGJg== X-Received: by 2002:a63:6f8e:: with SMTP id k136mr61696699pgc.326.1621245405101; Mon, 17 May 2021 02:56:45 -0700 (PDT) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id v14sm10441643pgl.86.2021.05.17.02.56.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 May 2021 02:56:44 -0700 (PDT) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, parav@nvidia.com, hch@infradead.org, christian.brauner@canonical.com, rdunlap@infradead.org, willy@infradead.org, viro@zeniv.linux.org.uk, axboe@kernel.dk, bcrl@kvack.org, corbet@lwn.net, mika.penttila@nextfour.com, dan.carpenter@oracle.com, joro@8bytes.org Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org Subject: [PATCH v7 12/12] Documentation: Add documentation for VDUSE Date: Mon, 17 May 2021 17:55:13 +0800 Message-Id: <20210517095513.850-13-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210517095513.850-1-xieyongji@bytedance.com> References: <20210517095513.850-1-xieyongji@bytedance.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org VDUSE (vDPA Device in Userspace) is a framework to support implementing software-emulated vDPA devices in userspace. This document is intended to clarify the VDUSE design and usage. Signed-off-by: Xie Yongji --- Documentation/userspace-api/index.rst | 1 + Documentation/userspace-api/vduse.rst | 243 ++++++++++++++++++++++++++++++++++ 2 files changed, 244 insertions(+) create mode 100644 Documentation/userspace-api/vduse.rst diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst index d29b020e5622..2dc25dd9f2aa 100644 --- a/Documentation/userspace-api/index.rst +++ b/Documentation/userspace-api/index.rst @@ -25,6 +25,7 @@ place where this information is gathered. iommu media/index sysfs-platform_profile + vduse .. only:: subproject and html diff --git a/Documentation/userspace-api/vduse.rst b/Documentation/userspace-api/vduse.rst new file mode 100644 index 000000000000..a804be347545 --- /dev/null +++ b/Documentation/userspace-api/vduse.rst @@ -0,0 +1,243 @@ +================================== +VDUSE - "vDPA Device in Userspace" +================================== + +vDPA (virtio data path acceleration) device is a device that uses a +datapath which complies with the virtio specifications with vendor +specific control path. vDPA devices can be both physically located on +the hardware or emulated by software. VDUSE is a framework that makes it +possible to implement software-emulated vDPA devices in userspace. + +In general, the userspace process that emulates the device is able to +run unprivileged. And to reduce security risks, we only support emulating +a few vDPA devices by default, including: virtio-net device, virtio-blk +device, virtio-scsi device and virtio-fs device. Only when a sysadmin trusts +the userspace process enough, it can relax the limitation with a +'allow_unsafe_device_emulation' module parameter. + +How VDUSE works +=============== + +Start/Stop VDUSE devices +------------------------ + +VDUSE devices are started as follows: + +1. Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on + /dev/vduse/control. + +2. Begin processing VDUSE messages from /dev/vduse/$NAME. The first + messages will arrive while attaching the VDUSE instance to vDPA. + +3. Send the VDPA_CMD_DEV_NEW netlink message to attach the VDUSE + instance to vDPA. + +VDUSE devices are stopped as follows: + +1. Send the VDPA_CMD_DEV_DEL netlink message to detach the VDUSE + instance to vDPA. + +2. Close the file descriptor referring to /dev/vduse/$NAME + +3. Destroy the VDUSE instance with ioctl(VDUSE_DESTROY_DEV) on + /dev/vduse/control + +The netlink messages metioned above can be sent via vdpa tool in iproute2 +or use the below sample codes: + +.. code-block:: c + + static int netlink_add_vduse(const char *name, enum vdpa_command cmd) + { + struct nl_sock *nlsock; + struct nl_msg *msg; + int famid; + + nlsock = nl_socket_alloc(); + if (!nlsock) + return -ENOMEM; + + if (genl_connect(nlsock)) + goto free_sock; + + famid = genl_ctrl_resolve(nlsock, VDPA_GENL_NAME); + if (famid < 0) + goto close_sock; + + msg = nlmsg_alloc(); + if (!msg) + goto close_sock; + + if (!genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, famid, 0, 0, cmd, 0)) + goto nla_put_failure; + + NLA_PUT_STRING(msg, VDPA_ATTR_DEV_NAME, name); + if (cmd == VDPA_CMD_DEV_NEW) + NLA_PUT_STRING(msg, VDPA_ATTR_MGMTDEV_DEV_NAME, "vduse"); + + if (nl_send_sync(nlsock, msg)) + goto close_sock; + + nl_close(nlsock); + nl_socket_free(nlsock); + + return 0; + nla_put_failure: + nlmsg_free(msg); + close_sock: + nl_close(nlsock); + free_sock: + nl_socket_free(nlsock); + return -1; + } + +Emulate VDUSE devices +--------------------- + +To emulate a VDUSE device, we always need to implement both control path +and data path for it. + +To implement control path, a message-based communication protocol and some +types of control messages are introduced in the VDUSE framework: + +- VDUSE_SET_VQ_ADDR: Set the vring address of virtqueue. + +- VDUSE_SET_VQ_NUM: Set the size of virtqueue + +- VDUSE_SET_VQ_READY: Set ready status of virtqueue + +- VDUSE_GET_VQ_READY: Get ready status of virtqueue + +- VDUSE_SET_VQ_STATE: Set the state for virtqueue + +- VDUSE_GET_VQ_STATE: Get the state for virtqueue + +- VDUSE_SET_FEATURES: Set virtio features supported by the driver + +- VDUSE_GET_FEATURES: Get virtio features supported by the device + +- VDUSE_SET_STATUS: Set the device status + +- VDUSE_GET_STATUS: Get the device status + +- VDUSE_SET_CONFIG: Write to device specific configuration space + +- VDUSE_GET_CONFIG: Read from device specific configuration space + +- VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping in device IOTLB + +Those control messages are mostly based on the vdpa_config_ops in +include/linux/vdpa.h which defines a unified interface to control +different types of vdpa device. Userspace needs to read()/write() +on /dev/vduse/$NAME to receive/reply those control messages +from/to VDUSE kernel module as follows: + +.. code-block:: c + + static int vduse_message_handler(int dev_fd) + { + int len; + struct vduse_dev_request req; + struct vduse_dev_response resp; + + len = read(dev_fd, &req, sizeof(req)); + if (len != sizeof(req)) + return -1; + + resp.request_id = req.request_id; + + switch (req.type) { + + /* handle different types of message */ + + } + + len = write(dev_fd, &resp, sizeof(resp)); + if (len != sizeof(resp)) + return -1; + + return 0; + } + +In the data path, vDPA device's iova regions will be mapped into userspace +with the help of VDUSE_IOTLB_GET_FD ioctl on /dev/vduse/$NAME: + +- VDUSE_IOTLB_GET_FD: get the file descriptor to the first overlapped iova region. + Userspace can access this iova region by passing fd and corresponding size, offset, + perm to mmap(). For example: + +.. code-block:: c + + static int perm_to_prot(uint8_t perm) + { + int prot = 0; + + switch (perm) { + case VDUSE_ACCESS_WO: + prot |= PROT_WRITE; + break; + case VDUSE_ACCESS_RO: + prot |= PROT_READ; + break; + case VDUSE_ACCESS_RW: + prot |= PROT_READ | PROT_WRITE; + break; + } + + return prot; + } + + static void *iova_to_va(int dev_fd, uint64_t iova, uint64_t *len) + { + int fd; + void *addr; + size_t size; + struct vduse_iotlb_entry entry; + + entry.start = iova; + entry.last = iova + 1; + fd = ioctl(dev_fd, VDUSE_IOTLB_GET_FD, &entry); + if (fd < 0) + return NULL; + + size = entry.last - entry.start + 1; + *len = entry.last - iova + 1; + addr = mmap(0, size, perm_to_prot(entry.perm), MAP_SHARED, + fd, entry.offset); + close(fd); + if (addr == MAP_FAILED) + return NULL; + + /* do something to cache this iova region */ + + return addr + iova - entry.start; + } + +Besides, the following ioctls on /dev/vduse/$NAME are provided to support +interrupt injection and setting up eventfd for virtqueue kicks: + +- VDUSE_VQ_SETUP_KICKFD: set the kickfd for virtqueue, this eventfd is used + by VDUSE kernel module to notify userspace to consume the vring. + +- VDUSE_INJECT_VQ_IRQ: inject an interrupt for specific virtqueue + +- VDUSE_INJECT_CONFIG_IRQ: inject a config interrupt + +MMU-based IOMMU Driver +====================== + +VDUSE framework implements an MMU-based on-chip IOMMU driver to support +mapping the kernel DMA buffer into the userspace iova region dynamically. +This is mainly designed for virtio-vdpa case (kernel virtio drivers). + +The basic idea behind this driver is treating MMU (VA->PA) as IOMMU (IOVA->PA). +The driver will set up MMU mapping instead of IOMMU mapping for the DMA transfer +so that the userspace process is able to use its virtual address to access +the DMA buffer in kernel. + +And to avoid security issue, a bounce-buffering mechanism is introduced to +prevent userspace accessing the original buffer directly which may contain other +kernel data. During the mapping, unmapping, the driver will copy the data from +the original buffer to the bounce buffer and back, depending on the direction of +the transfer. And the bounce-buffer addresses will be mapped into the user address +space instead of the original one.