From: Chao Peng
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org
Cc: Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org, "H . Peter Anvin", Hugh Dickins, Jeff Layton, "J . Bruce Fields", Andrew Morton, Yu Zhang, Chao Peng, "Kirill A . Shutemov", luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com
Subject: [RFC v2 PATCH 05/13] KVM: Implement fd-based memory using new memfd interfaces
Date: Fri, 19 Nov 2021 21:47:31 +0800
Message-Id: <20211119134739.20218-6-chao.p.peng@linux.intel.com>
In-Reply-To: <20211119134739.20218-1-chao.p.peng@linux.intel.com>

This patch pairs an fd-based memslot with a memory backing store.

The two sides handshake to exchange callbacks that are invoked later.

KVM->memfd:
  - get_pfn: get, or allocate (when alloc is true), the page at the
    specified offset in the fd; the page is returned locked.
  - put_pfn: put and unlock the pfn.

memfd->KVM:
  - invalidate_page_range: called when userspace punches a hole in the fd;
    KVM should unmap the related pages in the second MMU.
  - fallocate: called when userspace fallocates space in the fd; KVM can
    map the related pages in the second MMU.

Currently, tmpfs behind the memfd interface is supported.

Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
---
 arch/x86/kvm/Makefile    |   3 +-
 include/linux/kvm_host.h |   6 +++
 virt/kvm/memfd.c         | 101 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 109 insertions(+), 1 deletion(-)
 create mode 100644 virt/kvm/memfd.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f919df73e5e3..5d7f289b1ca0 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -11,7 +11,8 @@ KVM := ../../../virt/kvm
 
 kvm-y			+= $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
 				$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o \
-				$(KVM)/dirty_ring.o $(KVM)/binary_stats.o
+				$(KVM)/dirty_ring.o $(KVM)/binary_stats.o \
+				$(KVM)/memfd.o
 kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
 
 kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1d4ac0c9b63b..e8646103356b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -769,6 +769,12 @@ static inline void kvm_irqfd_exit(void)
 {
 }
 #endif
+
+int kvm_memfd_register(struct kvm *kvm,
+		const struct kvm_userspace_memory_region_ext *mem,
+		struct kvm_memory_slot *slot);
+void kvm_memfd_unregister(struct kvm *kvm, struct kvm_memory_slot *slot);
+
 int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
 		  struct module *module);
 void kvm_exit(void);
diff --git a/virt/kvm/memfd.c b/virt/kvm/memfd.c
new file mode 100644
index 000000000000..bd930dcb455f
--- /dev/null
+++ b/virt/kvm/memfd.c
@@ -0,0 +1,101 @@
+
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * memfd.c: routines for fd based guest memory backing store
+ * Copyright (c) 2021, Intel Corporation.
+ *
+ * Author:
+ *	Chao Peng
+ */
+
+#include
+#include
+const static struct guest_mem_ops *memfd_ops;
+
+static void memfd_invalidate_page_range(struct inode *inode, void *owner,
+					pgoff_t start, pgoff_t end)
+{
+	//!!!We can get here after the owner no longer exists
+}
+
+static void memfd_fallocate(struct inode *inode, void *owner,
+			    pgoff_t start, pgoff_t end)
+{
+	//!!!We can get here after the owner no longer exists
+}
+
+static const struct guest_ops memfd_notifier = {
+	.invalidate_page_range = memfd_invalidate_page_range,
+	.fallocate = memfd_fallocate,
+};
+
+static kvm_pfn_t kvm_memfd_get_pfn(struct kvm_memory_slot *slot,
+				   struct file *file, gfn_t gfn,
+				   bool alloc, int *order)
+{
+	pgoff_t index = gfn - slot->base_gfn +
+			(slot->userspace_addr >> PAGE_SHIFT);
+
+	return memfd_ops->get_lock_pfn(file->f_inode, index, alloc, order);
+}
+
+static void kvm_memfd_put_pfn(kvm_pfn_t pfn)
+{
+	memfd_ops->put_unlock_pfn(pfn);
+}
+
+static struct kvm_memfd_ops kvm_memfd_ops = {
+	.get_pfn = kvm_memfd_get_pfn,
+	.put_pfn = kvm_memfd_put_pfn,
+};
+
+int kvm_memfd_register(struct kvm *kvm,
+		       const struct kvm_userspace_memory_region_ext *mem,
+		       struct kvm_memory_slot *slot)
+{
+	int ret;
+	struct fd fd = fdget(mem->fd);
+
+	if (!fd.file)
+		return -EINVAL;
+
+	ret = memfd_register_guest(fd.file->f_inode, kvm,
+				   &memfd_notifier, &memfd_ops);
+	if (ret)
+		return ret;
+	slot->file = fd.file;
+
+	if (mem->private_fd >= 0) {
+		fd = fdget(mem->private_fd);
+		if (!fd.file) {
+			ret = -EINVAL;
+			goto err;
+		}
+
+		ret = memfd_register_guest(fd.file->f_inode, kvm,
+					   &memfd_notifier, &memfd_ops);
+		if (ret)
+			goto err;
+		slot->priv_file = fd.file;
+	}
+
+	slot->memfd_ops = &kvm_memfd_ops;
+	return 0;
+err:
+	kvm_memfd_unregister(kvm, slot);
+	return ret;
+}
+
+void kvm_memfd_unregister(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+	if (slot->file) {
+		fput(slot->file);
+		slot->file = NULL;
+	}
+
+	if (slot->priv_file) {
+		fput(slot->priv_file);
+		slot->priv_file = NULL;
+	}
+	slot->memfd_ops = NULL;
+}