From patchwork Fri Nov 19 13:47:26 2021
X-Patchwork-Submitter: Chao Peng
X-Patchwork-Id: 12628877
From: Chao Peng
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org
Cc: Paolo Bonzini, Jonathan Corbet, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org, "H . Peter Anvin", Hugh Dickins, Jeff Layton, "J . Bruce Fields", Andrew Morton, Yu Zhang, Chao Peng, "Kirill A . Shutemov", luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com
Subject: [RFC v2 PATCH 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
Date: Fri, 19 Nov 2021 21:47:26 +0800
Message-Id: <20211119134739.20218-1-chao.p.peng@linux.intel.com>

This RFC series tries to implement the fd-based KVM guest private memory
proposal described at [1] and an improved 'New Proposal' described at [2].

In general, this patch series introduces an fd-based memslot, which provides
guest memory through fd[offset,size] instead of hva/size. The fd can be
created from a supported memory filesystem such as tmpfs or hugetlbfs, which
we refer to as the memory backing store. KVM and the backing store exchange
callbacks when such a memslot is created. At runtime, KVM calls into the
callbacks provided by the backing store to get the pfn for a given fd+offset.
The backing store in turn calls into KVM callbacks when userspace does
fallocate/punch hole on the fd, to notify KVM to map/unmap the secondary MMU
page tables.

Compared to the existing hva-based memslot, this new type of memslot allows
guest memory to be unmapped from host userspace (e.g. QEMU) and even from the
kernel itself, which reduces the attack surface and brings some other
benefits.

Based on this fd-based memslot, we can build guest private memory for use in
confidential computing environments such as Intel TDX and AMD SEV. When
supported, the backing store can provide more enforcement on the fd, and KVM
can use a single memslot to hold both the private and shared parts of guest
memory. For a more detailed description please refer to [2].

Because this design introduces callbacks between the memory backing store and
KVM, and because for private memory KVM relies on the backing store to do
additional enforcement and to tell whether an address is private or shared, I
would like KVM/mm/fs people to have a look at this part.
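
For readers less familiar with the userspace side, here is a minimal sketch of
what "fallocate/punch hole on the fd" looks like, using only existing Linux
interfaces (memfd_create(2), ftruncate(2), fallocate(2)). It does not use
anything added by this series: no F_SEAL_GUEST, no new KVM ioctl fields. The
helper names (backing_fd_create, backing_fd_populate, backing_fd_punch_hole)
and the "guest-mem" label are made up for illustration. In the proposed
design, these are the two operations on which the backing store would notify
KVM to map or unmap the affected range.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a tmpfs-backed anonymous fd of 'size' bytes.
 * Returns the fd, or -1 on error. */
static int backing_fd_create(off_t size)
{
	int fd;

	assert(size > 0);
	fd = memfd_create("guest-mem", MFD_CLOEXEC);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, size) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: allocate backing pages for [offset, offset + len).
 * With this series, the backing store would tell KVM that the range can
 * now be mapped. */
static int backing_fd_populate(int fd, off_t offset, off_t len)
{
	assert(len > 0);
	return fallocate(fd, 0, offset, len);
}

/* Hypothetical helper: free backing pages for [offset, offset + len) while
 * keeping the file size. With this series, the backing store would tell
 * KVM to unmap the range from the secondary MMU. */
static int backing_fd_punch_hole(int fd, off_t offset, off_t len)
{
	assert(len > 0);
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 offset, len);
}
```

A caller would typically create the fd once, populate the ranges the guest is
going to use, and punch holes when a range is released or converted; the fd
and offset, rather than a host virtual address, are then what identifies the
memory.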

[1] https://lkml.kernel.org/kvm/51a6f74f-6c05-74b9-3fd7-b7cd900fb8cc@redhat.com/
[2] https://lkml.kernel.org/linux-fsdevel/20211111141352.26311-1-chao.p.peng@linux.intel.com/

Thanks,
Chao
---
Chao Peng (12):
  KVM: Add KVM_EXIT_MEMORY_ERROR exit
  KVM: Extend kvm_userspace_memory_region to support fd based memslot
  KVM: Add fd-based memslot data structure and utils
  KVM: Implement fd-based memory using new memfd interfaces
  KVM: Register/unregister memfd backed memslot
  KVM: Handle page fault for fd based memslot
  KVM: Rename hva memory invalidation code to cover fd-based offset
  KVM: Introduce kvm_memfd_invalidate_range
  KVM: Match inode for invalidation of fd-based slot
  KVM: Add kvm_map_gfn_range
  KVM: Introduce kvm_memfd_fallocate_range
  KVM: Enable memfd based page invalidation/fallocate

Kirill A. Shutemov (1):
  mm/shmem: Introduce F_SEAL_GUEST

 arch/arm64/kvm/mmu.c               |  14 +--
 arch/mips/kvm/mips.c               |  14 +--
 arch/powerpc/include/asm/kvm_ppc.h |  28 ++---
 arch/powerpc/kvm/book3s.c          |  14 +--
 arch/powerpc/kvm/book3s_hv.c       |  14 +--
 arch/powerpc/kvm/book3s_pr.c       |  14 +--
 arch/powerpc/kvm/booke.c           |  14 +--
 arch/powerpc/kvm/powerpc.c         |  14 +--
 arch/riscv/kvm/mmu.c               |  14 +--
 arch/s390/kvm/kvm-s390.c           |  14 +--
 arch/x86/include/asm/kvm_host.h    |   6 +-
 arch/x86/kvm/Makefile              |   3 +-
 arch/x86/kvm/mmu/mmu.c             | 122 ++++++++++++++++++++-
 arch/x86/kvm/vmx/main.c            |   6 +-
 arch/x86/kvm/vmx/tdx.c             |   6 +-
 arch/x86/kvm/vmx/tdx_stubs.c       |   6 +-
 arch/x86/kvm/x86.c                 |  16 +--
 include/linux/kvm_host.h           |  58 ++++++++--
 include/linux/memfd.h              |  24 +++++
 include/linux/shmem_fs.h           |   9 ++
 include/uapi/linux/fcntl.h         |   1 +
 include/uapi/linux/kvm.h           |  27 +++++
 mm/memfd.c                         |  33 +++++-
 mm/shmem.c                         | 123 ++++++++++++++++++++-
 virt/kvm/kvm_main.c                | 165 +++++++++++++++++++++++------
 virt/kvm/memfd.c                   | 123 +++++++++++++++++++++
 26 files changed, 733 insertions(+), 149 deletions(-)
 create mode 100644 virt/kvm/memfd.c