From patchwork Thu Nov 11 14:13:40 2021
From: Chao Peng
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org
Subject: [RFC PATCH 1/6] mm: Add F_SEAL_GUEST to shmem/memfd
Date: Thu, 11 Nov 2021 22:13:40 +0800
Message-Id: <20211111141352.26311-2-chao.p.peng@linux.intel.com>
In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com>

The new seal is only allowed if there are no pre-existing pages in the
fd and no existing mapping of the file. After the seal is set, no
read/write/mmap from userspace is allowed.
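For illustration only (not part of this patch): a VMM would be expected
to create and seal such a memfd roughly as below, assuming F_SEAL_GUEST
is exposed through <linux/fcntl.h>; error paths are simplified:

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Create a memfd usable as guest private memory.  The seal must be
     * applied while the file is still empty and unmapped; afterwards
     * the fd can still be sized (page-aligned), but never read, written
     * or mmap-ed from userspace. */
    int create_guest_memfd(size_t size)
    {
        int fd = memfd_create("guest-mem", MFD_ALLOW_SEALING);

        if (fd < 0)
            return -1;
        if (fcntl(fd, F_ADD_SEALS, F_SEAL_GUEST) < 0 ||
            ftruncate(fd, size) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }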
Signed-off-by: Kirill A. Shutemov
Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
---
 include/linux/memfd.h      |  22 +++++++
 include/linux/shmem_fs.h   |   9 +++
 include/uapi/linux/fcntl.h |   1 +
 mm/memfd.c                 |  34 +++++++++-
 mm/shmem.c                 | 127 ++++++++++++++++++++++++++++++++++++-
 5 files changed, 189 insertions(+), 4 deletions(-)

diff --git a/include/linux/memfd.h b/include/linux/memfd.h
index 4f1600413f91..ea213f5e3f95 100644
--- a/include/linux/memfd.h
+++ b/include/linux/memfd.h
@@ -4,13 +4,34 @@

 #include <linux/file.h>

+struct guest_ops {
+	void (*invalidate_page_range)(struct inode *inode, void *owner,
+				      pgoff_t start, pgoff_t end);
+};
+
+struct guest_mem_ops {
+	unsigned long (*get_lock_pfn)(struct inode *inode, pgoff_t offset,
+				      int *page_level);
+	void (*put_unlock_pfn)(unsigned long pfn);
+};
+
 #ifdef CONFIG_MEMFD_CREATE
 extern long memfd_fcntl(struct file *file, unsigned int cmd, unsigned long arg);
+
+extern int memfd_register_guest(struct inode *inode, void *owner,
+				const struct guest_ops *guest_ops,
+				const struct guest_mem_ops **guest_mem_ops);
 #else
 static inline long memfd_fcntl(struct file *f, unsigned int c, unsigned long a)
 {
 	return -EINVAL;
 }
+static inline int memfd_register_guest(struct inode *inode, void *owner,
+				const struct guest_ops *guest_ops,
+				const struct guest_mem_ops **guest_mem_ops)
+{
+	return -EINVAL;
+}
 #endif

 #endif /* __LINUX_MEMFD_H */
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index d82b6f396588..1b4c032680d5 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -12,6 +12,9 @@

 /* inode in-kernel data */

+struct guest_ops;
+struct guest_mem_ops;
+
 struct shmem_inode_info {
 	spinlock_t		lock;
 	unsigned int		seals;		/* shmem seals */
@@ -24,6 +27,8 @@ struct shmem_inode_info {
 	struct simple_xattrs	xattrs;		/* list of xattrs */
 	atomic_t		stop_eviction;	/* hold when working on inode */
 	struct inode		vfs_inode;
+	void			*guest_owner;
+	const struct guest_ops	*guest_ops;
 };

 struct shmem_sb_info {
@@ -90,6 +95,10 @@ extern unsigned long shmem_swap_usage(struct vm_area_struct *vma);
 extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 						pgoff_t start, pgoff_t end);

+extern int shmem_register_guest(struct inode *inode, void *owner,
+				const struct guest_ops *guest_ops,
+				const struct guest_mem_ops **guest_mem_ops);
+
 /* Flag allocation requirements to shmem_getpage */
 enum sgp_type {
 	SGP_READ,	/* don't exceed i_size, don't allocate page */
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 2f86b2ad6d7e..c79bc8572721 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -43,6 +43,7 @@
 #define F_SEAL_GROW	0x0004	/* prevent file from growing */
 #define F_SEAL_WRITE	0x0008	/* prevent writes */
 #define F_SEAL_FUTURE_WRITE	0x0010  /* prevent future writes while mapped */
+#define F_SEAL_GUEST	0x0020
 /* (1U << 31) is reserved for signed error codes */

 /*
diff --git a/mm/memfd.c b/mm/memfd.c
index 2647c898990c..5a34173f55f4 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -130,11 +130,25 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
 	return NULL;
 }

+int memfd_register_guest(struct inode *inode, void *owner,
+			 const struct guest_ops *guest_ops,
+			 const struct guest_mem_ops **guest_mem_ops)
+{
+	if (shmem_mapping(inode->i_mapping)) {
+		return shmem_register_guest(inode, owner,
+					    guest_ops, guest_mem_ops);
+	}
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(memfd_register_guest);
+
 #define F_ALL_SEALS (F_SEAL_SEAL | \
 		     F_SEAL_SHRINK | \
 		     F_SEAL_GROW | \
 		     F_SEAL_WRITE | \
-		     F_SEAL_FUTURE_WRITE)
+		     F_SEAL_FUTURE_WRITE | \
+		     F_SEAL_GUEST)

 static int memfd_add_seals(struct file *file, unsigned int seals)
 {
@@ -203,10 +218,28 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
 		}
 	}

+	if (seals & F_SEAL_GUEST) {
+		i_mmap_lock_read(inode->i_mapping);
+
+		if (!RB_EMPTY_ROOT(&inode->i_mapping->i_mmap.rb_root)) {
+			error = -EBUSY;
+			goto unlock_mmap;
+		}
+
+		if (i_size_read(inode)) {
+			error = -EBUSY;
+			goto unlock_mmap;
+		}
+	}
+
 	*file_seals |= seals;
 	error = 0;

+unlock_mmap:
+	if (seals & F_SEAL_GUEST)
+		i_mmap_unlock_read(inode->i_mapping);
 unlock:
 	inode_unlock(inode);
 	return error;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index b2db4ed0fbc7..978c841c42c4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -80,6 +80,7 @@ static struct vfsmount *shm_mnt;
 #include <linux/fcntl.h>
 #include <uapi/linux/memfd.h>
 #include <linux/rmap.h>
+#include <linux/memfd.h>
 #include <linux/uuid.h>

 #include <linux/uaccess.h>
@@ -883,6 +884,21 @@ static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
 	return split_huge_page(page) >= 0;
 }

+static void guest_invalidate_page(struct inode *inode,
+				  struct page *page, pgoff_t start, pgoff_t end)
+{
+	struct shmem_inode_info *info = SHMEM_I(inode);
+
+	if (!info->guest_ops || !info->guest_ops->invalidate_page_range)
+		return;
+
+	start = max(start, page->index);
+	end = min(end, page->index + HPAGE_PMD_NR) - 1;
+
+	info->guest_ops->invalidate_page_range(inode, info->guest_owner,
+					       start, end);
+}
+
 /*
  * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
@@ -923,6 +939,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			}
 			index += thp_nr_pages(page) - 1;

+			guest_invalidate_page(inode, page, start, end);
+
 			if (!unfalloc || !PageUptodate(page))
 				truncate_inode_page(mapping, page);
 			unlock_page(page);
@@ -999,6 +1017,9 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 					index--;
 					break;
 				}
+
+				guest_invalidate_page(inode, page, start, end);
+
 				VM_BUG_ON_PAGE(PageWriteback(page), page);
 				if (shmem_punch_compound(page, start, end))
 					truncate_inode_page(mapping, page);
@@ -1074,6 +1095,9 @@ static int shmem_setattr(struct user_namespace *mnt_userns,
 		    (newsize > oldsize && (info->seals & F_SEAL_GROW)))
 			return -EPERM;

+		if ((info->seals & F_SEAL_GUEST) && (newsize & ~PAGE_MASK))
+			return -EINVAL;
+
 		if (newsize != oldsize) {
 			error = shmem_reacct_size(SHMEM_I(inode)->flags,
 					oldsize, newsize);
@@ -1348,6 +1372,8 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		goto redirty;
 	if (!total_swap_pages)
 		goto redirty;
+	if (info->seals & F_SEAL_GUEST)
+		goto redirty;

 	/*
	 * Our capabilities prevent regular writeback or sync from ever calling
@@ -2278,6 +2304,9 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
 		vma->vm_flags &= ~(VM_MAYWRITE);
 	}

+	if (info->seals & F_SEAL_GUEST)
+		return -EPERM;
+
 	/* arm64 - allow memory tagging on RAM-based files */
 	vma->vm_flags |= VM_MTE_ALLOWED;

@@ -2519,12 +2548,14 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 	pgoff_t index = pos >> PAGE_SHIFT;

 	/* i_mutex is held by caller */
-	if (unlikely(info->seals & (F_SEAL_GROW |
-				   F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))) {
+	if (unlikely(info->seals & (F_SEAL_GROW | F_SEAL_WRITE |
+				    F_SEAL_FUTURE_WRITE | F_SEAL_GUEST))) {
 		if (info->seals & (F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))
 			return -EPERM;
 		if ((info->seals & F_SEAL_GROW) && pos + len > inode->i_size)
 			return -EPERM;
+		if (info->seals & F_SEAL_GUEST)
+			return -EPERM;
 	}

 	return shmem_getpage(inode, index, pagep, SGP_WRITE);
@@ -2598,6 +2629,20 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		end_index = i_size >> PAGE_SHIFT;
 		if (index > end_index)
 			break;
+
+		/*
+		 * inode_lock protects setting up seals as well as writes to
+		 * i_size. Setting F_SEAL_GUEST is only allowed with
+		 * i_size == 0.
+		 *
+		 * Check F_SEAL_GUEST after i_size. It effectively serializes
+		 * read vs. setting F_SEAL_GUEST without taking inode_lock in
+		 * the read path.
+		 */
+		if (SHMEM_I(inode)->seals & F_SEAL_GUEST) {
+			error = -EPERM;
+			break;
+		}
+
 		if (index == end_index) {
 			nr = i_size & ~PAGE_MASK;
 			if (nr <= offset)
@@ -2723,6 +2768,12 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 			goto out;
 		}

+		if ((info->seals & F_SEAL_GUEST) &&
+		    (offset & ~PAGE_MASK || len & ~PAGE_MASK)) {
+			error = -EINVAL;
+			goto out;
+		}
+
 		shmem_falloc.waitq = &shmem_falloc_waitq;
 		shmem_falloc.start = (u64)unmap_start >> PAGE_SHIFT;
 		shmem_falloc.next = (unmap_end + 1) >> PAGE_SHIFT;
@@ -3806,6 +3857,20 @@ static void shmem_destroy_inodecache(void)
 	kmem_cache_destroy(shmem_inode_cachep);
 }

+#ifdef CONFIG_MIGRATION
+static int shmem_migrate_page(struct address_space *mapping,
+			      struct page *newpage, struct page *page,
+			      enum migrate_mode mode)
+{
+	struct inode *inode = mapping->host;
+	struct shmem_inode_info *info = SHMEM_I(inode);
+
+	if (info->seals & F_SEAL_GUEST)
+		return -ENOTSUPP;
+	return migrate_page(mapping, newpage, page, mode);
+}
+#endif
+
 const struct address_space_operations shmem_aops = {
 	.writepage	= shmem_writepage,
 	.set_page_dirty	= __set_page_dirty_no_writeback,
@@ -3814,12 +3879,68 @@ const struct address_space_operations shmem_aops = {
 	.write_end	= shmem_write_end,
 #endif
 #ifdef CONFIG_MIGRATION
-	.migratepage	= migrate_page,
+	.migratepage	= shmem_migrate_page,
 #endif
 	.error_remove_page = generic_error_remove_page,
 };
 EXPORT_SYMBOL(shmem_aops);

+static unsigned long shmem_get_lock_pfn(struct inode *inode, pgoff_t offset,
+					int *page_level)
+{
+	struct page *page;
+	int ret;
+
+	ret = shmem_getpage(inode, offset, &page, SGP_WRITE);
+	if (ret)
+		return ret;
+
+	if (is_transparent_hugepage(page))
+		*page_level = PG_LEVEL_2M;
+	else
+		*page_level = PG_LEVEL_4K;
+
+	return page_to_pfn(page);
+}
+
+static void shmem_put_unlock_pfn(unsigned long pfn)
+{
+	struct page *page = pfn_to_page(pfn);
+
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+
+	set_page_dirty(page);
+	unlock_page(page);
+	put_page(page);
+}
+
+static const struct guest_mem_ops shmem_guest_ops = {
+	.get_lock_pfn = shmem_get_lock_pfn,
+	.put_unlock_pfn = shmem_put_unlock_pfn,
+};
+
+int shmem_register_guest(struct inode *inode, void *owner,
+			 const struct guest_ops *guest_ops,
+			 const struct guest_mem_ops **guest_mem_ops)
+{
+	struct shmem_inode_info *info = SHMEM_I(inode);
+
+	if (!owner)
+		return -EINVAL;
+
+	if (info->guest_owner) {
+		if (info->guest_owner == owner)
+			return 0;
+		else
+			return -EPERM;
+	}
+
+	info->guest_owner = owner;
+	info->guest_ops = guest_ops;
+	*guest_mem_ops = &shmem_guest_ops;
+	return 0;
+}
+
 static const struct file_operations shmem_file_operations = {
 	.mmap		= shmem_mmap,
 	.get_unmapped_area = shmem_get_unmapped_area,

From patchwork Thu Nov 11 14:13:41 2021
From: Chao Peng
Subject: [RFC PATCH 2/6] kvm: x86: Introduce guest private memory address space to memslot
Date: Thu, 11 Nov 2021 22:13:41 +0800
Message-Id: <20211111141352.26311-3-chao.p.peng@linux.intel.com>
In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com>

Existing memslot functions are extended to take a bool 'private'
parameter indicating whether the operation is on the guest private
memory address space or not.
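As a sketch of the resulting index math (using the new defines from this
patch; SMM takes precedence over private, matching the
kvm_arch_vcpu_memslots_id() macro below):

    /* 0 = default, 1 = SMM, 2 = guest private */
    static int vcpu_as_id(bool smm, bool private)
    {
        if (smm)
            return KVM_SMM_ADDRESS_SPACE;
        return private ? KVM_PRIVATE_ADDRESS_SPACE
                       : KVM_DEFAULT_ADDRESS_SPACE;
    }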
Signed-off-by: Sean Christopherson
Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
---
 arch/x86/include/asm/kvm_host.h |  5 +++--
 arch/x86/include/uapi/asm/kvm.h |  4 ++++
 arch/x86/kvm/mmu/mmu.c          |  2 +-
 include/linux/kvm_host.h        | 23 ++++++++++++++++++++---
 virt/kvm/kvm_main.c             |  9 ++++++++-
 5 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 20dfcdd20e81..048089883650 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1741,9 +1741,10 @@ enum {
 #define HF_SMM_INSIDE_NMI_MASK	(1 << 7)

 #define __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
-#define KVM_ADDRESS_SPACE_NUM 2
+#define KVM_ADDRESS_SPACE_NUM 3

-#define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 1 : 0)
+#define kvm_arch_vcpu_memslots_id(vcpu, private) \
+	(((vcpu)->arch.hflags & HF_SMM_MASK) ? 1 : (!!private) << 1)
 #define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)

 asmlinkage void kvm_spurious_fault(void);
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 47bc1a0df5ee..65189cfd3837 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -53,6 +53,10 @@
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256

+#define KVM_DEFAULT_ADDRESS_SPACE	0
+#define KVM_SMM_ADDRESS_SPACE		1
+#define KVM_PRIVATE_ADDRESS_SPACE	2
+
 struct kvm_memory_alias {
 	__u32 slot;  /* this has a different namespace than memory slots */
 	__u32 flags;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 79d4ae465a96..8483c15eac6f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3938,7 +3938,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 		return false;
 	}

-	/* Don't expose private memslots to L2. */
+	/* Don't expose KVM's internal memslots to L2. */
 	if (is_guest_mode(vcpu) && !kvm_is_visible_memslot(slot)) {
 		*pfn = KVM_PFN_NOSLOT;
 		*writable = false;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 597841fe3d7a..8e5b197230ed 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -442,7 +442,7 @@ struct kvm_irq_routing_table {
 #define KVM_USER_MEM_SLOTS (KVM_MEM_SLOTS_NUM - KVM_PRIVATE_MEM_SLOTS)

 #ifndef __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
-static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
+static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu, bool private)
 {
 	return 0;
 }
@@ -699,13 +699,19 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
 	return __kvm_memslots(kvm, 0);
 }

-static inline struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu)
+static inline struct kvm_memslots *__kvm_vcpu_memslots(struct kvm_vcpu *vcpu,
+							bool private)
 {
-	int as_id = kvm_arch_vcpu_memslots_id(vcpu);
+	int as_id = kvm_arch_vcpu_memslots_id(vcpu, private);

 	return __kvm_memslots(vcpu->kvm, as_id);
 }

+static inline struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu)
+{
+	return __kvm_vcpu_memslots(vcpu, false);
+}
+
 static inline
 struct kvm_memory_slot *id_to_memslot(struct kvm_memslots *slots, int id)
 {
@@ -721,6 +727,15 @@ struct kvm_memory_slot *id_to_memslot(struct kvm_memslots *slots, int id)
 	return slot;
 }

+static inline bool memslot_is_private(const struct kvm_memory_slot *slot)
+{
+#ifdef KVM_PRIVATE_ADDRESS_SPACE
+	return slot && slot->as_id == KVM_PRIVATE_ADDRESS_SPACE;
+#else
+	return false;
+#endif
+}
+
 /*
  * KVM_SET_USER_MEMORY_REGION ioctl allows the following operations:
  * - create a new memory slot
@@ -860,6 +875,8 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot, g
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);

 struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
+struct kvm_memory_slot *__kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu,
+						  gfn_t gfn, bool private);
 struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
 kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn);
 kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8815218630dc..fe62df334054 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1721,9 +1721,16 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
 }
 EXPORT_SYMBOL_GPL(gfn_to_memslot);

+struct kvm_memory_slot *__kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu,
+						  gfn_t gfn, bool private)
+{
+	return __gfn_to_memslot(__kvm_vcpu_memslots(vcpu, private), gfn);
+}
+EXPORT_SYMBOL_GPL(__kvm_vcpu_gfn_to_memslot);
+
 struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn)
 {
-	return __gfn_to_memslot(kvm_vcpu_memslots(vcpu), gfn);
+	return __kvm_vcpu_gfn_to_memslot(vcpu, gfn, false);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_gfn_to_memslot);

From patchwork Thu Nov 11 14:13:42 2021
From: Chao Peng
Subject: [RFC PATCH 3/6] kvm: x86: add private_ops to memslot
Date: Thu, 11 Nov 2021 22:13:42 +0800
Message-Id: <20211111141352.26311-4-chao.p.peng@linux.intel.com>
In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com>

Guest memory for a guest private memslot is designed to be backed by an
"enlightened" file descriptor (fd). The callbacks, which operate on the
fd, are implemented by the kernel subsystems that provide guest private
memory, and help KVM establish the memory mapping.
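Illustrative only: the contract a private_ops provider is expected to
honor. get_lock_pfn() hands back the pfn with the backing page locked
and referenced; put_unlock_pfn() releases it once the caller has
consumed the pfn (everything below other than the ops themselves is a
hypothetical sketch):

    static void map_one_private_gfn(struct kvm_memory_slot *slot, gfn_t gfn)
    {
        int level = PG_LEVEL_4K;
        unsigned long pfn;

        pfn = slot->private_ops->get_lock_pfn(slot, gfn, &level);
        /* ... install pfn into the guest page tables at 'level' ... */
        slot->private_ops->put_unlock_pfn(pfn);
    }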
Signed-off-by: Sean Christopherson
Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
---
 Documentation/virt/kvm/api.rst |  1 +
 arch/x86/kvm/mmu/mmu.c         | 47 ++++++++++++++++++++++++++++++----
 arch/x86/kvm/mmu/paging_tmpl.h |  3 ++-
 include/linux/kvm_host.h       |  8 ++++++
 include/uapi/linux/kvm.h       |  3 +++
 5 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 47054a79d395..16c06bf10302 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1260,6 +1260,7 @@ yet and must be cleared on entry.
 	__u64 guest_phys_addr;
 	__u64 memory_size; /* bytes */
 	__u64 userspace_addr; /* start of the userspace allocated memory */
+	__u32 fd; /* memory fd that provides guest memory */
 };

 /* for kvm_memory_region::flags */
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8483c15eac6f..af5ecf4ef62a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2932,6 +2932,19 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 	return level;
 }

+static int host_private_pfn_mapping_level(const struct kvm_memory_slot *slot,
+					  gfn_t gfn)
+{
+	kvm_pfn_t pfn;
+	int page_level = PG_LEVEL_4K;
+
+	pfn = slot->private_ops->get_lock_pfn(slot, gfn, &page_level);
+	if (pfn >= 0)
+		slot->private_ops->put_unlock_pfn(pfn);
+
+	return page_level;
+}
+
 int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot,
 			      gfn_t gfn, kvm_pfn_t pfn, int max_level)
 {
@@ -2947,6 +2960,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot,
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;

+	if (memslot_is_private(slot))
+		return host_private_pfn_mapping_level(slot, gfn);
+
 	return host_pfn_mapping_level(kvm, gfn, pfn, slot);
 }

@@ -3926,10 +3942,13 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,

 static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 			 gpa_t cr2_or_gpa, kvm_pfn_t *pfn, hva_t *hva,
-			 bool write, bool *writable)
+			 bool write, bool *writable, bool private)
 {
-	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	struct kvm_memory_slot *slot;
 	bool async;
+	int page_level;
+
+	slot = __kvm_vcpu_gfn_to_memslot(vcpu, gfn, private);

 	/* Don't expose aliases for no slot GFNs or private memslots */
 	if ((cr2_or_gpa & vcpu_gpa_stolen_mask(vcpu)) &&
@@ -3945,6 +3964,17 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 		return false;
 	}

+	if (private) {
+		*pfn = slot->private_ops->get_lock_pfn(slot, gfn, &page_level);
+		if (*pfn < 0)
+			*pfn = KVM_PFN_ERR_FAULT;
+		if (writable)
+			*writable = slot->flags & KVM_MEM_READONLY ?
+				    false : true;
+
+		return false;
+	}
+
 	async = false;
 	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async,
 				    write, writable, hva);
@@ -3971,7 +4001,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 			     kvm_pfn_t *pfn)
 {
 	bool write = error_code & PFERR_WRITE_MASK;
+	bool private = is_private_gfn(vcpu, gpa >> PAGE_SHIFT);
 	bool map_writable;

 	gfn_t gfn = vcpu_gpa_to_gfn_unalias(vcpu, gpa);
 	unsigned long mmu_seq;
@@ -3995,7 +4027,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	smp_rmb();

 	if (try_async_pf(vcpu, prefault, gfn, gpa, pfn, &hva,
-			 write, &map_writable))
+			 write, &map_writable, private))
 		return RET_PF_RETRY;

 	if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, *pfn, ACC_ALL, &r))
@@ -4008,7 +4040,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 	else
 		write_lock(&vcpu->kvm->mmu_lock);

-	if (!is_noslot_pfn(*pfn) &&
+	if (!private && !is_noslot_pfn(*pfn) &&
 	    mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
 		goto out_unlock;
 	r = make_mmu_pages_available(vcpu);
@@ -4027,7 +4059,16 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
 		read_unlock(&vcpu->kvm->mmu_lock);
 	else
 		write_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(*pfn);
+
+	if (!private) {
+		kvm_release_pfn_clean(*pfn);
+	} else {
+		struct kvm_memory_slot *slot =
+			__kvm_vcpu_gfn_to_memslot(vcpu, gfn, true);
+
+		slot->private_ops->put_unlock_pfn(*pfn);
+	}
+
 	return r;
 }

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 1fc3a0826072..5ffeb9c85fba 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -799,6 +799,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 {
 	bool write_fault = error_code & PFERR_WRITE_MASK;
 	bool user_fault = error_code & PFERR_USER_MASK;
+	bool private = is_private_gfn(vcpu, addr >> PAGE_SHIFT);
 	struct guest_walker walker;
 	int r;
 	kvm_pfn_t pfn;
@@ -854,7 +855,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 	smp_rmb();

 	if (try_async_pf(vcpu, prefault, walker.gfn, addr, &pfn, &hva,
-			 write_fault, &map_writable))
+			 write_fault, &map_writable, private))
 		return RET_PF_RETRY;

 	if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfn, walker.pte_access, &r))
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8e5b197230ed..83345460c5f5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -347,6 +347,12 @@ static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
 	return cmpxchg(&vcpu->mode, IN_GUEST_MODE, EXITING_GUEST_MODE);
 }

+struct kvm_private_memory_ops {
+	unsigned long (*get_lock_pfn)(const struct kvm_memory_slot *slot,
+				      gfn_t gfn, int *page_level);
+	void (*put_unlock_pfn)(unsigned long pfn);
+};
+
 /*
  * Some of the bitops functions do not support too long bitmaps.
  * This number must be determined not to exceed such limits.
@@ -362,6 +368,8 @@ struct kvm_memory_slot {
 	u32 flags;
 	short id;
 	u16 as_id;
+	struct file *file;
+	struct kvm_private_memory_ops *private_ops;
 };

 static inline bool kvm_slot_dirty_track_enabled(struct kvm_memory_slot *slot)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d3c9caf86d80..8d20caae9180 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -100,6 +100,9 @@ struct kvm_userspace_memory_region {
 	__u64 guest_phys_addr;
 	__u64 memory_size; /* bytes */
 	__u64 userspace_addr; /* start of the userspace allocated memory */
+#ifdef KVM_PRIVATE_ADDRESS_SPACE
+	__u32 fd; /* valid if memslot is guest private memory */
+#endif
 };

 /*

From patchwork Thu Nov 11 14:13:43 2021
From: Chao Peng
Subject: [RFC PATCH 4/6] kvm: x86: implement private_ops for memfd backing store
Date: Thu, 11 Nov 2021 22:13:43 +0800
Message-Id: <20211111141352.26311-5-chao.p.peng@linux.intel.com>
In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com>

Call the memfd_register_guest() module API to set up private_ops for a
given private memslot.
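A hypothetical userspace flow tying patches 1-4 together: the
F_SEAL_GUEST-sealed memfd is handed to KVM through the new 'fd' field,
while userspace_addr is not validated for private slots (see the
access_ok() change in this patch); 'vm_fd' and 'guest_memfd' are
placeholders:

    struct kvm_userspace_memory_region region = {
        .slot = (KVM_PRIVATE_ADDRESS_SPACE << 16) | 5, /* as_id in bits 16-31 */
        .guest_phys_addr = 0x100000000ULL,
        .memory_size = 0x40000000ULL,  /* 1 GiB */
        .fd = guest_memfd,             /* sealed with F_SEAL_GUEST */
    };

    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);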
Signed-off-by: Sean Christopherson
Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
---
 arch/x86/kvm/Makefile    |  2 +-
 arch/x86/kvm/memfd.c     | 63 ++++++++++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h |  6 ++++
 virt/kvm/kvm_main.c      | 29 ++++++++++++++++--
 4 files changed, 96 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/kvm/memfd.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index e7ed25070206..72ad96c78bed 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -16,7 +16,7 @@ kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o

 kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
 			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
-			   hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \
+			   hyperv.o debugfs.o memfd.o mmu/mmu.o mmu/page_track.o \
 			   mmu/spte.o
 kvm-$(CONFIG_X86_64) += mmu/tdp_iter.o mmu/tdp_mmu.o
 kvm-$(CONFIG_KVM_XEN)	+= xen.o
diff --git a/arch/x86/kvm/memfd.c b/arch/x86/kvm/memfd.c
new file mode 100644
index 000000000000..e08ab61d09f2
--- /dev/null
+++ b/arch/x86/kvm/memfd.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * memfd.c: routines for fd based memory backing store
+ * Copyright (c) 2021, Intel Corporation.
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/memfd.h>
+
+static const struct guest_mem_ops *memfd_ops;
+
+static void test_guest_invalidate_page_range(struct inode *inode, void *owner,
+					     pgoff_t start, pgoff_t end)
+{
+	/* !!! We can get here after the owner no longer exists. */
+}
+
+static const struct guest_ops guest_ops = {
+	.invalidate_page_range = test_guest_invalidate_page_range,
+};
+
+static unsigned long memfd_get_lock_pfn(const struct kvm_memory_slot *slot,
+					gfn_t gfn, int *page_level)
+{
+	pgoff_t index = gfn - slot->base_gfn +
+			(slot->userspace_addr >> PAGE_SHIFT);
+
+	return memfd_ops->get_lock_pfn(slot->file->f_inode, index, page_level);
+}
+
+static void memfd_put_unlock_pfn(unsigned long pfn)
+{
+	memfd_ops->put_unlock_pfn(pfn);
+}
+
+static struct kvm_private_memory_ops memfd_private_ops = {
+	.get_lock_pfn = memfd_get_lock_pfn,
+	.put_unlock_pfn = memfd_put_unlock_pfn,
+};
+
+int kvm_register_private_memslot(struct kvm *kvm,
+			const struct kvm_userspace_memory_region *mem,
+			struct kvm_memory_slot *slot)
+{
+	struct fd memfd = fdget(mem->fd);
+
+	if (!memfd.file)
+		return -EINVAL;
+
+	slot->file = memfd.file;
+	slot->private_ops = &memfd_private_ops;
+
+	return memfd_register_guest(slot->file->f_inode, kvm, &guest_ops,
+				    &memfd_ops);
+}
+
+void kvm_unregister_private_memslot(struct kvm *kvm,
+			const struct kvm_userspace_memory_region *mem,
+			struct kvm_memory_slot *slot)
+{
+	fput(slot->file);
+}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 83345460c5f5..17fabb4f53bf 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -777,6 +777,12 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 				struct kvm_memory_slot *old,
 				const struct kvm_memory_slot *new,
 				enum kvm_mr_change change);
+int kvm_register_private_memslot(struct kvm *kvm,
+				const struct kvm_userspace_memory_region *mem,
+				struct kvm_memory_slot *slot);
+void kvm_unregister_private_memslot(struct kvm *kvm,
+				const struct kvm_userspace_memory_region *mem,
+				struct kvm_memory_slot *slot);
 /* flush all memory translations */
 void kvm_arch_flush_shadow_all(struct kvm *kvm);
 /* flush memory translations pointing to 'slot' */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fe62df334054..e8e2c5b28aa4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1250,7 +1250,19 @@ static int kvm_set_memslot(struct kvm *kvm,
 		kvm_arch_flush_shadow_memslot(kvm, slot);
 	}

+#ifdef KVM_PRIVATE_ADDRESS_SPACE
+	if (change == KVM_MR_CREATE && as_id == KVM_PRIVATE_ADDRESS_SPACE) {
+		r = kvm_register_private_memslot(kvm, mem, new);
+		if (r)
+			goto out_slots;
+	}
+#endif
+
 	r = kvm_arch_prepare_memory_region(kvm, new, mem, change);
+#ifdef KVM_PRIVATE_ADDRESS_SPACE
+	if ((r || change == KVM_MR_DELETE) && as_id == KVM_PRIVATE_ADDRESS_SPACE)
+		kvm_unregister_private_memslot(kvm, mem, new);
+#endif
 	if (r)
 		goto out_slots;

@@ -1324,10 +1336,15 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		return -EINVAL;
 	if (mem->guest_phys_addr & (PAGE_SIZE - 1))
 		return -EINVAL;
-	/* We can read the guest memory with __xxx_user() later on. */
 	if ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
-	    (mem->userspace_addr != untagged_addr(mem->userspace_addr)) ||
-	    !access_ok((void __user *)(unsigned long)mem->userspace_addr,
+	    (mem->userspace_addr != untagged_addr(mem->userspace_addr)))
+		return -EINVAL;
+	/* We can read the guest memory with __xxx_user() later on. */
+	if (
+#ifdef KVM_PRIVATE_ADDRESS_SPACE
+	    as_id != KVM_PRIVATE_ADDRESS_SPACE &&
+#endif
+	    !access_ok((void __user *)(unsigned long)mem->userspace_addr,
 			mem->memory_size))
 		return -EINVAL;
 	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
@@ -1368,6 +1385,12 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		new.dirty_bitmap = NULL;
 		memset(&new.arch, 0, sizeof(new.arch));
 	} else { /* Modify an existing slot. */
+#ifdef KVM_PRIVATE_ADDRESS_SPACE
+		/* Private memslots are immutable, they can only be deleted. */
+		if (as_id == KVM_PRIVATE_ADDRESS_SPACE)
+			return -EINVAL;
+#endif
+
 		if ((new.userspace_addr != old.userspace_addr) ||
 		    (new.npages != old.npages) ||
 		    ((new.flags ^ old.flags) & KVM_MEM_READONLY))

From patchwork Thu Nov 11 14:13:44 2021
From: Chao Peng
Subject: [RFC PATCH 5/6] kvm: x86: add KVM_EXIT_MEMORY_ERROR exit
Date: Thu, 11 Nov 2021 22:13:44 +0800
Message-Id: <20211111141352.26311-6-chao.p.peng@linux.intel.com>
In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com>

Add support to exit to userspace for private/shared memory conversion
requests.
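A minimal sketch of how a VMM might consume the new exit, assuming 'run'
is the mmap-ed struct kvm_run for the vCPU and vmm_map_private()/
vmm_map_share() are hypothetical conversion helpers:

    static void handle_memory_error_exit(struct kvm_run *run)
    {
        __u64 gpa = run->mem.u.map.gpa;
        __u64 size = run->mem.u.map.size;

        if (run->mem.type == KVM_EXIT_MEM_MAP_PRIVATE)
            vmm_map_private(gpa, size);  /* hypothetical helper */
        else
            vmm_map_share(gpa, size);    /* hypothetical helper */
    }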
Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [RFC PATCH 5/6] kvm: x86: add KVM_EXIT_MEMORY_ERROR exit Date: Thu, 11 Nov 2021 22:13:44 +0800 Message-Id: <20211111141352.26311-6-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com> References: <20211111141352.26311-1-chao.p.peng@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Currently support to exit to userspace for private/shared memory conversion. Signed-off-by: Sean Christopherson Signed-off-by: Yu Zhang Signed-off-by: Chao Peng --- arch/x86/kvm/mmu/mmu.c | 20 ++++++++++++++++++++ include/uapi/linux/kvm.h | 15 +++++++++++++++ 2 files changed, 35 insertions(+) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index af5ecf4ef62a..780868888aa8 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3950,6 +3950,17 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, slot = __kvm_vcpu_gfn_to_memslot(vcpu, gfn, private); + /* + * Exit to userspace to map the requested private/shared memory region + * if there is no memslot and (a) the access is private or (b) there is + * an existing private memslot. Emulated MMIO must be accessed through + * shared GPAs, thus a memslot miss on a private GPA is always handled + * as an implicit conversion "request". + */ + if (!slot && + (private || __kvm_vcpu_gfn_to_memslot(vcpu, gfn, true))) + goto out_convert; + /* Don't expose aliases for no slot GFNs or private memslots */ if ((cr2_or_gpa & vcpu_gpa_stolen_mask(vcpu)) && !kvm_is_visible_memslot(slot)) { @@ -3994,6 +4005,15 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn, *pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL, write, writable, hva); return false; + +out_convert: + vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR; + vcpu->run->mem.type = private ? KVM_EXIT_MEM_MAP_PRIVATE + : KVM_EXIT_MEM_MAP_SHARE; + vcpu->run->mem.u.map.gpa = cr2_or_gpa; + vcpu->run->mem.u.map.size = PAGE_SIZE; + return true; + } static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 8d20caae9180..470c472a9451 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -233,6 +233,18 @@ struct kvm_xen_exit { } u; }; +struct kvm_memory_exit { +#define KVM_EXIT_MEM_MAP_SHARE 1 +#define KVM_EXIT_MEM_MAP_PRIVATE 2 + __u32 type; + union { + struct { + __u64 gpa; + __u64 size; + } map; + } u; +}; + #define KVM_S390_GET_SKEYS_NONE 1 #define KVM_S390_SKEYS_MAX 1048576 @@ -272,6 +284,7 @@ struct kvm_xen_exit { #define KVM_EXIT_X86_BUS_LOCK 33 #define KVM_EXIT_XEN 34 #define KVM_EXIT_TDVMCALL 35 +#define KVM_EXIT_MEMORY_ERROR 36 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -455,6 +468,8 @@ struct kvm_run { __u64 subfunc; __u64 param[4]; } tdvmcall; + /* KVM_EXIT_MEMORY_ERROR */ + struct kvm_memory_exit mem; /* Fix the size of the union. 
From patchwork Thu Nov 11 14:13:45 2021
From: Chao Peng
Subject: [RFC PATCH 6/6] KVM: add KVM_SPLIT_MEMORY_REGION
Date: Thu, 11 Nov 2021 22:13:45 +0800
Message-Id: <20211111141352.26311-7-chao.p.peng@linux.intel.com>
In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com>

This new ioctl lets userspace split an existing memory region into two
parts. The first part reuses the existing memory region but with a
shrunken size. The second part is a newly created region.
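Hypothetical usage, splitting slot 5 of the private address space at a
1 GiB offset; the low half keeps slot id 5 with a shrunken size and the
high half becomes new slot id 6 ('vm_fd' is a placeholder):

    struct kvm_split_memory_region_info info = {
        .slot1 = (KVM_PRIVATE_ADDRESS_SPACE << 16) | 5,
        .slot2 = (KVM_PRIVATE_ADDRESS_SPACE << 16) | 6,
        .offset = 1ULL << 30,  /* bytes, must be page-aligned */
    };

    ioctl(vm_fd, KVM_SPLIT_MEMORY_REGION, &info);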
Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
---
 arch/x86/kvm/x86.c       |   3 +-
 include/linux/kvm_host.h |   4 ++
 include/uapi/linux/kvm.h |  16 +++++
 virt/kvm/kvm_main.c      | 147 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 169 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 98dbe602f47b..1d490c3d7766 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11020,7 +11020,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				const struct kvm_userspace_memory_region *mem,
 				enum kvm_mr_change change)
 {
-	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE)
+	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE ||
+	    change == KVM_MR_SHRINK)
 		return kvm_alloc_memslot_metadata(memslot,
 						  mem->memory_size >> PAGE_SHIFT);
 	return 0;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 17fabb4f53bf..8b5a9217231b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -752,6 +752,9 @@ static inline bool memslot_is_private(const struct kvm_memory_slot *slot)
  *   -- move it in the guest physical memory space
  *   -- just change its flags
  *
+ * KVM_SPLIT_MEMORY_REGION ioctl allows the following operation:
+ * - shrink an existing memory slot
+ *
  * Since flags can be changed by some of these operations, the following
  * differentiation is the best we can do for __kvm_set_memory_region():
  */
@@ -760,6 +763,7 @@ enum kvm_mr_change {
 	KVM_MR_DELETE,
 	KVM_MR_MOVE,
 	KVM_MR_FLAGS_ONLY,
+	KVM_MR_SHRINK,
 };

 int kvm_set_memory_region(struct kvm *kvm,
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 470c472a9451..e61c0eac91e7 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1108,6 +1108,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_DIRTY_LOG_RING 192
 #define KVM_CAP_X86_BUS_LOCK_EXIT 193
 #define KVM_CAP_PPC_DAWR1 194
+#define KVM_CAP_MEMORY_REGION_SPLIT 195

 #define KVM_CAP_VM_TYPES 1000

@@ -1885,4 +1886,19 @@ struct kvm_dirty_gfn {
 #define KVM_BUS_LOCK_DETECTION_OFF             (1 << 0)
 #define KVM_BUS_LOCK_DETECTION_EXIT            (1 << 1)

+/**
+ * struct kvm_split_memory_region_info - Information for memory region split.
+ * @slot1: The slot to be split.
+ * @slot2: The slot for the newly split part.
+ * @offset: The offset (bytes) in @slot1 at which to split.
+ */
+struct kvm_split_memory_region_info {
+	__u32 slot1;
+	__u32 slot2;
+	__u64 offset;
+};
+
+#define KVM_SPLIT_MEMORY_REGION	_IOW(KVMIO, 0xcf, \
+					struct kvm_split_memory_region_info)
+
 #endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e8e2c5b28aa4..11b0f3d8b9ee 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1467,6 +1467,140 @@ static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
 	return kvm_set_memory_region(kvm, mem);
 }

+static void memslot_to_memory_region(struct kvm_userspace_memory_region *mem,
+				     struct kvm_memory_slot *slot)
+{
+	mem->slot = (u32)slot->as_id << 16 | slot->id;
+	mem->flags = slot->flags;
+	mem->guest_phys_addr = slot->base_gfn << PAGE_SHIFT;
+	mem->memory_size = slot->npages << PAGE_SHIFT;
+	mem->userspace_addr = slot->userspace_addr;
+}
+
+static int kvm_split_memory_region(struct kvm *kvm, int as_id, int id1, int id2,
+				   gfn_t offset)
+{
+	struct kvm_memory_slot *slot1;
+	struct kvm_memory_slot slot2 = {}, old;
+	struct kvm_userspace_memory_region mem;
+	unsigned long *dirty_bitmap_slot1 = NULL;
+	struct kvm_memslots *slots;
+	int r;
+
+	/* Make a full copy of the old memslot. */
+	slot1 = id_to_memslot(__kvm_memslots(kvm, as_id), id1);
+	if (!slot1)
+		return -EINVAL;
+	else
+		old = *slot1;
+
+	if (offset == 0 || offset >= old.npages)
+		return -EINVAL;
+
+	/* Prepare the second half. */
+	slot2.as_id = as_id;
+	slot2.id = id2;
+	slot2.base_gfn = old.base_gfn + offset;
+	slot2.npages = old.npages - offset;
+	slot2.flags = old.flags;
+	slot2.userspace_addr = old.userspace_addr + (offset << PAGE_SHIFT);
+	slot2.file = old.file;
+	slot2.private_ops = old.private_ops;
+
+	if (!(old.flags & KVM_MEM_LOG_DIRTY_PAGES))
+		slot2.dirty_bitmap = NULL;
+	else if (!kvm->dirty_ring_size) {
+		slot1->npages = offset;
+		r = kvm_alloc_dirty_bitmap(slot1);
+		if (r)
+			return r;
+		else
+			dirty_bitmap_slot1 = slot1->dirty_bitmap;
+
+		r = kvm_alloc_dirty_bitmap(&slot2);
+		if (r)
+			goto out_bitmap;
+
+		/* TODO: copy dirty_bitmap or return -EINVAL if logging is running */
+	}
+
+//	mutex_lock(&kvm->slots_arch_lock);
+
+	slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), KVM_MR_CREATE);
+	if (!slots) {
+//		mutex_unlock(&kvm->slots_arch_lock);
+		r = -ENOMEM;
+		goto out_bitmap;
+	}
+
+	slot1 = id_to_memslot(slots, id1);
+	slot1->npages = offset;
+	slot1->dirty_bitmap = dirty_bitmap_slot1;
+
+	memslot_to_memory_region(&mem, slot1);
+	r = kvm_arch_prepare_memory_region(kvm, slot1, &mem, KVM_MR_SHRINK);
+	if (r)
+		goto out_slots;
+
+	memslot_to_memory_region(&mem, &slot2);
+	r = kvm_arch_prepare_memory_region(kvm, &slot2, &mem, KVM_MR_CREATE);
+	if (r)
+		goto out_slots;
+
+	update_memslots(slots, slot1, KVM_MR_SHRINK);
+	update_memslots(slots, &slot2, KVM_MR_CREATE);
+
+	slots = install_new_memslots(kvm, as_id, slots);
+
+	kvm_free_memslot(kvm, &old);
+
+	kvfree(slots);
+	return 0;
+
+out_slots:
+//	mutex_unlock(&kvm->slots_arch_lock);
+	kvfree(slots);
+out_bitmap:
+	if (dirty_bitmap_slot1)
+		kvm_destroy_dirty_bitmap(slot1);
+	if (slot2.dirty_bitmap)
+		kvm_destroy_dirty_bitmap(&slot2);
+
+	return r;
+}
+
+static int kvm_vm_ioctl_split_memory_region(struct kvm *kvm,
+			struct kvm_split_memory_region_info *info)
+{
+	int as_id1, as_id2, id1, id2;
+	int r;
+
+	if ((u16)info->slot1 >= KVM_USER_MEM_SLOTS ||
+	    (u16)info->slot2 >= KVM_USER_MEM_SLOTS)
+		return -EINVAL;
+	if (info->offset & (PAGE_SIZE - 1))
+		return -EINVAL;
+
+	as_id1 = info->slot1 >> 16;
+	as_id2 = info->slot2 >> 16;
+
+	if (as_id1 != as_id2 || as_id1 >= KVM_ADDRESS_SPACE_NUM)
+		return -EINVAL;
+
+	id1 = (u16)info->slot1;
+	id2 = (u16)info->slot2;
+	if (id1 == id2 || id1 >= KVM_MEM_SLOTS_NUM || id2 >= KVM_MEM_SLOTS_NUM)
+		return -EINVAL;
+
+	mutex_lock(&kvm->slots_lock);
+	r = kvm_split_memory_region(kvm, as_id1, id1, id2,
+				    info->offset >> PAGE_SHIFT);
+	mutex_unlock(&kvm->slots_lock);
+
+	return r;
+}
+
 #ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
 /**
  * kvm_get_dirty_log - get a snapshot of dirty pages
@@ -3765,6 +3899,8 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #else
 		return 0;
 #endif
+	case KVM_CAP_MEMORY_REGION_SPLIT:
+		return 1;
 	default:
 		break;
 	}
@@ -3901,6 +4037,17 @@ static long kvm_vm_ioctl(struct file *filp,
 		r = kvm_vm_ioctl_set_memory_region(kvm, &kvm_userspace_mem);
 		break;
 	}
+	case KVM_SPLIT_MEMORY_REGION: {
+		struct kvm_split_memory_region_info info;
+
+		r = -EFAULT;
+		if (copy_from_user(&info, argp, sizeof(info)))
+			goto out;
+
+		r = kvm_vm_ioctl_split_memory_region(kvm, &info);
+		break;
+	}
 	case KVM_GET_DIRTY_LOG: {
 		struct kvm_dirty_log log;

From patchwork Thu Nov 11 14:13:46 2021
From: Chao Peng
Subject: [RFC PATCH 07/13] linux-headers: Update
Date: Thu, 11 Nov 2021 22:13:46 +0800
Message-Id: <20211111141352.26311-8-chao.p.peng@linux.intel.com>
In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com>

Signed-off-by: Chao Peng
---
 linux-headers/asm-x86/kvm.h |  5 +++++
 linux-headers/linux/kvm.h   | 29 +++++++++++++++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index a6c327f8ad..f9aadf0ebb 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -53,6 +53,10 @@
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256

+#define KVM_DEFAULT_ADDRESS_SPACE	0
+#define KVM_SMM_ADDRESS_SPACE		1
+#define KVM_PRIVATE_ADDRESS_SPACE	2
+
 struct kvm_memory_alias {
 	__u32 slot;  /* this has a different namespace than memory slots */
 	__u32 flags;
@@ -295,6 +299,7 @@ struct kvm_debug_exit_arch {
 #define KVM_GUESTDBG_USE_HW_BP		0x00020000
 #define KVM_GUESTDBG_INJECT_DB		0x00040000
 #define KVM_GUESTDBG_INJECT_BP		0x00080000
+#define KVM_GUESTDBG_BLOCKIRQ		0x00100000

 /* for KVM_SET_GUEST_DEBUG */
 struct kvm_guest_debug_arch {
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index bcaf66cc4d..0a43202c04 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -101,6 +101,9 @@ struct kvm_userspace_memory_region {
 	__u64 guest_phys_addr;
 	__u64 memory_size; /* bytes */
 	__u64 userspace_addr; /* start of the userspace allocated memory */
+#ifdef KVM_PRIVATE_ADDRESS_SPACE
+	__u32 fd; /* valid if memslot is guest private memory */
+#endif
 };

 /*
@@ -231,6 +234,18 @@ struct kvm_xen_exit {
 	} u;
 };

+struct kvm_memory_exit {
+#define KVM_EXIT_MEM_MAP_SHARE		1
+#define KVM_EXIT_MEM_MAP_PRIVATE	2
+	__u32 type;
+	union {
+		struct {
+			__u64 gpa;
+			__u64 size;
+		} map;
+	} u;
+};
+
 #define KVM_S390_GET_SKEYS_NONE   1
 #define KVM_S390_SKEYS_MAX        1048576

@@ -269,6 +284,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_AP_RESET_HOLD    32
 #define KVM_EXIT_X86_BUS_LOCK     33
 #define KVM_EXIT_XEN              34
+#define KVM_EXIT_MEMORY_ERROR     35

 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -469,6 +485,8 @@ struct kvm_run {
 		} msr;
 		/* KVM_EXIT_XEN */
 		struct kvm_xen_exit xen;
+		/* KVM_EXIT_MEMORY_ERROR */
+		struct kvm_memory_exit mem;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -1965,7 +1983,9 @@ struct kvm_stats_header {
 #define KVM_STATS_TYPE_CUMULATIVE	(0x0 << KVM_STATS_TYPE_SHIFT)
 #define KVM_STATS_TYPE_INSTANT		(0x1 << KVM_STATS_TYPE_SHIFT)
 #define KVM_STATS_TYPE_PEAK		(0x2 << KVM_STATS_TYPE_SHIFT)
-#define KVM_STATS_TYPE_MAX		KVM_STATS_TYPE_PEAK
+#define KVM_STATS_TYPE_LINEAR_HIST	(0x3 << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_LOG_HIST		(0x4 << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_MAX		KVM_STATS_TYPE_LOG_HIST

 #define KVM_STATS_UNIT_SHIFT		4
 #define KVM_STATS_UNIT_MASK		(0xF << KVM_STATS_UNIT_SHIFT)
@@ -1988,8 +2008,9 @@ struct kvm_stats_header {
  * @size: The number of data items for this stats.
  *        Every data item is of type __u64.
  * @offset: The offset of the stats to the start of stat structure in
- *          struture kvm or kvm_vcpu.
- * @unused: Unused field for future usage. Always 0 for now.
+ *          structure kvm or kvm_vcpu.
+ * @bucket_size: A parameter value used for histogram stats. It is only used
+ *               for linear histogram stats, specifying the size of the bucket;
 * @name: The name string for the stats. Its size is indicated by the
 *        &kvm_stats_header->name_size.
*/ @@ -1998,7 +2019,7 @@ struct kvm_stats_desc { __s16 exponent; __u16 size; __u32 offset; - __u32 unused; + __u32 bucket_size; char name[]; }; From patchwork Thu Nov 11 14:13:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12614915 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 61C2EC433EF for ; Thu, 11 Nov 2021 14:16:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4C2FE61994 for ; Thu, 11 Nov 2021 14:16:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233899AbhKKOTO (ORCPT ); Thu, 11 Nov 2021 09:19:14 -0500 Received: from mga04.intel.com ([192.55.52.120]:48782 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233809AbhKKOTK (ORCPT ); Thu, 11 Nov 2021 09:19:10 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10164"; a="231640028" X-IronPort-AV: E=Sophos;i="5.87,226,1631602800"; d="scan'208";a="231640028" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2021 06:16:18 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,226,1631602800"; d="scan'208";a="492555963" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga007.jf.intel.com with ESMTP; 11 Nov 2021 06:16:08 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [RFC PATCH 08/13] hostmem: Add guest private memory to memory backend Date: Thu, 11 Nov 2021 22:13:47 +0800 Message-Id: <20211111141352.26311-9-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com> References: <20211111141352.26311-1-chao.p.peng@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Currently only memfd is supported. 
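The backend relies on the ordering that qemu_memfd_create() implements below: F_SEAL_GUEST is only accepted while the memfd is still empty, so the seal is added first and the file is sized afterwards, the reverse of the usual grow/shrink sealing sequence. A minimal standalone sketch of the same ordering, outside QEMU (the F_SEAL_GUEST value mirrors the kernel side of this series and is not in any released uapi header; the function name and error handling are illustrative):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef F_SEAL_GUEST
#define F_SEAL_GUEST 0x0020 /* proposed by this series, not in uapi yet */
#endif

static int create_guest_private_memfd(size_t size)
{
    /* Sealing must be allowed at creation time for F_ADD_SEALS to work. */
    int fd = memfd_create("guest-ram", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    /* The guest seal is only accepted while the file is still empty... */
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_GUEST) < 0)
        goto err;

    /* ...so the backing size is set only after the seal is in place. */
    if (ftruncate(fd, size) < 0)
        goto err;

    return fd;

err:
    close(fd);
    return -1;
}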
Signed-off-by: Chao Peng --- backends/hostmem-memfd.c | 12 +++++++++--- backends/hostmem.c | 24 ++++++++++++++++++++++++ include/exec/memory.h | 3 +++ include/exec/ram_addr.h | 3 ++- include/qemu/memfd.h | 5 +++++ include/sysemu/hostmem.h | 1 + softmmu/physmem.c | 33 +++++++++++++++++++-------------- util/memfd.c | 32 +++++++++++++++++++++++++------- 8 files changed, 88 insertions(+), 25 deletions(-) diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c index 3fc85c3db8..ef057586a0 100644 --- a/backends/hostmem-memfd.c +++ b/backends/hostmem-memfd.c @@ -36,6 +36,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) { HostMemoryBackendMemfd *m = MEMORY_BACKEND_MEMFD(backend); uint32_t ram_flags; + unsigned int seals; char *name; int fd; @@ -44,10 +45,14 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) return; } + seals = backend->guest_private ? F_SEAL_GUEST : 0; + + if (m->seal) { + seals |= F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL; + } + fd = qemu_memfd_create(TYPE_MEMORY_BACKEND_MEMFD, backend->size, - m->hugetlb, m->hugetlbsize, m->seal ? - F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL : 0, - errp); + m->hugetlb, m->hugetlbsize, seals, errp); if (fd == -1) { return; } @@ -55,6 +60,7 @@ memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) name = host_memory_backend_get_name(backend); ram_flags = backend->share ? RAM_SHARED : 0; ram_flags |= backend->reserve ? 0 : RAM_NORESERVE; + ram_flags |= backend->guest_private ? RAM_GUEST_PRIVATE : 0; memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), name, backend->size, ram_flags, fd, 0, errp); g_free(name); diff --git a/backends/hostmem.c b/backends/hostmem.c index 4c05862ed5..a90d1be0a0 100644 --- a/backends/hostmem.c +++ b/backends/hostmem.c @@ -472,6 +472,23 @@ host_memory_backend_set_use_canonical_path(Object *obj, bool value, backend->use_canonical_path = value; } +static bool +host_memory_backend_get_guest_private(Object *obj, Error **errp) +{ + HostMemoryBackend *backend = MEMORY_BACKEND(obj); + + return backend->guest_private; + +} + +static void +host_memory_backend_set_guest_private(Object *obj, bool value, Error **errp) +{ + HostMemoryBackend *backend = MEMORY_BACKEND(obj); + + backend->guest_private = value; +} + static void host_memory_backend_class_init(ObjectClass *oc, void *data) { @@ -542,6 +559,13 @@ host_memory_backend_class_init(ObjectClass *oc, void *data) object_class_property_add_bool(oc, "x-use-canonical-path-for-ramblock-id", host_memory_backend_get_use_canonical_path, host_memory_backend_set_use_canonical_path); + + object_class_property_add_bool(oc, "guest-private", + host_memory_backend_get_guest_private, + host_memory_backend_set_guest_private); + object_class_property_set_description(oc, "guest-private", + "Guest private memory"); + } static const TypeInfo host_memory_backend_info = { diff --git a/include/exec/memory.h b/include/exec/memory.h index c3d417d317..ae9d3bc574 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -190,6 +190,9 @@ typedef struct IOMMUTLBEvent { */ #define RAM_NORESERVE (1 << 7) +/* RAM is guest private memory that can not be mmap-ed. 
*/ +#define RAM_GUEST_PRIVATE (1 << 8) + static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn, IOMMUNotifierFlag flags, hwaddr start, hwaddr end, diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h index 551876bed0..32768291de 100644 --- a/include/exec/ram_addr.h +++ b/include/exec/ram_addr.h @@ -74,7 +74,8 @@ static inline bool clear_bmap_test_and_clear(RAMBlock *rb, uint64_t page) static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset) { - return (b && b->host && offset < b->used_length) ? true : false; + return (b && (b->flags & RAM_GUEST_PRIVATE || b->host) + && offset < b->used_length) ? true : false; } static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset) diff --git a/include/qemu/memfd.h b/include/qemu/memfd.h index 975b6bdb77..f021a0730a 100644 --- a/include/qemu/memfd.h +++ b/include/qemu/memfd.h @@ -14,6 +14,11 @@ #define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */ #define F_SEAL_GROW 0x0004 /* prevent file from growing */ #define F_SEAL_WRITE 0x0008 /* prevent writes */ + +#endif + +#ifndef F_SEAL_GUEST +#define F_SEAL_GUEST 0x0020 /* guest private memory */ #endif #ifndef MFD_CLOEXEC diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h index 9ff5c16963..ddf742a69b 100644 --- a/include/sysemu/hostmem.h +++ b/include/sysemu/hostmem.h @@ -65,6 +65,7 @@ struct HostMemoryBackend { uint64_t size; bool merge, dump, use_canonical_path; bool prealloc, is_mapped, share, reserve; + bool guest_private; uint32_t prealloc_threads; DECLARE_BITMAP(host_nodes, MAX_NODES + 1); HostMemPolicy policy; diff --git a/softmmu/physmem.c b/softmmu/physmem.c index 23e77cb771..f4d6eeaa17 100644 --- a/softmmu/physmem.c +++ b/softmmu/physmem.c @@ -1591,15 +1591,19 @@ static void *file_ram_alloc(RAMBlock *block, perror("ftruncate"); } - qemu_map_flags = readonly ? QEMU_MAP_READONLY : 0; - qemu_map_flags |= (block->flags & RAM_SHARED) ? QEMU_MAP_SHARED : 0; - qemu_map_flags |= (block->flags & RAM_PMEM) ? QEMU_MAP_SYNC : 0; - qemu_map_flags |= (block->flags & RAM_NORESERVE) ? QEMU_MAP_NORESERVE : 0; - area = qemu_ram_mmap(fd, memory, block->mr->align, qemu_map_flags, offset); - if (area == MAP_FAILED) { - error_setg_errno(errp, errno, - "unable to map backing store for guest RAM"); - return NULL; + if (block->flags & RAM_GUEST_PRIVATE) { + area = (void*)offset; + } else { + qemu_map_flags = readonly ? QEMU_MAP_READONLY : 0; + qemu_map_flags |= (block->flags & RAM_SHARED) ? QEMU_MAP_SHARED : 0; + qemu_map_flags |= (block->flags & RAM_PMEM) ? QEMU_MAP_SYNC : 0; + qemu_map_flags |= (block->flags & RAM_NORESERVE) ? 
QEMU_MAP_NORESERVE : 0; + area = qemu_ram_mmap(fd, memory, block->mr->align, qemu_map_flags, offset); + if (area == MAP_FAILED) { + error_setg_errno(errp, errno, + "unable to map backing store for guest RAM"); + return NULL; + } } block->fd = fd; @@ -1971,7 +1975,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp) qemu_mutex_lock_ramlist(); new_block->offset = find_ram_offset(new_block->max_length); - if (!new_block->host) { + if (!new_block->host && !(new_block->flags & RAM_GUEST_PRIVATE)) { if (xen_enabled()) { xen_ram_alloc(new_block->offset, new_block->max_length, new_block->mr, &err); @@ -2028,7 +2032,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp) new_block->used_length, DIRTY_CLIENTS_ALL); - if (new_block->host) { + if (new_block->host && !(new_block->flags & RAM_GUEST_PRIVATE)) { qemu_ram_setup_dump(new_block->host, new_block->max_length); qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE); /* @@ -2055,7 +2059,8 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, int64_t file_size, file_align; /* Just support these ram flags by now. */ - assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE)) == 0); + assert((ram_flags & ~(RAM_SHARED | RAM_PMEM | RAM_NORESERVE | + RAM_GUEST_PRIVATE)) == 0); if (xen_enabled()) { error_setg(errp, "-mem-path not supported with Xen"); @@ -2092,7 +2097,7 @@ RAMBlock *qemu_ram_alloc_from_fd(ram_addr_t size, MemoryRegion *mr, new_block->flags = ram_flags; new_block->host = file_ram_alloc(new_block, size, fd, readonly, !file_size, offset, errp); - if (!new_block->host) { + if (!new_block->host && !(ram_flags & RAM_GUEST_PRIVATE)) { g_free(new_block); return NULL; } @@ -2392,7 +2397,7 @@ RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset, RAMBLOCK_FOREACH(block) { /* This case append when the block is not mapped. */ - if (block->host == NULL) { + if (block->host == NULL && !(block->flags & RAM_GUEST_PRIVATE)) { continue; } if (host - block->host < block->max_length) { diff --git a/util/memfd.c b/util/memfd.c index 4a3c07e0be..3b4b88d81e 100644 --- a/util/memfd.c +++ b/util/memfd.c @@ -76,14 +76,32 @@ int qemu_memfd_create(const char *name, size_t size, bool hugetlb, goto err; } - if (ftruncate(mfd, size) == -1) { - error_setg_errno(errp, errno, "failed to resize memfd to %zu", size); - goto err; - } - if (seals && fcntl(mfd, F_ADD_SEALS, seals) == -1) { - error_setg_errno(errp, errno, "failed to add seals 0x%x", seals); - goto err; + /* + * The ordering of F_ADD_SEALS and ftruncate() matters here. + * For F_SEAL_GUEST, the file size must still be 0 when the seal is set. + * For F_SEAL_GROW/SHRINK, ftruncate() must be called before the seals are set.
+ */ + if (seals & F_SEAL_GUEST) { + if (seals && fcntl(mfd, F_ADD_SEALS, seals) == -1) { + error_setg_errno(errp, errno, "failed to add seals 0x%x", seals); + goto err; + } + + if (ftruncate(mfd, size) == -1) { + error_setg_errno(errp, errno, "failed to resize memfd to %zu", size); + goto err; + } + } else { + if (ftruncate(mfd, size) == -1) { + error_setg_errno(errp, errno, "failed to resize memfd to %zu", size); + goto err; + } + + if (seals && fcntl(mfd, F_ADD_SEALS, seals) == -1) { + error_setg_errno(errp, errno, "failed to add seals 0x%x", seals); + goto err; + } } return mfd; From patchwork Thu Nov 11 14:13:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12614917 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E399DC433F5 for ; Thu, 11 Nov 2021 14:16:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CCF5361152 for ; Thu, 11 Nov 2021 14:16:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233889AbhKKOTV (ORCPT ); Thu, 11 Nov 2021 09:19:21 -0500 Received: from mga05.intel.com ([192.55.52.43]:50160 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233551AbhKKOTU (ORCPT ); Thu, 11 Nov 2021 09:19:20 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10164"; a="319117507" X-IronPort-AV: E=Sophos;i="5.87,226,1631602800"; d="scan'208";a="319117507" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2021 06:16:30 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,226,1631602800"; d="scan'208";a="492556049" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga007.jf.intel.com with ESMTP; 11 Nov 2021 06:16:18 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [RFC PATCH 09/13] qmp: Include "guest-private" property for memory backends Date: Thu, 11 Nov 2021 22:13:48 +0800 Message-Id: <20211111141352.26311-10-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com> References: <20211111141352.26311-1-chao.p.peng@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Signed-off-by: Chao Peng --- hw/core/machine-hmp-cmds.c | 3 +++ hw/core/machine-qmp-cmds.c | 1 + qapi/machine.json | 3 +++ qapi/qom.json | 3 +++ 4 files changed, 10 insertions(+) diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c index 76b22b00d6..6bd66c25b7 100644 --- a/hw/core/machine-hmp-cmds.c +++ b/hw/core/machine-hmp-cmds.c @@ -112,6 +112,9 @@ void hmp_info_memdev(Monitor *mon, const QDict *qdict) m->value->prealloc ? 
"true" : "false"); monitor_printf(mon, " share: %s\n", m->value->share ? "true" : "false"); + monitor_printf(mon, " guest private: %s\n", + m->value->guest_private ? "true" : "false"); + if (m->value->has_reserve) { monitor_printf(mon, " reserve: %s\n", m->value->reserve ? "true" : "false"); diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c index 216fdfaf3a..2c1c1de73f 100644 --- a/hw/core/machine-qmp-cmds.c +++ b/hw/core/machine-qmp-cmds.c @@ -174,6 +174,7 @@ static int query_memdev(Object *obj, void *opaque) m->dump = object_property_get_bool(obj, "dump", &error_abort); m->prealloc = object_property_get_bool(obj, "prealloc", &error_abort); m->share = object_property_get_bool(obj, "share", &error_abort); + m->guest_private = object_property_get_bool(obj, "guest-private", &error_abort); m->reserve = object_property_get_bool(obj, "reserve", &err); if (err) { error_free_or_abort(&err); diff --git a/qapi/machine.json b/qapi/machine.json index 157712f006..f568a6a0bf 100644 --- a/qapi/machine.json +++ b/qapi/machine.json @@ -798,6 +798,8 @@ # # @share: whether memory is private to QEMU or shared (since 6.1) # +# @guest-private: whether memory is private to guest (since X.X) +# # @reserve: whether swap space (or huge pages) was reserved if applicable. # This corresponds to the user configuration and not the actual # behavior implemented in the OS to perform the reservation. @@ -818,6 +820,7 @@ 'dump': 'bool', 'prealloc': 'bool', 'share': 'bool', + 'guest-private': 'bool', '*reserve': 'bool', 'host-nodes': ['uint16'], 'policy': 'HostMemPolicy' }} diff --git a/qapi/qom.json b/qapi/qom.json index a25616bc7a..93af9b106e 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -550,6 +550,8 @@ # @share: if false, the memory is private to QEMU; if true, it is shared # (default: false) # +# @guest-private: if true, the memory is guest private memory (default: false) +# # @reserve: if true, reserve swap space (or huge pages) if applicable # (default: true) (since 6.1) # @@ -580,6 +582,7 @@ '*prealloc': 'bool', '*prealloc-threads': 'uint32', '*share': 'bool', + '*guest-private': 'bool', '*reserve': 'bool', 'size': 'size', '*x-use-canonical-path-for-ramblock-id': 'bool' } } From patchwork Thu Nov 11 14:13:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12614919 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 735A7C433FE for ; Thu, 11 Nov 2021 14:16:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 580466117A for ; Thu, 11 Nov 2021 14:16:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233940AbhKKOTb (ORCPT ); Thu, 11 Nov 2021 09:19:31 -0500 Received: from mga12.intel.com ([192.55.52.136]:9380 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233893AbhKKOTa (ORCPT ); Thu, 11 Nov 2021 09:19:30 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10164"; a="212952223" X-IronPort-AV: E=Sophos;i="5.87,226,1631602800"; d="scan'208";a="212952223" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2021 06:16:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,226,1631602800"; d="scan'208";a="492556158" 
Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga007.jf.intel.com with ESMTP; 11 Nov 2021 06:16:29 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [RFC PATCH 10/13] softmmu/physmem: Add private memory address space Date: Thu, 11 Nov 2021 22:13:49 +0800 Message-Id: <20211111141352.26311-11-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com> References: <20211111141352.26311-1-chao.p.peng@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Signed-off-by: Chao Peng --- include/exec/address-spaces.h | 2 ++ softmmu/physmem.c | 13 +++++++++++++ 2 files changed, 15 insertions(+) diff --git a/include/exec/address-spaces.h b/include/exec/address-spaces.h index db8bfa9a92..b3f45001c0 100644 --- a/include/exec/address-spaces.h +++ b/include/exec/address-spaces.h @@ -27,6 +27,7 @@ * until a proper bus interface is available. */ MemoryRegion *get_system_memory(void); +MemoryRegion *get_system_private_memory(void); /* Get the root I/O port region. This interface should only be used * temporarily until a proper bus interface is available. @@ -34,6 +35,7 @@ MemoryRegion *get_system_memory(void); MemoryRegion *get_system_io(void); extern AddressSpace address_space_memory; +extern AddressSpace address_space_private_memory; extern AddressSpace address_space_io; #endif diff --git a/softmmu/physmem.c b/softmmu/physmem.c index f4d6eeaa17..a2d339fd88 100644 --- a/softmmu/physmem.c +++ b/softmmu/physmem.c @@ -85,10 +85,13 @@ RAMList ram_list = { .blocks = QLIST_HEAD_INITIALIZER(ram_list.blocks) }; static MemoryRegion *system_memory; +static MemoryRegion *system_private_memory; static MemoryRegion *system_io; AddressSpace address_space_io; AddressSpace address_space_memory; +AddressSpace address_space_private_memory; + static MemoryRegion io_mem_unassigned; @@ -2669,6 +2672,11 @@ static void memory_map_init(void) memory_region_init(system_memory, NULL, "system", UINT64_MAX); address_space_init(&address_space_memory, system_memory, "memory"); + system_private_memory = g_malloc(sizeof(*system_private_memory)); + + memory_region_init(system_private_memory, NULL, "system-private", UINT64_MAX); + address_space_init(&address_space_private_memory, system_private_memory, "private-memory"); + system_io = g_malloc(sizeof(*system_io)); memory_region_init_io(system_io, NULL, &unassigned_io_ops, NULL, "io", 65536); @@ -2680,6 +2688,11 @@ MemoryRegion *get_system_memory(void) return system_memory; } +MemoryRegion *get_system_private_memory(void) +{ + return system_private_memory; +} + MemoryRegion *get_system_io(void) { return system_io; From patchwork Thu Nov 11 14:13:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12614923 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from 
mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 89627C433FE for ; Thu, 11 Nov 2021 14:17:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 72BFE6023E for ; Thu, 11 Nov 2021 14:17:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233950AbhKKOUD (ORCPT ); Thu, 11 Nov 2021 09:20:03 -0500 Received: from mga07.intel.com ([134.134.136.100]:50678 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233793AbhKKOTz (ORCPT ); Thu, 11 Nov 2021 09:19:55 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10164"; a="296353377" X-IronPort-AV: E=Sophos;i="5.87,226,1631602800"; d="scan'208";a="296353377" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2021 06:16:50 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,226,1631602800"; d="scan'208";a="492556213" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga007.jf.intel.com with ESMTP; 11 Nov 2021 06:16:40 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [RFC PATCH 11/13] kvm: register private memory slots Date: Thu, 11 Nov 2021 22:13:50 +0800 Message-Id: <20211111141352.26311-12-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com> References: <20211111141352.26311-1-chao.p.peng@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Signed-off-by: Chao Peng --- accel/kvm/kvm-all.c | 9 +++++++++ include/sysemu/kvm_int.h | 1 + 2 files changed, 10 insertions(+) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 0125c17edb..d336458e9e 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -138,6 +138,7 @@ struct KVMState QTAILQ_HEAD(, KVMMSIRoute) msi_hashtab[KVM_MSI_HASHTAB_SIZE]; #endif KVMMemoryListener memory_listener; + KVMMemoryListener private_memory_listener; QLIST_HEAD(, KVMParkedVcpu) kvm_parked_vcpus; /* For "info mtree -f" to tell if an MR is registered in KVM */ @@ -359,6 +360,7 @@ static int kvm_set_user_memory_region(KVMMemoryListener *kml, KVMSlot *slot, boo mem.guest_phys_addr = slot->start_addr; mem.userspace_addr = (unsigned long)slot->ram; mem.flags = slot->flags; + mem.fd = slot->fd; if (slot->memory_size && !new && (mem.flags ^ slot->old_flags) & KVM_MEM_READONLY) { /* Set the slot size to 0 before setting the slot to the desired @@ -1423,6 +1425,9 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, mem->ram_start_offset = ram_start_offset; mem->ram = ram; mem->flags = kvm_mem_flags(mr); + if (mr->ram_block) { + mem->fd = mr->ram_block->fd; + } kvm_slot_init_dirty_bitmap(mem); err = kvm_set_user_memory_region(kml, mem, true); if (err) { @@ -2580,6 +2585,9 @@ static int kvm_init(MachineState *ms) kvm_memory_listener_register(s, &s->memory_listener, &address_space_memory, 0); + 
kvm_memory_listener_register(s, &s->private_memory_listener, + &address_space_private_memory, 2); + if (kvm_eventfds_allowed) { memory_listener_register(&kvm_io_listener, &address_space_io); @@ -2613,6 +2621,7 @@ err: close(s->fd); } g_free(s->memory_listener.slots); + g_free(s->private_memory_listener.slots); return ret; } diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index c788452cd9..0c11c63263 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -28,6 +28,7 @@ typedef struct KVMSlot int as_id; /* Cache of the offset in ram address space */ ram_addr_t ram_start_offset; + int fd; } KVMSlot; typedef struct KVMMemoryListener { From patchwork Thu Nov 11 14:13:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12614921 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8F035C433FE for ; Thu, 11 Nov 2021 14:17:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 721DB6117A for ; Thu, 11 Nov 2021 14:17:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233943AbhKKOTv (ORCPT ); Thu, 11 Nov 2021 09:19:51 -0500 Received: from mga06.intel.com ([134.134.136.31]:59265 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232823AbhKKOTu (ORCPT ); Thu, 11 Nov 2021 09:19:50 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10164"; a="293740117" X-IronPort-AV: E=Sophos;i="5.87,226,1631602800"; d="scan'208";a="293740117" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2021 06:17:01 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,226,1631602800"; d="scan'208";a="492556270" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga007.jf.intel.com with ESMTP; 11 Nov 2021 06:16:50 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . 
Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [RFC PATCH 12/13] kvm: handle private to shared memory conversion Date: Thu, 11 Nov 2021 22:13:51 +0800 Message-Id: <20211111141352.26311-13-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com> References: <20211111141352.26311-1-chao.p.peng@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Signed-off-by: Chao Peng --- accel/kvm/kvm-all.c | 49 ++++++++++++++++++++++++++++++++++++++ include/sysemu/kvm.h | 1 + target/arm/kvm.c | 5 +++++ target/i386/kvm/kvm.c | 27 +++++++++++++++++++++++ target/mips/kvm.c | 5 +++++ target/ppc/kvm.c | 5 +++++ target/s390x/kvm/kvm.c | 5 +++++ 7 files changed, 97 insertions(+) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index d336458e9e..6feda9c89b 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1445,6 +1445,38 @@ out: kvm_slots_unlock(); } +static int kvm_map_private_memory(hwaddr start, hwaddr size) +{ + return 0; +} + +static int kvm_map_shared_memory(hwaddr start, hwaddr size) +{ + MemoryRegionSection section; + void *addr; + RAMBlock *rb; + ram_addr_t offset; + + /* Punch a hole in private memory. */ + section = memory_region_find(get_system_private_memory(), start, size); + if (section.mr) { + addr = memory_region_get_ram_ptr(section.mr) + + section.offset_within_region; + rb = qemu_ram_block_from_host(addr, false, &offset); + ram_block_discard_range(rb, offset, size); + memory_region_unref(section.mr); + } + + /* Create new shared memory. */ + section = memory_region_find(get_system_memory(), start, size); + if (section.mr) { + memory_region_unref(section.mr); + return -1; /* Already exists.
*/ + } + + return kvm_arch_map_shared_memory(start, size); +} + static void *kvm_dirty_ring_reaper_thread(void *data) { KVMState *s = data; @@ -2957,6 +2989,23 @@ int kvm_cpu_exec(CPUState *cpu) break; } break; + case KVM_EXIT_MEMORY_ERROR: + switch (run->mem.type) { + case KVM_EXIT_MEM_MAP_PRIVATE: + ret = kvm_map_private_memory(run->mem.u.map.gpa, + run->mem.u.map.size); + break; + case KVM_EXIT_MEM_MAP_SHARE: + ret = kvm_map_shared_memory(run->mem.u.map.gpa, + run->mem.u.map.size); + break; + default: + DPRINTF("kvm_arch_handle_exit\n"); + ret = kvm_arch_handle_exit(cpu, run); + break; + } + break; + default: DPRINTF("kvm_arch_handle_exit\n"); ret = kvm_arch_handle_exit(cpu, run); diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index a1ab1ee12d..5f00aa0ee0 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -547,4 +547,5 @@ bool kvm_cpu_check_are_resettable(void); bool kvm_arch_cpu_check_are_resettable(void); +int kvm_arch_map_shared_memory(hwaddr start, hwaddr size); #endif diff --git a/target/arm/kvm.c b/target/arm/kvm.c index 5d55de1a49..97e51b8b88 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -1051,3 +1051,8 @@ bool kvm_arch_cpu_check_are_resettable(void) { return true; } + +int kvm_arch_map_shared_memory(hwaddr start, hwaddr size) +{ + return 0; +} diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 500d2e0e68..b3209402bc 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -4925,3 +4925,30 @@ bool kvm_arch_cpu_check_are_resettable(void) { return !sev_es_enabled(); } + +int kvm_arch_map_shared_memory(hwaddr start, hwaddr size) +{ + MachineState *pcms = current_machine; + X86MachineState *x86ms = X86_MACHINE(pcms); + MemoryRegion *system_memory = get_system_memory(); + MemoryRegion *region; + char name[134]; + hwaddr offset; + + if (start + size < x86ms->below_4g_mem_size) { + sprintf(name, "0x%lx@0x%lx", size, start); + region = g_malloc(sizeof(*region)); + memory_region_init_alias(region, NULL, name, pcms->ram, start, size); + memory_region_add_subregion(system_memory, start, region); + return 0; + } else if (start > 0x100000000ULL){ + sprintf(name, "0x%lx@0x%lx", size, start); + offset = start - 0x100000000ULL + x86ms->below_4g_mem_size; + region = g_malloc(sizeof(*region)); + memory_region_init_alias(region, NULL, name, pcms->ram, offset, size); + memory_region_add_subregion(system_memory, start, region); + return 0; + } + + return -1; +} diff --git a/target/mips/kvm.c b/target/mips/kvm.c index 086debd9f0..4aed54aa9f 100644 --- a/target/mips/kvm.c +++ b/target/mips/kvm.c @@ -1295,3 +1295,8 @@ bool kvm_arch_cpu_check_are_resettable(void) { return true; } + +int kvm_arch_map_shared_memory(hwaddr start, hwaddr size) +{ + return 0; +} diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c index dc93b99189..cc31a7c38d 100644 --- a/target/ppc/kvm.c +++ b/target/ppc/kvm.c @@ -2959,3 +2959,8 @@ bool kvm_arch_cpu_check_are_resettable(void) { return true; } + +int kvm_arch_map_shared_memory(hwaddr start, hwaddr size) +{ + return 0; +} diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c index 5b1fdb55c4..4a9161ba3a 100644 --- a/target/s390x/kvm/kvm.c +++ b/target/s390x/kvm/kvm.c @@ -2562,3 +2562,8 @@ bool kvm_arch_cpu_check_are_resettable(void) { return true; } + +int kvm_arch_map_shared_memory(hwaddr start, hwaddr size) +{ + return 0; +} From patchwork Thu Nov 11 14:13:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chao Peng X-Patchwork-Id: 12614925 
Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28BECC433F5 for ; Thu, 11 Nov 2021 14:17:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 139EC61106 for ; Thu, 11 Nov 2021 14:17:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234036AbhKKOUP (ORCPT ); Thu, 11 Nov 2021 09:20:15 -0500 Received: from mga12.intel.com ([192.55.52.136]:9423 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234019AbhKKOUD (ORCPT ); Thu, 11 Nov 2021 09:20:03 -0500 X-IronPort-AV: E=McAfee;i="6200,9189,10164"; a="212952293" X-IronPort-AV: E=Sophos;i="5.87,226,1631602800"; d="scan'208";a="212952293" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Nov 2021 06:17:12 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.87,226,1631602800"; d="scan'208";a="492556372" Received: from chaop.bj.intel.com ([10.240.192.101]) by orsmga007.jf.intel.com with ESMTP; 11 Nov 2021 06:17:01 -0800 From: Chao Peng To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini , Jonathan Corbet , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H . Peter Anvin" , Hugh Dickins , Jeff Layton , "J . Bruce Fields" , Andrew Morton , Yu Zhang , Chao Peng , "Kirill A . Shutemov" , luto@kernel.org, john.ji@intel.com, susie.li@intel.com, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com Subject: [RFC PATCH 13/13] machine: Add 'private-memory-backend' property Date: Thu, 11 Nov 2021 22:13:52 +0800 Message-Id: <20211111141352.26311-14-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211111141352.26311-1-chao.p.peng@linux.intel.com> References: <20211111141352.26311-1-chao.p.peng@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Signed-off-by: Chao Peng --- hw/core/machine.c | 38 ++++++++++++++++++++++++++++++++++++++ hw/i386/pc.c | 22 ++++++++++++++++------ include/hw/boards.h | 2 ++ softmmu/vl.c | 16 ++++++++++------ 4 files changed, 66 insertions(+), 12 deletions(-) diff --git a/hw/core/machine.c b/hw/core/machine.c index 067f42b528..d092bf400b 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -589,6 +589,22 @@ static void machine_set_memdev(Object *obj, const char *value, Error **errp) ms->ram_memdev_id = g_strdup(value); } +static char *machine_get_private_memdev(Object *obj, Error **errp) +{ + MachineState *ms = MACHINE(obj); + + return g_strdup(ms->private_ram_memdev_id); +} + +static void machine_set_private_memdev(Object *obj, const char *value, + Error **errp) +{ + MachineState *ms = MACHINE(obj); + + g_free(ms->private_ram_memdev_id); + ms->private_ram_memdev_id = g_strdup(value); +} + static void machine_init_notify(Notifier *notifier, void *data) { MachineState *machine = MACHINE(qdev_get_machine()); @@ -962,6 +978,13 @@ static void machine_class_init(ObjectClass *oc, void *data) object_class_property_set_description(oc, "memory-backend", "Set RAM backend" "Valid value is ID of hostmem based backend"); + + object_class_property_add_str(oc, 
"private-memory-backend", + machine_get_private_memdev, + machine_set_private_memdev); + object_class_property_set_description(oc, "private-memory-backend", + "Set guest private RAM backend" + "Valid value is ID of hostmem based backend"); } static void machine_class_base_init(ObjectClass *oc, void *data) @@ -1208,6 +1231,21 @@ void machine_run_board_init(MachineState *machine) machine->ram = machine_consume_memdev(machine, MEMORY_BACKEND(o)); } + if (machine->private_ram_memdev_id) { + Object *o; + HostMemoryBackend *backend; + o = object_resolve_path_type(machine->private_ram_memdev_id, + TYPE_MEMORY_BACKEND, NULL); + backend = MEMORY_BACKEND(o); + if (backend->guest_private) { + machine->private_ram = machine_consume_memdev(machine, backend); + } else { + error_report("memorybaend %s is not guest private memory.", + object_get_canonical_path_component(OBJECT(backend))); + exit(EXIT_FAILURE); + } + } + if (machine->numa_state) { numa_complete_configuration(machine); if (machine->numa_state->num_nodes) { diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 1276bfeee4..e6209428c1 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -865,30 +865,40 @@ void pc_memory_init(PCMachineState *pcms, MachineClass *mc = MACHINE_GET_CLASS(machine); PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); X86MachineState *x86ms = X86_MACHINE(pcms); + MemoryRegion *ram, *root_region; assert(machine->ram_size == x86ms->below_4g_mem_size + x86ms->above_4g_mem_size); linux_boot = (machine->kernel_filename != NULL); + *ram_memory = machine->ram; + + /* Map private memory if set. Shared memory will be mapped per request. */ + if (machine->private_ram) { + ram = machine->private_ram; + root_region = get_system_private_memory(); + } else { + ram = machine->ram; + root_region = system_memory; + } + /* * Split single memory region and use aliases to address portions of it, * done for backwards compatibility with older qemus. 
*/ - *ram_memory = machine->ram; ram_below_4g = g_malloc(sizeof(*ram_below_4g)); - memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", machine->ram, + memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram, 0, x86ms->below_4g_mem_size); - memory_region_add_subregion(system_memory, 0, ram_below_4g); + memory_region_add_subregion(root_region, 0, ram_below_4g); e820_add_entry(0, x86ms->below_4g_mem_size, E820_RAM); if (x86ms->above_4g_mem_size > 0) { ram_above_4g = g_malloc(sizeof(*ram_above_4g)); memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", - machine->ram, + ram, x86ms->below_4g_mem_size, x86ms->above_4g_mem_size); - memory_region_add_subregion(system_memory, 0x100000000ULL, - ram_above_4g); + memory_region_add_subregion(root_region, 0x100000000ULL, ram_above_4g); e820_add_entry(0x100000000ULL, x86ms->above_4g_mem_size, E820_RAM); } diff --git a/include/hw/boards.h b/include/hw/boards.h index 463a5514f9..dd6a3a3e03 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -313,11 +313,13 @@ struct MachineState { bool enable_graphics; ConfidentialGuestSupport *cgs; char *ram_memdev_id; + char *private_ram_memdev_id; /* * convenience alias to ram_memdev_id backend memory region * or to numa container memory region */ MemoryRegion *ram; + MemoryRegion *private_ram; DeviceMemoryState *device_memory; ram_addr_t ram_size; diff --git a/softmmu/vl.c b/softmmu/vl.c index ea05bb39c5..9665ccdb16 100644 --- a/softmmu/vl.c +++ b/softmmu/vl.c @@ -1985,17 +1985,15 @@ static bool have_custom_ram_size(void) return !!qemu_opt_get_size(opts, "size", 0); } -static void qemu_resolve_machine_memdev(void) +static void check_memdev(char *id) { - if (current_machine->ram_memdev_id) { + if (id) { Object *backend; ram_addr_t backend_size; - backend = object_resolve_path_type(current_machine->ram_memdev_id, - TYPE_MEMORY_BACKEND, NULL); + backend = object_resolve_path_type(id, TYPE_MEMORY_BACKEND, NULL); if (!backend) { - error_report("Memory backend '%s' not found", - current_machine->ram_memdev_id); + error_report("Memory backend '%s' not found", id); exit(EXIT_FAILURE); } backend_size = object_property_get_uint(backend, "size", &error_abort); @@ -2011,6 +2009,12 @@ static void qemu_resolve_machine_memdev(void) } ram_size = backend_size; } +} + +static void qemu_resolve_machine_memdev(void) +{ + check_memdev(current_machine->ram_memdev_id); + check_memdev(current_machine->private_ram_memdev_id); if (!xen_enabled()) { /* On 32-bit hosts, QEMU is limited by virtual address space */
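Putting the series together, the new knobs would be combined on the command line roughly as follows. This is a hypothetical invocation assembled from the properties added in patches 8 and 13; the machine type, backend IDs and sizes are illustrative, not taken from the patches:

qemu-system-x86_64 \
    -machine q35,memory-backend=mem0,private-memory-backend=mem1 \
    -object memory-backend-memfd,id=mem0,size=4G \
    -object memory-backend-memfd,id=mem1,size=4G,guest-private=on \
    ...

machine_run_board_init() rejects a private-memory-backend whose backend does not have guest-private=on, and the memfd backend in turn translates guest-private=on into F_SEAL_GUEST and RAM_GUEST_PRIVATE, so the two options have to be used together as shown.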