From patchwork Fri Nov 8 15:50:54 2024
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13868408
From: Paolo Bonzini <pbonzini@redhat.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: michael.roth@amd.com, seanjc@google.com
Subject: [PATCH 1/3] KVM: gmem: allocate private data for the gmem inode
Date: Fri, 8 Nov 2024 10:50:54 -0500
Message-ID: <20241108155056.332412-2-pbonzini@redhat.com>
In-Reply-To: <20241108155056.332412-1-pbonzini@redhat.com>
References: <20241108155056.332412-1-pbonzini@redhat.com>

In preparation for removing the use of the uptodate flag, reintroduce
the gmem filesystem type. We need it in order to free the private
inode information.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/uapi/linux/magic.h |   1 +
 virt/kvm/guest_memfd.c     | 117 +++++++++++++++++++++++++++++++++----
 virt/kvm/kvm_main.c        |   7 ++-
 virt/kvm/kvm_mm.h          |   8 ++-
 4 files changed, 119 insertions(+), 14 deletions(-)
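
A note for reviewers less familiar with the VFS side: the pattern
reintroduced below (register_filesystem() plus kern_mount(), with only
->evict_inode overridden so that i_private can be freed) is the usual
lifecycle for an internal pseudo-filesystem. The following is a
minimal, self-contained sketch of that pattern; the examplefs names
and magic number are hypothetical and not part of this patch:

#include <linux/fs.h>
#include <linux/fs_context.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/pseudo_fs.h>
#include <linux/slab.h>

#define EXAMPLEFS_MAGIC	0x4558414d	/* hypothetical, "EXAM" */

static struct vfsmount *examplefs_mnt;

/*
 * Runs when the last reference to an inode is dropped; this hook is
 * the whole reason to have a dedicated filesystem type, since it is
 * the one safe place to free the per-inode private data.
 */
static void examplefs_evict_inode(struct inode *inode)
{
	kvfree(inode->i_private);
	truncate_inode_pages_final(&inode->i_data);
	clear_inode(inode);
}

static const struct super_operations examplefs_sops = {
	.drop_inode	= generic_delete_inode,
	.evict_inode	= examplefs_evict_inode,
	.statfs		= simple_statfs,
};

static int examplefs_init_fs_context(struct fs_context *fc)
{
	struct pseudo_fs_context *ctx = init_pseudo(fc, EXAMPLEFS_MAGIC);

	if (!ctx)
		return -ENOMEM;
	ctx->ops = &examplefs_sops;
	return 0;
}

static struct file_system_type examplefs_type = {
	.name		 = "examplefs",
	.init_fs_context = examplefs_init_fs_context,
	.kill_sb	 = kill_anon_super,
};

static int __init examplefs_init(void)
{
	int ret = register_filesystem(&examplefs_type);

	if (ret)
		return ret;

	examplefs_mnt = kern_mount(&examplefs_type);
	if (IS_ERR(examplefs_mnt)) {
		unregister_filesystem(&examplefs_type);
		return PTR_ERR(examplefs_mnt);
	}
	return 0;
}

static void __exit examplefs_exit(void)
{
	kern_unmount(examplefs_mnt);
	unregister_filesystem(&examplefs_type);
}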
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index bb575f3ab45e..d856dd6a7ed9 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -103,5 +103,6 @@
 #define DEVMEM_MAGIC		0x454d444d	/* "DMEM" */
 #define SECRETMEM_MAGIC		0x5345434d	/* "SECM" */
 #define PID_FS_MAGIC		0x50494446	/* "PIDF" */
+#define KVM_GUEST_MEM_MAGIC	0x474d454d	/* "GMEM" */
 
 #endif /* __LINUX_MAGIC_H__ */
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 8f079a61a56d..3ea5a7597fd4 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -4,9 +4,74 @@
 #include <linux/kvm_host.h>
 #include <linux/pagemap.h>
 #include <linux/anon_inodes.h>
+#include <linux/pseudo_fs.h>
 
 #include "kvm_mm.h"
 
+/* Do all the filesystem crap just for evict_inode... */
+
+static struct vfsmount *kvm_gmem_mnt __read_mostly;
+
+static void gmem_evict_inode(struct inode *inode)
+{
+	kvfree(inode->i_private);
+	truncate_inode_pages_final(&inode->i_data);
+	clear_inode(inode);
+}
+
+static const struct super_operations gmem_super_operations = {
+	.drop_inode	= generic_delete_inode,
+	.evict_inode	= gmem_evict_inode,
+	.statfs		= simple_statfs,
+};
+
+static int gmem_init_fs_context(struct fs_context *fc)
+{
+	struct pseudo_fs_context *ctx = init_pseudo(fc, KVM_GUEST_MEM_MAGIC);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->ops = &gmem_super_operations;
+	return 0;
+}
+
+static struct file_system_type kvm_gmem_fs_type = {
+	.name		= "kvm_gmemfs",
+	.init_fs_context = gmem_init_fs_context,
+	.kill_sb	= kill_anon_super,
+};
+
+static struct file *kvm_gmem_create_file(const char *name, const struct file_operations *fops)
+{
+	struct inode *inode;
+	struct file *file;
+
+	if (fops->owner && !try_module_get(fops->owner))
+		return ERR_PTR(-ENOENT);
+
+	inode = alloc_anon_inode(kvm_gmem_mnt->mnt_sb);
+	if (IS_ERR(inode)) {
+		file = ERR_CAST(inode);
+		goto err;
+	}
+
+	file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, fops);
+	if (IS_ERR(file))
+		goto err_iput;
+
+	return file;
+
+err_iput:
+	iput(inode);
+err:
+	module_put(fops->owner);
+	return file;
+}
+
+
+struct kvm_gmem_inode {
+	unsigned long flags;
+};
+
 struct kvm_gmem {
 	struct kvm *kvm;
 	struct xarray bindings;
@@ -308,9 +373,31 @@ static struct file_operations kvm_gmem_fops = {
 	.fallocate = kvm_gmem_fallocate,
 };
 
-void kvm_gmem_init(struct module *module)
+int kvm_gmem_init(struct module *module)
 {
+	int ret;
+
+	ret = register_filesystem(&kvm_gmem_fs_type);
+	if (ret) {
+		pr_err("kvm-gmem: cannot register file system (%d)\n", ret);
+		return ret;
+	}
+
+	kvm_gmem_mnt = kern_mount(&kvm_gmem_fs_type);
+	if (IS_ERR(kvm_gmem_mnt)) {
+		pr_err("kvm-gmem: kernel mount failed (%ld)\n",
+		       PTR_ERR(kvm_gmem_mnt));
+		return PTR_ERR(kvm_gmem_mnt);
+	}
+
 	kvm_gmem_fops.owner = module;
+
+	return 0;
+}
+
+void kvm_gmem_exit(void)
+{
+	kern_unmount(kvm_gmem_mnt);
+	unregister_filesystem(&kvm_gmem_fs_type);
 }
 
 static int kvm_gmem_migrate_folio(struct address_space *mapping,
@@ -394,15 +481,23 @@ static const struct inode_operations kvm_gmem_iops = {
 
 static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 {
-	const char *anon_name = "[kvm-gmem]";
+	const char *gmem_name = "[kvm-gmem]";
+	struct kvm_gmem_inode *i_gmem;
 	struct kvm_gmem *gmem;
 	struct inode *inode;
 	struct file *file;
 	int fd, err;
 
+	i_gmem = kvzalloc(sizeof(struct kvm_gmem_inode), GFP_KERNEL);
+	if (!i_gmem)
+		return -ENOMEM;
+	i_gmem->flags = flags;
+
 	fd = get_unused_fd_flags(0);
-	if (fd < 0)
-		return fd;
+	if (fd < 0) {
+		err = fd;
+		goto err_i_gmem;
+	}
 
 	gmem = kzalloc(sizeof(*gmem), GFP_KERNEL);
 	if (!gmem) {
@@ -410,19 +505,19 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 		goto err_fd;
 	}
 
-	file = anon_inode_create_getfile(anon_name, &kvm_gmem_fops, gmem,
-					 O_RDWR, NULL);
+	file = kvm_gmem_create_file(gmem_name, &kvm_gmem_fops);
 	if (IS_ERR(file)) {
 		err = PTR_ERR(file);
 		goto err_gmem;
 	}
 
+	inode = file->f_inode;
+
+	file->f_mapping = inode->i_mapping;
+	file->private_data = gmem;
 	file->f_flags |= O_LARGEFILE;
 
-	inode = file->f_inode;
-	WARN_ON(file->f_mapping != inode->i_mapping);
-
-	inode->i_private = (void *)(unsigned long)flags;
+	inode->i_private = i_gmem;
 	inode->i_op = &kvm_gmem_iops;
 	inode->i_mapping->a_ops = &kvm_gmem_aops;
 	inode->i_mode |= S_IFREG;
@@ -444,6 +539,8 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	kfree(gmem);
 err_fd:
 	put_unused_fd(fd);
+err_i_gmem:
+	kvfree(i_gmem);
 	return err;
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 279e03029ce1..8b7b4e0eb639 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6504,7 +6504,9 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	if (WARN_ON_ONCE(r))
 		goto err_vfio;
 
-	kvm_gmem_init(module);
+	r = kvm_gmem_init(module);
+	if (r)
+		goto err_gmem;
 
 	r = kvm_init_virtualization();
 	if (r)
@@ -6525,6 +6527,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 err_register:
 	kvm_uninit_virtualization();
 err_virt:
+	kvm_gmem_exit();
+err_gmem:
 	kvm_vfio_ops_exit();
 err_vfio:
 	kvm_async_pf_deinit();
@@ -6556,6 +6560,7 @@ void kvm_exit(void)
 	for_each_possible_cpu(cpu)
 		free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
 	kmem_cache_destroy(kvm_vcpu_cache);
+	kvm_gmem_exit();
 	kvm_vfio_ops_exit();
 	kvm_async_pf_deinit();
 	kvm_irqfd_exit();
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 715f19669d01..91e4202574a8 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -36,15 +36,17 @@ static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
 #endif /* HAVE_KVM_PFNCACHE */
 
 #ifdef CONFIG_KVM_PRIVATE_MEM
-void kvm_gmem_init(struct module *module);
+int kvm_gmem_init(struct module *module);
+void kvm_gmem_exit(void);
 int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args);
 int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 		  unsigned int fd, loff_t offset);
 void kvm_gmem_unbind(struct kvm_memory_slot *slot);
 #else
-static inline void kvm_gmem_init(struct module *module)
+static inline void kvm_gmem_exit(void) {}
+static inline int kvm_gmem_init(struct module *module)
 {
-
+	return 0;
 }
 
 static inline int kvm_gmem_bind(struct kvm *kvm,
text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 13868410 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 658F221B423 for ; Fri, 8 Nov 2024 15:51:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731081072; cv=none; b=mIlzZlEhJERdo8vjp091ermjkSdSq/ItmughpeRd/dcPBbthcRT2HKtsJAmYxj/XjsalrbUJOyYcYpiEDJz1URzACpndqDckAV2ma1RyfRnsiIyfYxToN+cvBATP8jLy17klkWggsWkGqmYWSgvuy1M6BIsJaGdZtw6WJXnMZbU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731081072; c=relaxed/simple; bh=/AZjLPqPEaywOa4BXgZaIJNjugO2+x5GxR4rgGGfGhk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=a1UBePeKquCD5nt5wmCybNTRyGnYOd9TkQY5k7WPlF8r5mwVgmCNktXNbjIjetwQpvb1h6KY7IpBWdRzFdBQeEKFy9wD0NK2D6rVzJnMJJFFchWVu0h4TdBkkwpTRKr3qOVRIWueconCrt8YcvtdxW3dN5xw2mAxXhyVgD18Jig= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=TqUnSSqy; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="TqUnSSqy" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1731081069; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=clr/KAgDBcdG6y4j8m/K4hUntnUnW17taVZTjfs/x9U=; b=TqUnSSqyWopQjEycglCurmMTRE71clsMYA7v4hBhVbi707Q9XXLoaW0XzdMVcAELJJmd4x 4ZP8zhxG3rzQIoCdAXFh6UBdg35RoiU/cO8FC3DpHqnBQYn/V9R3wKM/zz85aCRiMPbAen cNaOXyHrs5hDzCLUnIdiyR81OZAFgmE= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-477-GS94pOM3PfaBu3GK_iLKcw-1; Fri, 08 Nov 2024 10:51:06 -0500 X-MC-Unique: GS94pOM3PfaBu3GK_iLKcw-1 X-Mimecast-MFC-AGG-ID: GS94pOM3PfaBu3GK_iLKcw Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 135911956083; Fri, 8 Nov 2024 15:51:05 +0000 (UTC) Received: from virtlab1023.lab.eng.rdu2.redhat.com (virtlab1023.lab.eng.rdu2.redhat.com [10.8.1.187]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 58D27300019E; Fri, 8 Nov 2024 15:51:04 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: 
Subject: [PATCH 2/3] KVM: gmem: add a complete set of functions to query page preparedness
Date: Fri, 8 Nov 2024 10:50:55 -0500
Message-ID: <20241108155056.332412-3-pbonzini@redhat.com>
In-Reply-To: <20241108155056.332412-1-pbonzini@redhat.com>
References: <20241108155056.332412-1-pbonzini@redhat.com>

In preparation for moving preparedness out of the folio flags, pass
the struct file* or struct inode* down to kvm_gmem_mark_prepared, as
well as the offset within the gmem file. Introduce new functions to
unprepare a page on punch-hole, and to query the state.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 virt/kvm/guest_memfd.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)
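
One detail worth keeping in mind while reading the kvm_gmem_populate()
hunk below: the page index passed to kvm_gmem_mark_prepared() is
derived from the gfn through the memslot binding. A sketch of that
arithmetic (the helper name is mine; the fields are the ones used in
the diff):

/*
 * A memslot binds gfns [base_gfn, base_gfn + npages) to the gmem file
 * starting at page offset slot->gmem.pgoff, so a gfn's page index in
 * the file is its offset within the slot plus the binding's start.
 */
static pgoff_t example_gmem_index(const struct kvm_memory_slot *slot, gfn_t gfn)
{
	return gfn - slot->base_gfn + slot->gmem.pgoff;
}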
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 3ea5a7597fd4..416e02a00cae 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -107,18 +107,28 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
 	return 0;
 }
 
-static inline void kvm_gmem_mark_prepared(struct folio *folio)
+static void kvm_gmem_mark_prepared(struct file *file, pgoff_t index, struct folio *folio)
 {
 	folio_mark_uptodate(folio);
 }
 
+static void kvm_gmem_mark_range_unprepared(struct inode *inode, pgoff_t index, pgoff_t npages)
+{
+}
+
+static bool kvm_gmem_is_prepared(struct file *file, pgoff_t index, struct folio *folio)
+{
+	return folio_test_uptodate(folio);
+}
+
 /*
  * Process @folio, which contains @gfn, so that the guest can use it.
  * The folio must be locked and the gfn must be contained in @slot.
  * On successful return the guest sees a zero page so as to avoid
  * leaking host data and the up-to-date flag is set.
  */
-static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
+static int kvm_gmem_prepare_folio(struct kvm *kvm, struct file *file,
+				  struct kvm_memory_slot *slot,
 				  gfn_t gfn, struct folio *folio)
 {
 	unsigned long nr_pages, i;
@@ -147,7 +157,7 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
 	index = ALIGN_DOWN(index, 1 << folio_order(folio));
 	r = __kvm_gmem_prepare_folio(kvm, slot, index, folio);
 	if (!r)
-		kvm_gmem_mark_prepared(folio);
+		kvm_gmem_mark_prepared(file, index, folio);
 
 	return r;
 }
@@ -231,6 +241,7 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	kvm_gmem_invalidate_begin(gmem, start, end);
 
 	truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);
+	kvm_gmem_mark_range_unprepared(inode, start, end - start);
 
 	list_for_each_entry(gmem, gmem_list, entry)
 		kvm_gmem_invalidate_end(gmem, start, end);
@@ -682,7 +693,7 @@ __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
 	if (max_order)
 		*max_order = 0;
 
-	*is_prepared = folio_test_uptodate(folio);
+	*is_prepared = kvm_gmem_is_prepared(file, index, folio);
 	return folio;
 }
 
@@ -704,7 +715,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	}
 
 	if (!is_prepared)
-		r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
+		r = kvm_gmem_prepare_folio(kvm, file, slot, gfn, folio);
 
 	folio_unlock(folio);
 	if (r < 0)
@@ -781,8 +792,10 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
 		p = src ? src + i * PAGE_SIZE : NULL;
 		ret = post_populate(kvm, gfn, pfn, p, max_order, opaque);
-		if (!ret)
-			kvm_gmem_mark_prepared(folio);
+		if (!ret) {
+			pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
+			kvm_gmem_mark_prepared(file, index, folio);
+		}
 
 put_folio_and_exit:
 		folio_put(folio);

From patchwork Fri Nov 8 15:50:56 2024
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13868411
From: Paolo Bonzini <pbonzini@redhat.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: michael.roth@amd.com, seanjc@google.com
Subject: [PATCH 3/3] KVM: gmem: track preparedness a page at a time
Date: Fri, 8 Nov 2024 10:50:56 -0500
Message-ID: <20241108155056.332412-4-pbonzini@redhat.com>
In-Reply-To: <20241108155056.332412-1-pbonzini@redhat.com>
References: <20241108155056.332412-1-pbonzini@redhat.com>

With support for large pages in gmem, it may happen that part of the
gmem is mapped with large pages and part with 4k pages. For example,
if a conversion happens on a small region within a large page, the
large page has to be smashed into small pages even if backed by a
large folio. Each of the small pages will have its own state of
preparedness, which makes it harder to use the uptodate flag for
preparedness.

Just switch to a bitmap in the inode's i_private data. This is a bit
gnarly because ordinary bitmap operations in Linux are not atomic,
but otherwise not too hard.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 virt/kvm/guest_memfd.c | 103 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 100 insertions(+), 3 deletions(-)
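
The single-word mask arithmetic in the new bitmap helpers is easiest
to check with concrete numbers. Below is a stand-alone user-space
sketch (not part of the patch; the two mask macros mirror the kernel's
BITMAP_FIRST_WORD_MASK/BITMAP_LAST_WORD_MASK, and 64-bit longs are
assumed):

#include <stdatomic.h>
#include <stdio.h>

#define BITS_PER_LONG 64
#define FIRST_WORD_MASK(start)	(~0UL << ((start) & (BITS_PER_LONG - 1)))
#define LAST_WORD_MASK(nbits)	(~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))

/*
 * Atomically set bits [start, start + len) inside a single word, the
 * same way bitmap_set_atomic_word() does in the diff below.
 */
static void set_range_atomic(_Atomic unsigned long *word,
			     unsigned long start, unsigned long len)
{
	unsigned long mask = FIRST_WORD_MASK(start) & LAST_WORD_MASK(start + len);

	atomic_fetch_or(word, mask);
}

int main(void)
{
	_Atomic unsigned long w = 0;

	/*
	 * FIRST_WORD_MASK(4) keeps bits 4..63, LAST_WORD_MASK(7) keeps
	 * bits 0..6; their intersection is bits 4..6, i.e. 0x70.
	 */
	set_range_atomic(&w, 4, 3);
	printf("0x%lx\n", (unsigned long)w);	/* prints 0x70 */
	return 0;
}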
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 416e02a00cae..e08503dfdd8a 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -68,8 +68,13 @@ static struct file *kvm_gmem_create_file(const char *name, const struct file_ope
 }
 
 
+#define KVM_GMEM_INODE_SIZE(size)				\
+	struct_size_t(struct kvm_gmem_inode, prepared,		\
+		      DIV_ROUND_UP(size, PAGE_SIZE * BITS_PER_LONG))
+
 struct kvm_gmem_inode {
 	unsigned long flags;
+	unsigned long prepared[];
 };
 
 struct kvm_gmem {
@@ -107,18 +112,110 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
 	return 0;
 }
 
+/*
+ * The bitmap of prepared pages has to be accessed atomically, because
+ * preparation is not protected by any lock. This unfortunately means
+ * that we cannot use regular bitmap operations.
+ *
+ * The logic becomes a bit simpler for set and test, which operate a
+ * folio at a time and therefore can assume that the range is naturally
+ * aligned (meaning that either it is smaller than a word, or it does
+ * not include fractions of a word). For punch-hole operations however
+ * there is all the complexity.
+ */
+
+static void bitmap_set_atomic_word(unsigned long *p, unsigned long start, unsigned long len)
+{
+	unsigned long mask_to_set =
+		BITMAP_FIRST_WORD_MASK(start) & BITMAP_LAST_WORD_MASK(start + len);
+
+	atomic_long_or(mask_to_set, (atomic_long_t *)p);
+}
+
+static void bitmap_clear_atomic_word(unsigned long *p, unsigned long start, unsigned long len)
+{
+	unsigned long mask_to_set =
+		BITMAP_FIRST_WORD_MASK(start) & BITMAP_LAST_WORD_MASK(start + len);
+
+	atomic_long_andnot(mask_to_set, (atomic_long_t *)p);
+}
+
+static bool bitmap_test_allset_word(unsigned long *p, unsigned long start, unsigned long len)
+{
+	unsigned long mask_to_set =
+		BITMAP_FIRST_WORD_MASK(start) & BITMAP_LAST_WORD_MASK(start + len);
+
+	return (*p & mask_to_set) == mask_to_set;
+}
+
 static void kvm_gmem_mark_prepared(struct file *file, pgoff_t index, struct folio *folio)
 {
-	folio_mark_uptodate(folio);
+	struct kvm_gmem_inode *i_gmem = (struct kvm_gmem_inode *)file->f_inode->i_private;
+	unsigned long *p = i_gmem->prepared + BIT_WORD(index);
+	unsigned long npages = folio_nr_pages(folio);
+
+	/* Folios must be naturally aligned */
+	WARN_ON_ONCE(index & (npages - 1));
+	index &= ~(npages - 1);
+
+	/* Clear page before updating bitmap. */
+	smp_wmb();
+
+	if (npages < BITS_PER_LONG) {
+		bitmap_set_atomic_word(p, index, npages);
+	} else {
+		BUILD_BUG_ON(BITS_PER_LONG != 64);
+		memset64((u64 *)p, ~0, BITS_TO_LONGS(npages));
+	}
 }
 
 static void kvm_gmem_mark_range_unprepared(struct inode *inode, pgoff_t index, pgoff_t npages)
 {
+	struct kvm_gmem_inode *i_gmem = (struct kvm_gmem_inode *)inode->i_private;
+	unsigned long *p = i_gmem->prepared + BIT_WORD(index);
+
+	index &= BITS_PER_LONG - 1;
+	if (index) {
+		int first_word_count = min(npages, BITS_PER_LONG - index);
+		bitmap_clear_atomic_word(p, index, first_word_count);
+		npages -= first_word_count;
+		p++;
+	}
+
+	if (npages > BITS_PER_LONG) {
+		BUILD_BUG_ON(BITS_PER_LONG != 64);
+		memset64((u64 *)p, 0, BITS_TO_LONGS(npages));
+		p += BIT_WORD(npages);
+		npages &= BITS_PER_LONG - 1;
+	}
+
+	if (npages)
+		bitmap_clear_atomic_word(p++, 0, npages);
 }
 
 static bool kvm_gmem_is_prepared(struct file *file, pgoff_t index, struct folio *folio)
 {
-	return folio_test_uptodate(folio);
+	struct kvm_gmem_inode *i_gmem = (struct kvm_gmem_inode *)file->f_inode->i_private;
+	unsigned long *p = i_gmem->prepared + BIT_WORD(index);
+	unsigned long npages = folio_nr_pages(folio);
+	bool ret;
+
+	/* Folios must be naturally aligned */
+	WARN_ON_ONCE(index & (npages - 1));
+	index &= ~(npages - 1);
+
+	if (npages < BITS_PER_LONG) {
+		ret = bitmap_test_allset_word(p, index, npages);
+	} else {
+		for (; npages > 0; npages -= BITS_PER_LONG)
+			if (*p++ != ~0)
+				break;
+		ret = (npages == 0);
+	}
+
+	/* Synchronize with kvm_gmem_mark_prepared(). */
+	smp_rmb();
+	return ret;
 }
 
 /*
@@ -499,7 +596,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	struct file *file;
 	int fd, err;
 
-	i_gmem = kvzalloc(sizeof(struct kvm_gmem_inode), GFP_KERNEL);
+	i_gmem = kvzalloc(KVM_GMEM_INODE_SIZE(size), GFP_KERNEL);
 	if (!i_gmem)
 		return -ENOMEM;
 	i_gmem->flags = flags;
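
To convince yourself that the three-step walk in
kvm_gmem_mark_range_unprepared() covers a range correctly, it helps to
trace it on concrete inputs (the numbers below are mine, chosen so
each step fires cleanly):

/*
 * Case 1: index = 60, npages = 68 (bits 60..127).
 *   - p = i_gmem->prepared + BIT_WORD(60) = word 0; index &= 63 -> 60
 *   - head: first_word_count = min(68, 64 - 60) = 4, so bits 60..63
 *     of word 0 are cleared atomically; npages = 64, p -> word 1
 *   - npages (64) is not > BITS_PER_LONG, so the memset64() is skipped
 *   - tail: bitmap_clear_atomic_word(p, 0, 64); LAST_WORD_MASK(64) is
 *     ~0UL, so the whole of word 1 is cleared
 *
 * Case 2: index = 64, npages = 256 (word-aligned, bits 64..319).
 *   - index & 63 == 0, so the head step is skipped
 *   - 256 > 64: memset64() zeroes BITS_TO_LONGS(256) = 4 whole words,
 *     p advances by BIT_WORD(256) = 4, npages &= 63 leaves 0
 *   - nothing remains for the tail step
 */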

From patchwork Fri Nov 8 16:32:28 2024
X-Patchwork-Submitter: Paolo Bonzini
X-Patchwork-Id: 13868484
From: Paolo Bonzini <pbonzini@redhat.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: michael.roth@amd.com, seanjc@google.com
Subject: [PATCH 2.5/3] KVM: gmem: limit hole-punching to ranges within the file
Date: Fri, 8 Nov 2024 11:32:28 -0500
Message-ID: <20241108163228.374110-1-pbonzini@redhat.com>
In-Reply-To: <20241108155056.332412-1-pbonzini@redhat.com>

Do not pass out-of-bounds values to kvm_gmem_mark_range_unprepared().

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 virt/kvm/guest_memfd.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Sent separately because I thought this was a bug also in the current
code but, on closer look, it is fine because ksys_fallocate checks
that there is no overflow.

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 412d49c6d491..7dc89ceef782 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -324,10 +324,17 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
 static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 {
 	struct list_head *gmem_list = &inode->i_mapping->i_private_list;
-	pgoff_t start = offset >> PAGE_SHIFT;
-	pgoff_t end = (offset + len) >> PAGE_SHIFT;
+	loff_t size = i_size_read(inode);
+	pgoff_t start, end;
 	struct kvm_gmem *gmem;
 
+	if (offset > size)
+		return 0;
+
+	len = min(size - offset, len);
+	start = offset >> PAGE_SHIFT;
+	end = (offset + len) >> PAGE_SHIFT;
+
 	/*
 	 * Bindings must be stable across invalidation to ensure the start+end
 	 * are balanced.
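
The clamping above is easy to model in isolation. A small user-space
sketch (the helper name and the example numbers are mine, not part of
the patch) of the normalization the new code performs before computing
start and end:

#include <stdbool.h>
#include <stdio.h>

/*
 * Mirror of the clamp in kvm_gmem_punch_hole(): a punch entirely past
 * EOF is a no-op, and a punch crossing EOF is trimmed so that
 * offset + len never exceeds the file size.
 */
static bool clamp_hole(long long size, long long offset, long long *len)
{
	if (offset > size)
		return false;
	if (*len > size - offset)
		*len = size - offset;
	return true;
}

int main(void)
{
	long long len = 512LL << 10;	/* punch 512 KiB at 768 KiB into a 1 MiB file */

	if (clamp_hole(1LL << 20, 768LL << 10, &len))
		printf("clamped len: %lld\n", len);	/* prints 262144 (256 KiB) */
	return 0;
}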