From patchwork Tue Sep 10 23:43:46 2024
X-Patchwork-Submitter: Ackerley Tng
X-Patchwork-Id: 13799488
Date: Tue, 10 Sep 2024 23:43:46 +0000
X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog
Message-ID: <768488c67540aa18c200d7ee16e75a3a087022d4.1726009989.git.ackerleytng@google.com>
Subject: [RFC PATCH 15/39] KVM: guest_memfd: hugetlb: allocate and truncate from hugetlb
From: Ackerley Tng
To: tabba@google.com, quic_eberman@quicinc.com, roypat@amazon.co.uk,
 jgg@nvidia.com, peterx@redhat.com, david@redhat.com, rientjes@google.com,
 fvdl@google.com, jthoughton@google.com, seanjc@google.com,
 pbonzini@redhat.com, zhiquan1.li@intel.com, fan.du@intel.com,
 jun.miao@intel.com, isaku.yamahata@intel.com, muchun.song@linux.dev,
 mike.kravetz@oracle.com
Cc: erdemaktas@google.com, vannapurve@google.com, ackerleytng@google.com,
 qperret@google.com, jhubbard@nvidia.com, willy@infradead.org,
 shuah@kernel.org, brauner@kernel.org, bfoster@redhat.com,
 kent.overstreet@linux.dev, pvorel@suse.cz, rppt@kernel.org,
 richard.weiyang@gmail.com, anup@brainfault.org, haibo1.xu@intel.com,
 ajones@ventanamicro.com, vkuznets@redhat.com,
 maciej.wieczor-retman@intel.com, pgonda@google.com, oliver.upton@linux.dev,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-fsdevel@kvack.org

If HugeTLB is requested at guest_memfd creation time, HugeTLB pages
will be used to back guest_memfd.

Signed-off-by: Ackerley Tng
---
 virt/kvm/guest_memfd.c | 252 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 239 insertions(+), 13 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 31e1115273e1..2e6f12e2bac8 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -8,6 +8,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include "kvm_mm.h"
 
@@ -29,6 +31,13 @@ static struct kvm_gmem_hugetlb *kvm_gmem_hgmem(struct inode *inode)
 	return inode->i_mapping->i_private_data;
 }
 
+static bool is_kvm_gmem_hugetlb(struct inode *inode)
+{
+	u64 flags = (u64)inode->i_private;
+
+	return flags & KVM_GUEST_MEMFD_HUGETLB;
+}
+
 /**
  * folio_file_pfn - like folio_file_page, but return a pfn.
  * @folio: The folio which contains this index.
@@ -58,6 +67,9 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
 	return 0;
 }
 
+/**
+ * Use the uptodate flag to indicate that the folio is prepared for KVM's usage.
+ */
 static inline void kvm_gmem_mark_prepared(struct folio *folio)
 {
 	folio_mark_uptodate(folio);
@@ -72,13 +84,18 @@ static inline void kvm_gmem_mark_prepared(struct folio *folio)
 static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
 				  gfn_t gfn, struct folio *folio)
 {
-	unsigned long nr_pages, i;
 	pgoff_t index;
 	int r;
 
-	nr_pages = folio_nr_pages(folio);
-	for (i = 0; i < nr_pages; i++)
-		clear_highpage(folio_page(folio, i));
+	if (folio_test_hugetlb(folio)) {
+		folio_zero_user(folio, folio->index << PAGE_SHIFT);
+	} else {
+		unsigned long nr_pages, i;
+
+		nr_pages = folio_nr_pages(folio);
+		for (i = 0; i < nr_pages; i++)
+			clear_highpage(folio_page(folio, i));
+	}
 
 	/*
 	 * Preparing huge folios should always be safe, since it should
@@ -103,6 +120,174 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
 	return r;
 }
 
+static int kvm_gmem_get_mpol_node_nodemask(gfp_t gfp_mask,
+					    struct mempolicy **mpol,
+					    nodemask_t **nodemask)
+{
+	/*
+	 * TODO: mempolicy would probably have to be stored on the inode, use
+	 * task policy for now.
+	 */
+	*mpol = get_task_policy(current);
+
+	/* TODO: ignore interleaving (set ilx to 0) for now. */
+	return policy_node_nodemask(*mpol, gfp_mask, 0, nodemask);
+}
+
+static struct folio *kvm_gmem_hugetlb_alloc_folio(struct hstate *h,
+						  struct hugepage_subpool *spool)
+{
+	bool memcg_charge_was_prepared;
+	struct mem_cgroup *memcg;
+	struct mempolicy *mpol;
+	nodemask_t *nodemask;
+	struct folio *folio;
+	gfp_t gfp_mask;
+	int ret;
+	int nid;
+
+	gfp_mask = htlb_alloc_mask(h);
+
+	memcg = get_mem_cgroup_from_current();
+	ret = mem_cgroup_hugetlb_try_charge(memcg,
+					    gfp_mask | __GFP_RETRY_MAYFAIL,
+					    pages_per_huge_page(h));
+	if (ret == -ENOMEM)
+		goto err;
+
+	memcg_charge_was_prepared = ret != -EOPNOTSUPP;
+
+	/* Pages are only to be taken from guest_memfd subpool and nowhere else. */
+	if (hugepage_subpool_get_pages(spool, 1))
+		goto err_cancel_charge;
+
+	nid = kvm_gmem_get_mpol_node_nodemask(htlb_alloc_mask(h), &mpol,
+					      &nodemask);
+	/*
+	 * charge_cgroup_reservation is false because we didn't make any cgroup
+	 * reservations when creating the guest_memfd subpool.
+	 *
+	 * use_hstate_resv is true because we reserved from global hstate when
+	 * creating the guest_memfd subpool.
+	 */
+	folio = hugetlb_alloc_folio(h, mpol, nid, nodemask, false, true);
+	mpol_cond_put(mpol);
+
+	if (!folio)
+		goto err_put_pages;
+
+	hugetlb_set_folio_subpool(folio, spool);
+
+	if (memcg_charge_was_prepared)
+		mem_cgroup_commit_charge(folio, memcg);
+
+out:
+	mem_cgroup_put(memcg);
+
+	return folio;
+
+err_put_pages:
+	hugepage_subpool_put_pages(spool, 1);
+
+err_cancel_charge:
+	if (memcg_charge_was_prepared)
+		mem_cgroup_cancel_charge(memcg, pages_per_huge_page(h));
+
+err:
+	folio = ERR_PTR(-ENOMEM);
+	goto out;
+}
+
+static int kvm_gmem_hugetlb_filemap_add_folio(struct address_space *mapping,
+					      struct folio *folio, pgoff_t index,
+					      gfp_t gfp)
+{
+	int ret;
+
+	__folio_set_locked(folio);
+	ret = __filemap_add_folio(mapping, folio, index, gfp, NULL);
+	if (unlikely(ret)) {
+		__folio_clear_locked(folio);
+		return ret;
+	}
+
+	/*
+	 * In hugetlb_add_to_page_cache(), there is a call to
+	 * folio_clear_hugetlb_restore_reserve(). This is handled when the pages
+	 * are removed from the page cache in unmap_hugepage_range() ->
+	 * __unmap_hugepage_range() by conditionally calling
+	 * folio_set_hugetlb_restore_reserve(). In kvm_gmem_hugetlb's usage of
+	 * hugetlb, there are no VMAs involved, and pages are never taken from
+	 * the surplus, so when pages are freed, the hstate reserve must be
+	 * restored. Hence, this function makes no call to
+	 * folio_clear_hugetlb_restore_reserve().
+	 */
+
+	/* mark folio dirty so that it will not be removed from cache/inode */
+	folio_mark_dirty(folio);
+
+	return 0;
+}
+
+static struct folio *kvm_gmem_hugetlb_alloc_and_cache_folio(struct inode *inode,
+							     pgoff_t index)
+{
+	struct kvm_gmem_hugetlb *hgmem;
+	struct folio *folio;
+	int ret;
+
+	hgmem = kvm_gmem_hgmem(inode);
+	folio = kvm_gmem_hugetlb_alloc_folio(hgmem->h, hgmem->spool);
+	if (IS_ERR(folio))
+		return folio;
+
+	/* TODO: Fix index here to be aligned to huge page size. */
+	ret = kvm_gmem_hugetlb_filemap_add_folio(
+		inode->i_mapping, folio, index, htlb_alloc_mask(hgmem->h));
+	if (ret) {
+		folio_put(folio);
+		return ERR_PTR(ret);
+	}
+
+	spin_lock(&inode->i_lock);
+	inode->i_blocks += blocks_per_huge_page(hgmem->h);
+	spin_unlock(&inode->i_lock);
+
+	return folio;
+}
+
+static struct folio *kvm_gmem_get_hugetlb_folio(struct inode *inode,
+						pgoff_t index)
+{
+	struct address_space *mapping;
+	struct folio *folio;
+	struct hstate *h;
+	pgoff_t hindex;
+	u32 hash;
+
+	h = kvm_gmem_hgmem(inode)->h;
+	hindex = index >> huge_page_order(h);
+	mapping = inode->i_mapping;
+
+	/* To lock, we calculate the hash using the hindex and not index. */
+	hash = hugetlb_fault_mutex_hash(mapping, hindex);
+	mutex_lock(&hugetlb_fault_mutex_table[hash]);
+
+	/*
+	 * The filemap is indexed with index and not hindex. Taking lock on
+	 * folio to align with kvm_gmem_get_regular_folio()
+	 */
+	folio = filemap_lock_folio(mapping, index);
+	if (!IS_ERR(folio))
+		goto out;
+
+	folio = kvm_gmem_hugetlb_alloc_and_cache_folio(inode, index);
+out:
+	mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+
+	return folio;
+}
+
 /*
  * Returns a locked folio on success.  The caller is responsible for
  * setting the up-to-date flag before the memory is mapped into the guest.
@@ -114,8 +299,10 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
  */
 static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 {
-	/* TODO: Support huge pages. */
-	return filemap_grab_folio(inode->i_mapping, index);
+	if (is_kvm_gmem_hugetlb(inode))
+		return kvm_gmem_get_hugetlb_folio(inode, index);
+	else
+		return filemap_grab_folio(inode->i_mapping, index);
 }
 
 static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
@@ -240,6 +427,35 @@ static void kvm_gmem_hugetlb_truncate_folios_range(struct inode *inode,
 	spin_unlock(&inode->i_lock);
 }
 
+static void kvm_gmem_hugetlb_truncate_range(struct inode *inode, loff_t lstart,
+					    loff_t lend)
+{
+	loff_t full_hpage_start;
+	loff_t full_hpage_end;
+	unsigned long hsize;
+	struct hstate *h;
+
+	h = kvm_gmem_hgmem(inode)->h;
+	hsize = huge_page_size(h);
+
+	full_hpage_start = round_up(lstart, hsize);
+	full_hpage_end = round_down(lend, hsize);
+
+	if (lstart < full_hpage_start) {
+		hugetlb_zero_partial_page(h, inode->i_mapping, lstart,
+					  full_hpage_start);
+	}
+
+	if (full_hpage_end > full_hpage_start) {
+		kvm_gmem_hugetlb_truncate_folios_range(inode, full_hpage_start,
+						       full_hpage_end);
+	}
+
+	if (lend > full_hpage_end) {
+		hugetlb_zero_partial_page(h, inode->i_mapping, full_hpage_end,
+					  lend);
+	}
+}
+
 static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 {
@@ -257,7 +473,12 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	list_for_each_entry(gmem, gmem_list, entry)
 		kvm_gmem_invalidate_begin(gmem, start, end);
 
-	truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);
+	if (is_kvm_gmem_hugetlb(inode)) {
+		kvm_gmem_hugetlb_truncate_range(inode, offset, offset + len);
+	} else {
+		truncate_inode_pages_range(inode->i_mapping, offset,
+					   offset + len - 1);
+	}
 
 	list_for_each_entry(gmem, gmem_list, entry)
 		kvm_gmem_invalidate_end(gmem, start, end);
@@ -279,8 +500,15 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
 
 	filemap_invalidate_lock_shared(mapping);
 
-	start = offset >> PAGE_SHIFT;
-	end = (offset + len) >> PAGE_SHIFT;
+	if (is_kvm_gmem_hugetlb(inode)) {
+		unsigned long hsize = huge_page_size(kvm_gmem_hgmem(inode)->h);
+
+		start = round_down(offset, hsize) >> PAGE_SHIFT;
+		end = round_down(offset + len, hsize) >> PAGE_SHIFT;
+	} else {
+		start = offset >> PAGE_SHIFT;
+		end = (offset + len) >> PAGE_SHIFT;
+	}
 
 	r = 0;
 	for (index = start; index < end; ) {
@@ -408,9 +636,7 @@ static void kvm_gmem_hugetlb_teardown(struct inode *inode)
 
 static void kvm_gmem_evict_inode(struct inode *inode)
 {
-	u64 flags = (u64)inode->i_private;
-
-	if (flags & KVM_GUEST_MEMFD_HUGETLB)
+	if (is_kvm_gmem_hugetlb(inode))
 		kvm_gmem_hugetlb_teardown(inode);
 	else
 		truncate_inode_pages_final(inode->i_mapping);
@@ -827,7 +1053,7 @@ __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
 	*pfn = folio_file_pfn(folio, index);
 
 	if (max_order)
-		*max_order = 0;
+		*max_order = folio_order(folio);
 
 	*is_prepared = folio_test_uptodate(folio);
 	return folio;
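
[Editor's note, not part of the patch] For context, a minimal userspace sketch of how a hugetlb-backed guest_memfd would be created once this series is applied. KVM_CREATE_GUEST_MEMFD and struct kvm_create_guest_memfd are existing KVM UAPI; the KVM_GUEST_MEMFD_HUGETLB flag is introduced by earlier patches in this RFC series, so its availability in <linux/kvm.h> is an assumption here.

```c
/*
 * Illustrative sketch only. Assumes a kernel with this RFC series applied,
 * so that KVM_GUEST_MEMFD_HUGETLB is defined in <linux/kvm.h>.
 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int create_hugetlb_gmem(int vm_fd, uint64_t size)
{
	struct kvm_create_guest_memfd gmem = {
		.size = size,			/* should be hugepage-aligned */
		.flags = KVM_GUEST_MEMFD_HUGETLB, /* request hugetlb backing */
	};

	/* Returns a new guest_memfd file descriptor, or -1 on error. */
	return ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
}
```

Punching a hole in such a descriptor with fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE) then goes through kvm_gmem_punch_hole(), which with this patch zeroes any partial hugepages at the edges and truncates whole hugepages via kvm_gmem_hugetlb_truncate_range().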