From patchwork Tue Jun 6 19:03:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269567 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 77968C7EE32 for ; Tue, 6 Jun 2023 19:04:58 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6bye-0001AN-GS; Tue, 06 Jun 2023 15:04:20 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3LoN_ZAsKCl057F9MG9TOIBBJJBG9.7JHL9HP-89Q9GIJIBIP.JMB@flex--ackerleytng.bounces.google.com>) id 1q6byd-00012v-AN for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:19 -0400 Received: from mail-yb1-xb4a.google.com ([2607:f8b0:4864:20::b4a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3LoN_ZAsKCl057F9MG9TOIBBJJBG9.7JHL9HP-89Q9GIJIBIP.JMB@flex--ackerleytng.bounces.google.com>) id 1q6byb-0001wP-NU for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:19 -0400 Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-ba83a9779f3so9814229276.1 for ; Tue, 06 Jun 2023 12:04:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078255; x=1688670255; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=W42CnlEhHXSQD8oU0aesgTIb+2nDDRqlSHCZLxoyAZw=; b=SI6wxzD3jNLZvjIR3ui3oLLkrgjrOnUdyqmkDtNB4LvaXzaOxSopYwQDXGSPUwGt0b d1EHBVyiG4hUUHi15P58bSdZq1hiIfxHNvpvia7Cxzf6d0mYTmKP/FDWWkzVCNVJSia8 WKQpt+ZebghqzxY/2xXY0xK+V+7VuTACGmy8UzpKH/NOajrF02Jvbazs2Qt1cP/5GZi3 Pxw8kkdCQkLaRblrUwFR5NCQzIvEfWgJISfEM1oMrdysWZB/g9FTr60M+vCWXu+AmqCG W3rBiG3Eit8E92Xwv+sO+1jySrdLIuEFp3Em5lURfr02pKqbadaZwr4BvewtCxcGcLfE Bs7Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078255; x=1688670255; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=W42CnlEhHXSQD8oU0aesgTIb+2nDDRqlSHCZLxoyAZw=; b=ZmmJN60niolT3YuC/vkgoqd00FQkVVTt8JWEVNP5+H81GyVQAb+Uj57IttzjL7o0Nb 2ReXxdg67F2VwMMTiRAZkZ2DuDEoD2dFontQFJXWpw05YAf6gZqoB+lDxkw5Qwyv2Cy6 Z94r0Ne37hMsv1VLjEpCbk9lfCqllm6R4XyOi3U1c8aWCwxswTsUu9vTHyTO28pZ4nBY MTWnHThM7UGqYogQLBXarMCJ9duUlXxFs1j9h+kWECqbYXYCAvOCijAdHiSVhU0Xm0/7 6oEBK82B7FnTpWdKb2v8uy+FBnifsOIEJdZ6uI6qUiIt11JWcUb0dZjdnvwFWmSewExO 8mcg== X-Gm-Message-State: AC+VfDyT60ZRXKGqRPlcndy5gvs6WfcxO6XKUJsVIbhpJKmf/fUqrsJM MPILJcqagxRYg/AHkd8Mvjb3F9lcRqxvujIcxA== X-Google-Smtp-Source: ACHHUZ535voW+NPlWmi/k960ILK6GzLJuoo4o0Ye5mrBBjql7Wgmyk28/ve76hKFr5oQ8deqfVoTqFa20GrqC6sA9g== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a25:2ce:0:b0:bb3:ac6a:6d61 with SMTP id 197-20020a2502ce000000b00bb3ac6a6d61mr1216938ybc.3.1686078254851; Tue, 06 Jun 2023 12:04:14 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:46 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: Subject: [RFC PATCH 01/19] mm: hugetlb: Expose get_hstate_idx() From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::b4a; envelope-from=3LoN_ZAsKCl057F9MG9TOIBBJJBG9.7JHL9HP-89Q9GIJIBIP.JMB@flex--ackerleytng.bounces.google.com; helo=mail-yb1-xb4a.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Expose get_hstate_idx() so it can be used from KVM's guest_mem code Signed-off-by: Ackerley Tng --- fs/hugetlbfs/inode.c | 9 --------- include/linux/hugetlb.h | 14 ++++++++++++++ 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 9062da6da567..406d7366cf3e 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -1560,15 +1560,6 @@ static int can_do_hugetlb_shm(void) return capable(CAP_IPC_LOCK) || in_group_p(shm_group); } -static int get_hstate_idx(int page_size_log) -{ - struct hstate *h = hstate_sizelog(page_size_log); - - if (!h) - return -1; - return hstate_index(h); -} - /* * Note that size should be aligned to proper hugepage size in caller side, * otherwise hugetlb_reserve_pages reserves one less hugepages than intended. diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 7c977d234aba..37c2edf7beea 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -876,6 +876,15 @@ static inline int hstate_index(struct hstate *h) return h - hstates; } +static inline int get_hstate_idx(int page_size_log) +{ + struct hstate *h = hstate_sizelog(page_size_log); + + if (!h) + return -1; + return hstate_index(h); +} + extern int dissolve_free_huge_page(struct page *page); extern int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn); @@ -1142,6 +1151,11 @@ static inline int hstate_index(struct hstate *h) return 0; } +static inline int get_hstate_idx(int page_size_log) +{ + return 0; +} + static inline int dissolve_free_huge_page(struct page *page) { return 0; From patchwork Tue Jun 6 19:03:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269573 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5A229C77B73 for ; Tue, 6 Jun 2023 19:05:37 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6byh-0001Hi-Ix; Tue, 06 Jun 2023 15:04:23 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3MIN_ZAsKCl879HBOIBVQKDDLLDIB.9LJNBJR-ABSBIKLKDKR.LOD@flex--ackerleytng.bounces.google.com>) id 1q6byg-0001Ds-3v for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:22 -0400 Received: from mail-yb1-xb4a.google.com ([2607:f8b0:4864:20::b4a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3MIN_ZAsKCl879HBOIBVQKDDLLDIB.9LJNBJR-ABSBIKLKDKR.LOD@flex--ackerleytng.bounces.google.com>) id 1q6byd-0001xM-DX for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:21 -0400 Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-bb3855c34deso1972054276.2 for ; Tue, 06 Jun 2023 12:04:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078257; x=1688670257; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=+hBYwj+hDVdRsN8/1AwobMe3CTJKx3rv/kAyLXc58so=; b=H7E3FiMFKx8wp9Dwc4KoRhYITqBdXBFbeuSdYVU7Xl2quiLV1TI/RBj6a84p3NhH+A 6Ot5kD4xdeERpXhx7UwpPyWytx0ckODJexzs4v23BK+jBmvA/ejV4+lVF1eJPpgHviEu uumCUBgaOBD3ECdp59KweUCJLiBB+HqC6NZrMNuSqXw8B1oiojQQln/v/JBqZ3dwGu23 pfO6fIO6tANLPLyeFwUcf8Qi72Eo5MQO/Xul1ZcTopSBQKuO94fsqUr8V4qbQI+x7jBp uRAq7xDxsM2mm1v+/TKXiQUaFoQhkNFaDB59XiYo7wFgeMK27ohvJb62IAQGNTsPTIVZ PvjQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078257; x=1688670257; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=+hBYwj+hDVdRsN8/1AwobMe3CTJKx3rv/kAyLXc58so=; b=GAY9/jc8N6nWQFB8VN6TlGt7DtEnSII7hdZlCld8v+ATLsqFTr4Mw6phpCnWRHtdAN lJIVIN00N49EGB1wXl/ByVSlNybkiHhaj4B6T6IlGrnaQch78/Qsv1TMNX5eDimsL16v DJ4cTqqUWxI8OGAVImZppI5wvNdjwndIEthKCAR5NI070etKpr1tiglJf4j93E6fQh0d rGWs5uAc+8KPmMUbmyaDGePjApUWVCssCVOeyLijvE9y6OoPtVmL7ill9oj0Mx11a6rj V3nfPnRq5H8UIEgFIOQsoglpqAOHZl2C9Ty3f9S91Ksu/TpwCTE4o8n6Er0/PR+1oNI4 yeKg== X-Gm-Message-State: AC+VfDw2cyqdBsq9jcFK9ircM3UtHQ/fBBskVTjKxoczsHR7iQmjIBpF XbDelnA4Gd/Gz8gDHR2QUDboMSwejAWmHsF4iQ== X-Google-Smtp-Source: ACHHUZ68tuaeB7ur+Gj0U5ZR5IxxzkEh+TRxbplImu5DoiCt68fH6drAzpWUcd9Di+kjyJGUd+Eua/GN38BmucMMtA== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a05:6902:1882:b0:ba8:337a:d8a3 with SMTP id cj2-20020a056902188200b00ba8337ad8a3mr1661367ybb.11.1686078256773; Tue, 06 Jun 2023 12:04:16 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:47 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <97c312c8c0b56218454d546a540a3ea2e2a825e2.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 02/19] mm: hugetlb: Move and expose hugetlbfs_zero_partial_page From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::b4a; envelope-from=3MIN_ZAsKCl879HBOIBVQKDDLLDIB.9LJNBJR-ABSBIKLKDKR.LOD@flex--ackerleytng.bounces.google.com; helo=mail-yb1-xb4a.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Zeroing of pages is generalizable to hugetlb and is not specific to hugetlbfs. Rename hugetlbfs_zero_partial_page => hugetlb_zero_partial_page, move it to mm/hugetlb.c and expose it in linux/hugetlb.h. Signed-off-by: Ackerley Tng --- fs/hugetlbfs/inode.c | 27 ++------------------------- include/linux/hugetlb.h | 6 ++++++ mm/hugetlb.c | 22 ++++++++++++++++++++++ 3 files changed, 30 insertions(+), 25 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 406d7366cf3e..3dab50d3ed88 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -688,29 +688,6 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset) remove_inode_hugepages(inode, offset, LLONG_MAX); } -static void hugetlbfs_zero_partial_page(struct hstate *h, - struct address_space *mapping, - loff_t start, - loff_t end) -{ - pgoff_t idx = start >> huge_page_shift(h); - struct folio *folio; - - folio = filemap_lock_folio(mapping, idx); - if (!folio) - return; - - start = start & ~huge_page_mask(h); - end = end & ~huge_page_mask(h); - if (!end) - end = huge_page_size(h); - - folio_zero_segment(folio, (size_t)start, (size_t)end); - - folio_unlock(folio); - folio_put(folio); -} - static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) { struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode); @@ -737,7 +714,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) /* If range starts before first full page, zero partial page. */ if (offset < hole_start) - hugetlbfs_zero_partial_page(h, mapping, + hugetlb_zero_partial_page(h, mapping, offset, min(offset + len, hole_start)); /* Unmap users of full pages in the hole. */ @@ -750,7 +727,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) /* If range extends beyond last full page, zero partial page. */ if ((offset + len) > hole_end && (offset + len) > hole_start) - hugetlbfs_zero_partial_page(h, mapping, + hugetlb_zero_partial_page(h, mapping, hole_end, offset + len); i_mmap_unlock_write(mapping); diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 37c2edf7beea..023293ceec25 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -256,6 +256,9 @@ long hugetlb_change_protection(struct vm_area_struct *vma, bool is_hugetlb_entry_migration(pte_t pte); void hugetlb_unshare_all_pmds(struct vm_area_struct *vma); +void hugetlb_zero_partial_page(struct hstate *h, struct address_space *mapping, + loff_t start, loff_t end); + #else /* !CONFIG_HUGETLB_PAGE */ static inline void hugetlb_dup_vma_private(struct vm_area_struct *vma) @@ -464,6 +467,9 @@ static inline vm_fault_t hugetlb_fault(struct mm_struct *mm, static inline void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) { } +static inline void hugetlb_zero_partial_page( + struct hstate *h, struct address_space *mapping, loff_t start, loff_t end) {} + #endif /* !CONFIG_HUGETLB_PAGE */ /* * hugepages at page global directory. If arch support diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 07abcb6eb203..9c9262833b4f 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -7407,6 +7407,28 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) ALIGN_DOWN(vma->vm_end, PUD_SIZE)); } +void hugetlb_zero_partial_page(struct hstate *h, + struct address_space *mapping, + loff_t start, loff_t end) +{ + pgoff_t idx = start >> huge_page_shift(h); + struct folio *folio; + + folio = filemap_lock_folio(mapping, idx); + if (!folio) + return; + + start = start & ~huge_page_mask(h); + end = end & ~huge_page_mask(h); + if (!end) + end = huge_page_size(h); + + folio_zero_segment(folio, (size_t)start, (size_t)end); + + folio_unlock(folio); + folio_put(folio); +} + #ifdef CONFIG_CMA static bool cma_reserve_called __initdata; From patchwork Tue Jun 6 19:03:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269569 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B19CAC7EE33 for ; Tue, 6 Jun 2023 19:04:58 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6byh-0001Go-7P; Tue, 06 Jun 2023 15:04:23 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3MoN_ZAsKCmE9BJDQKDXSMFFNNFKD.BNLPDLT-CDUDKMNMFMT.NQF@flex--ackerleytng.bounces.google.com>) id 1q6byf-0001B3-AW for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:21 -0400 Received: from mail-yb1-xb4a.google.com ([2607:f8b0:4864:20::b4a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3MoN_ZAsKCmE9BJDQKDXSMFFNNFKD.BNLPDLT-CDUDKMNMFMT.NQF@flex--ackerleytng.bounces.google.com>) id 1q6byd-0001y4-Qj for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:21 -0400 Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-ba88ec544ddso10411091276.1 for ; Tue, 06 Jun 2023 12:04:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078258; x=1688670258; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=4awdlrIAeNpwHVWeIID1BUVh8RCTYusTWiK5XfdHg+o=; b=EM6Va8BzGwCED0zIh64Bnjix1yyjlAzWk5b48JtiXuhP31W1mbMccgQlZPFYy2Rc8v tMovnZrmg6i3waIkF808VpF/dEtWacRQ+yv83fbRyzUMvkWnAQduvEvPNXK1sSt1N2Cy PtYuLHfMT8qcUe5nsKygK7f4ppJDBYrYq77vKTy6tbx4tH0TKa3ANgH0owgHfn5b1Tx1 OI+RAo/8et/7ti+hO4SSDEjjeJCABpVM+ohklHFBBqTgC2zed5wK6sgfxTCj5lcV8B7r QzmY/WCfEZ0p/5xV7lrrRx2FQaR4Eg7AC1b0TCHXbKQd5gKgaSLxDQ+6klKWy+rQ6YtV ePmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078258; x=1688670258; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=4awdlrIAeNpwHVWeIID1BUVh8RCTYusTWiK5XfdHg+o=; b=TVvtHoXzbRQLKc0zegnnhSGnDRMaNa+HVTA5tcS+Z7miTgrY6H5+CGIqU6yAxMIXjI FI4BbuHRqKRmZsOvuUcFgTPglgN1MpuK4n6D3od15qkNmGln5VbUhE1BM3lr8p1w7Dd7 z+qpyzM0H+oRTfO+XmrPmT7q77gQJtTtOC0C24B5R0RWdIsdSe5k1GD7tfZX2upfLgiI zU5cQD5eqi/gSoaprCWbzF2ApZ5TAWr0mdlauPk5xAaJ1md0GjQvdkd5g9yZgdlkN5ga 94l4jKugeWxV+aX2wusuJnBP90lqCjh+GbdzjplTCLagMcM9noYEnD0xzwq60XuhQTcy ZrkQ== X-Gm-Message-State: AC+VfDw0DJjgWzXGB+cUAjcVLk/SfYVNONkTl63zVUh75gYlaQd/qDpQ 4kZrFiNgmfwiyiaJvMxmJ7l3x77UOr3+9V6kHg== X-Google-Smtp-Source: ACHHUZ7qnoxO8i1wmNBAi7fatM+eI5rwKFB4S/AWZNBrdb9BhmqSWgFoaD3eNX27zoBKSk99WIOAutKPR4Hz0jbmxQ== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a05:6902:10c8:b0:ba8:797c:9bc7 with SMTP id w8-20020a05690210c800b00ba8797c9bc7mr1725260ybu.11.1686078258501; Tue, 06 Jun 2023 12:04:18 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:48 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <0ae157ec9e196f353ecf9036dbffdc295c994817.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 03/19] mm: hugetlb: Expose remove_inode_hugepages From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::b4a; envelope-from=3MoN_ZAsKCmE9BJDQKDXSMFFNNFKD.BNLPDLT-CDUDKMNMFMT.NQF@flex--ackerleytng.bounces.google.com; helo=mail-yb1-xb4a.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org TODO may want to move this to hugetlb Signed-off-by: Ackerley Tng --- fs/hugetlbfs/inode.c | 3 +-- include/linux/hugetlb.h | 4 ++++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 3dab50d3ed88..4f25df31ae80 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -611,8 +611,7 @@ static bool remove_inode_single_folio(struct hstate *h, struct inode *inode, * Note: If the passed end of range value is beyond the end of file, but * not LLONG_MAX this routine still performs a hole punch operation. */ -static void remove_inode_hugepages(struct inode *inode, loff_t lstart, - loff_t lend) +void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) { struct hstate *h = hstate_inode(inode); struct address_space *mapping = &inode->i_data; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 023293ceec25..1483020b412b 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -259,6 +259,8 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma); void hugetlb_zero_partial_page(struct hstate *h, struct address_space *mapping, loff_t start, loff_t end); +void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend); + #else /* !CONFIG_HUGETLB_PAGE */ static inline void hugetlb_dup_vma_private(struct vm_area_struct *vma) @@ -470,6 +472,8 @@ static inline void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) { } static inline void hugetlb_zero_partial_page( struct hstate *h, struct address_space *mapping, loff_t start, loff_t end) {} +static inline void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) {} + #endif /* !CONFIG_HUGETLB_PAGE */ /* * hugepages at page global directory. If arch support From patchwork Tue Jun 6 19:03:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269620 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E2924C77B7A for ; Tue, 6 Jun 2023 19:06:51 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6bym-0001Jk-0g; Tue, 06 Jun 2023 15:04:28 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3NIN_ZAsKCmMBDLFSMFZUOHHPPHMF.DPNRFNV-EFWFMOPOHOV.PSH@flex--ackerleytng.bounces.google.com>) id 1q6byj-0001Ix-Uj for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:25 -0400 Received: from mail-yb1-xb4a.google.com ([2607:f8b0:4864:20::b4a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3NIN_ZAsKCmMBDLFSMFZUOHHPPHMF.DPNRFNV-EFWFMOPOHOV.PSH@flex--ackerleytng.bounces.google.com>) id 1q6byg-0001yt-MP for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:25 -0400 Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-bacd408046cso9641054276.3 for ; Tue, 06 Jun 2023 12:04:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078260; x=1688670260; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=EVsoBqdtoJX8/StMCoaJ2TBhLAuQb0ZIsYKuvX8foFc=; b=3bRhDOn2F1YoY78ybE5Buf9pXWlEWLIrZLl3zmMqCMxMWePwOymNgzPlL05No/cBty JhXR5aZftY97cDZvNkSb70a55Fkj8oothgT0n+qKkVwOjj7CYzQnLwO2wmEIMdOIIq24 jiWbdBDp6sWsJWroyTU36/XenlT2tddUva4/zzFiwtqHgjy72QN5YVUYSt+vJDoI3IyN VcmHCICw1jUnn1dcw1Z79WvsPN60OZDQjPZspnctaG3vX9a9apYfWKKt97pGmH1ZZWZ+ n8NTZLQKSqtuCbrmmo44T26y4ZuXSivXoMS2qmsA9OyppL+7KUrgyC0TXXVY+vLySr9n vPOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078260; x=1688670260; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=EVsoBqdtoJX8/StMCoaJ2TBhLAuQb0ZIsYKuvX8foFc=; b=ZNPUzQrH6VEmufg+MhZejMf1GpeLwje8PN1dMAHDUdXwZBlFMIZDUyeHXasGD5Rukz CZ3oBkJS3BWgiWG7RnQd7ojXkTcTdnboPtdtyI8UWjcfNDd/EOHT6/sg2qJq2QLelmvx rPeR2Da1r8QZL7r9dqlED4FRETnFMnFgcpgelEpzQF9NAraH9WN5rN9LTFN4Fc3yVKqo tJIj1+kjS34j0Ji0qE6Xn8qqnyK41zgQ8ufa3+gszQ329uGlRyMJ/utZuKjwHgR2ioZ+ sJlcUajGSVPZLv5fltC4/BoHmWuN3Grtl8keAb97HOE9gtqzZ9n6/TSuoEJUtgxE05gh Va/w== X-Gm-Message-State: AC+VfDyn6Xs88Vvac+TFALpPzk6mm50OsMiNba/1z8GczynaI7sGllOj BXjdRIzzEYpo7gn/AUBtV9YGZXCL5PrVa4pUMA== X-Google-Smtp-Source: ACHHUZ7pHc1WWdEMTu1ZaLAHhToRSasMGs58ikGHS3ezVoQuVBFPIMzugkY1x4qddS4xfRUOsuXb06aDGCKa9mnCdw== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a05:6902:124f:b0:ba8:2e68:7715 with SMTP id t15-20020a056902124f00b00ba82e687715mr1639977ybu.2.1686078260457; Tue, 06 Jun 2023 12:04:20 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:49 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: Subject: [RFC PATCH 04/19] mm: hugetlb: Decouple hstate, subpool from inode From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::b4a; envelope-from=3NIN_ZAsKCmMBDLFSMFZUOHHPPHMF.DPNRFNV-EFWFMOPOHOV.PSH@flex--ackerleytng.bounces.google.com; helo=mail-yb1-xb4a.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org hstate and subpool being retrievable from inode via hstate_inode() and subpool_inode() respectively is a hugetlbfs concept. hugetlb should be agnostic of hugetlbfs and hugetlb accounting functions should accept hstate (required) and subpool (can be NULL) independently of inode. inode is still a parameter for these accounting functions since the inode's block counts need to be updated during accounting. The inode's resv_map will also still need to be updated if not NULL. Signed-off-by: Ackerley Tng --- fs/hugetlbfs/inode.c | 59 ++++++++++++++++++++++++++++------------- include/linux/hugetlb.h | 32 +++++++++++++++++----- mm/hugetlb.c | 49 ++++++++++++++++++++-------------- 3 files changed, 95 insertions(+), 45 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 4f25df31ae80..0fc49b6252e4 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -164,7 +164,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma) file_accessed(file); ret = -ENOMEM; - if (!hugetlb_reserve_pages(inode, + if (!hugetlb_reserve_pages(h, subpool_inode(inode), inode, vma->vm_pgoff >> huge_page_order(h), len >> huge_page_shift(h), vma, vma->vm_flags)) @@ -550,14 +550,18 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, } } -/* +/** + * Remove folio from page_cache and userspace mappings. Also unreserves pages, + * updating hstate @h, subpool @spool (if not NULL), @inode block info and + * @inode's resv_map (if not NULL). + * * Called with hugetlb fault mutex held. * Returns true if page was actually removed, false otherwise. */ -static bool remove_inode_single_folio(struct hstate *h, struct inode *inode, - struct address_space *mapping, - struct folio *folio, pgoff_t index, - bool truncate_op) +static bool remove_mapping_single_folio( + struct address_space *mapping, struct folio *folio, pgoff_t index, + struct hstate *h, struct hugepage_subpool *spool, struct inode *inode, + bool truncate_op) { bool ret = false; @@ -582,9 +586,8 @@ static bool remove_inode_single_folio(struct hstate *h, struct inode *inode, hugetlb_delete_from_page_cache(folio); ret = true; if (!truncate_op) { - if (unlikely(hugetlb_unreserve_pages(inode, index, - index + 1, 1))) - hugetlb_fix_reserve_counts(inode); + if (unlikely(hugetlb_unreserve_pages(h, spool, inode, index, index + 1, 1))) + hugetlb_fix_reserve_counts(h, spool); } folio_unlock(folio); @@ -592,7 +595,14 @@ static bool remove_inode_single_folio(struct hstate *h, struct inode *inode, } /* - * remove_inode_hugepages handles two distinct cases: truncation and hole + * Remove hugetlb page mappings from @mapping between offsets [@lstart, @lend). + * Also updates reservations in: + * + hstate @h (required) + * + subpool @spool (can be NULL) + * + resv_map in @inode (can be NULL) + * and updates blocks in @inode (required) + * + * remove_mapping_hugepages handles two distinct cases: truncation and hole * punch. There are subtle differences in operation for each case. * * truncation is indicated by end of range being LLONG_MAX @@ -611,10 +621,10 @@ static bool remove_inode_single_folio(struct hstate *h, struct inode *inode, * Note: If the passed end of range value is beyond the end of file, but * not LLONG_MAX this routine still performs a hole punch operation. */ -void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) +void remove_mapping_hugepages(struct address_space *mapping, + struct hstate *h, struct hugepage_subpool *spool, + struct inode *inode, loff_t lstart, loff_t lend) { - struct hstate *h = hstate_inode(inode); - struct address_space *mapping = &inode->i_data; const pgoff_t start = lstart >> huge_page_shift(h); const pgoff_t end = lend >> huge_page_shift(h); struct folio_batch fbatch; @@ -636,8 +646,8 @@ void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) /* * Remove folio that was part of folio_batch. */ - if (remove_inode_single_folio(h, inode, mapping, folio, - index, truncate_op)) + if (remove_mapping_single_folio(mapping, folio, index, + h, spool, inode, truncate_op)) freed++; mutex_unlock(&hugetlb_fault_mutex_table[hash]); @@ -647,7 +657,16 @@ void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) } if (truncate_op) - (void)hugetlb_unreserve_pages(inode, start, LONG_MAX, freed); + (void)hugetlb_unreserve_pages(h, spool, inode, start, LONG_MAX, freed); +} + +void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) +{ + struct address_space *mapping = &inode->i_data; + struct hstate *h = hstate_inode(inode); + struct hugepage_subpool *spool = subpool_inode(inode); + + return remove_mapping_hugepages(mapping, h, spool, inode, lstart, lend); } static void hugetlbfs_evict_inode(struct inode *inode) @@ -1548,6 +1567,7 @@ struct file *hugetlb_file_setup(const char *name, size_t size, struct vfsmount *mnt; int hstate_idx; struct file *file; + struct hstate *h; hstate_idx = get_hstate_idx(page_size_log); if (hstate_idx < 0) @@ -1578,9 +1598,10 @@ struct file *hugetlb_file_setup(const char *name, size_t size, inode->i_size = size; clear_nlink(inode); - if (!hugetlb_reserve_pages(inode, 0, - size >> huge_page_shift(hstate_inode(inode)), NULL, - acctflag)) + h = hstate_inode(inode); + if (!hugetlb_reserve_pages(h, subpool_inode(inode), inode, 0, + size >> huge_page_shift(h), NULL, + acctflag)) file = ERR_PTR(-ENOMEM); else file = alloc_file_pseudo(inode, mnt, name, O_RDWR, diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 1483020b412b..2457d7a21974 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -166,11 +166,13 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, struct page **pagep, bool wp_copy); #endif /* CONFIG_USERFAULTFD */ -bool hugetlb_reserve_pages(struct inode *inode, long from, long to, - struct vm_area_struct *vma, - vm_flags_t vm_flags); -long hugetlb_unreserve_pages(struct inode *inode, long start, long end, - long freed); +bool hugetlb_reserve_pages(struct hstate *h, struct hugepage_subpool *spool, + struct inode *inode, + long from, long to, + struct vm_area_struct *vma, + vm_flags_t vm_flags); +long hugetlb_unreserve_pages(struct hstate *h, struct hugepage_subpool *spool, + struct inode *inode, long start, long end, long freed); bool isolate_hugetlb(struct folio *folio, struct list_head *list); int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison); int get_huge_page_for_hwpoison(unsigned long pfn, int flags, @@ -178,7 +180,7 @@ int get_huge_page_for_hwpoison(unsigned long pfn, int flags, void folio_putback_active_hugetlb(struct folio *folio); void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int reason); void free_huge_page(struct page *page); -void hugetlb_fix_reserve_counts(struct inode *inode); +void hugetlb_fix_reserve_counts(struct hstate *h, struct hugepage_subpool *spool); extern struct mutex *hugetlb_fault_mutex_table; u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx); @@ -259,6 +261,9 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma); void hugetlb_zero_partial_page(struct hstate *h, struct address_space *mapping, loff_t start, loff_t end); +void remove_mapping_hugepages(struct address_space *mapping, + struct hstate *h, struct hugepage_subpool *spool, + struct inode *inode, loff_t lstart, loff_t lend); void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend); #else /* !CONFIG_HUGETLB_PAGE */ @@ -472,6 +477,9 @@ static inline void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) { } static inline void hugetlb_zero_partial_page( struct hstate *h, struct address_space *mapping, loff_t start, loff_t end) {} +static inline void remove_mapping_hugepages( + struct address_space *mapping, struct hstate *h, struct hugepage_subpool *spool, + struct inode *inode, loff_t lstart, loff_t lend) {} static inline void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) {} #endif /* !CONFIG_HUGETLB_PAGE */ @@ -554,6 +562,12 @@ static inline struct hstate *hstate_inode(struct inode *i) { return HUGETLBFS_SB(i->i_sb)->hstate; } + +static inline struct hugepage_subpool *subpool_inode(struct inode *inode) +{ + return HUGETLBFS_SB(inode->i_sb)->spool; +} + #else /* !CONFIG_HUGETLBFS */ #define is_file_hugepages(file) false @@ -568,6 +582,12 @@ static inline struct hstate *hstate_inode(struct inode *i) { return NULL; } + +static inline struct hugepage_subpool *subpool_inode(struct inode *inode) +{ + return NULL; +} + #endif /* !CONFIG_HUGETLBFS */ #ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 9c9262833b4f..9da419b930df 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -247,11 +247,6 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool, return ret; } -static inline struct hugepage_subpool *subpool_inode(struct inode *inode) -{ - return HUGETLBFS_SB(inode->i_sb)->spool; -} - static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma) { return subpool_inode(file_inode(vma->vm_file)); @@ -898,16 +893,13 @@ static long region_del(struct resv_map *resv, long f, long t) * appear as a "reserved" entry instead of simply dangling with incorrect * counts. */ -void hugetlb_fix_reserve_counts(struct inode *inode) +void hugetlb_fix_reserve_counts(struct hstate *h, struct hugepage_subpool *spool) { - struct hugepage_subpool *spool = subpool_inode(inode); long rsv_adjust; bool reserved = false; rsv_adjust = hugepage_subpool_get_pages(spool, 1); if (rsv_adjust > 0) { - struct hstate *h = hstate_inode(inode); - if (!hugetlb_acct_memory(h, 1)) reserved = true; } else if (!rsv_adjust) { @@ -6762,15 +6754,22 @@ long hugetlb_change_protection(struct vm_area_struct *vma, return pages > 0 ? (pages << h->order) : pages; } -/* Return true if reservation was successful, false otherwise. */ -bool hugetlb_reserve_pages(struct inode *inode, - long from, long to, - struct vm_area_struct *vma, - vm_flags_t vm_flags) +/** + * Reserves pages between vma indices @from and @to by handling accounting in: + * + hstate @h (required) + * + subpool @spool (can be NULL) + * + @inode (required if @vma is NULL) + * + * Will setup resv_map in @vma if necessary. + * Return true if reservation was successful, false otherwise. + */ +bool hugetlb_reserve_pages(struct hstate *h, struct hugepage_subpool *spool, + struct inode *inode, + long from, long to, + struct vm_area_struct *vma, + vm_flags_t vm_flags) { long chg = -1, add = -1; - struct hstate *h = hstate_inode(inode); - struct hugepage_subpool *spool = subpool_inode(inode); struct resv_map *resv_map; struct hugetlb_cgroup *h_cg = NULL; long gbl_reserve, regions_needed = 0; @@ -6921,13 +6920,23 @@ bool hugetlb_reserve_pages(struct inode *inode, return false; } -long hugetlb_unreserve_pages(struct inode *inode, long start, long end, - long freed) +/** + * Unreserves pages between vma indices @start and @end by handling accounting + * in: + * + hstate @h (required) + * + subpool @spool (can be NULL) + * + @inode (required) + * + resv_map in @inode (can be NULL) + * + * @freed is the number of pages freed, for updating inode->i_blocks. + * + * Returns 0 on success. + */ +long hugetlb_unreserve_pages(struct hstate *h, struct hugepage_subpool *spool, + struct inode *inode, long start, long end, long freed) { - struct hstate *h = hstate_inode(inode); struct resv_map *resv_map = inode_resv_map(inode); long chg = 0; - struct hugepage_subpool *spool = subpool_inode(inode); long gbl_reserve; /* From patchwork Tue Jun 6 19:03:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269612 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DE9E8C7EE2A for ; Tue, 6 Jun 2023 19:06:05 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6byn-0001Kx-AP; Tue, 06 Jun 2023 15:04:29 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3NoN_ZAsKCmUDFNHUOHbWQJJRRJOH.FRPTHPX-GHYHOQRQJQX.RUJ@flex--ackerleytng.bounces.google.com>) id 1q6byl-0001JP-79 for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:27 -0400 Received: from mail-pg1-x54a.google.com ([2607:f8b0:4864:20::54a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3NoN_ZAsKCmUDFNHUOHbWQJJRRJOH.FRPTHPX-GHYHOQRQJQX.RUJ@flex--ackerleytng.bounces.google.com>) id 1q6byj-00020f-HZ for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:26 -0400 Received: by mail-pg1-x54a.google.com with SMTP id 41be03b00d2f7-543a4a54469so952227a12.1 for ; Tue, 06 Jun 2023 12:04:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078262; x=1688670262; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=TbnAfmaUEaUTjNKT8epdT4a2IjxXaCNnR/Vzf4sBC5I=; b=HeyHfG//kNLUaGcFQWwTewKEgcn4vPYn8juDXqEsqn0adr4vG649KYKYHbBgxoXbAB ahAsJnvxbrxRZIn8HaOziynSothNYKlQfcTu3jAM4uhoNpX3LWBrkvui47Zp8M9CRSxM Hv3v6i8Oi9BrSmd48ktue72BHNGZsJ2uSICRouR0ffp2LrDCVbnCBmA+g2SulzB0hGNV h/9GOrxYxtycjJuMZs46LGZ0YhTsDAihpMdCj9p8GQJvBLZp7PdD2PUuqFKV1kr3TMS1 NWDPMg3o3REsZ5x50h8pQh4zEK7vomJrpG+mCaUeJtrTs1RdwqH+EB5u26LqsONPQnLR jEIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078262; x=1688670262; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=TbnAfmaUEaUTjNKT8epdT4a2IjxXaCNnR/Vzf4sBC5I=; b=bGEtOxC8JkTZPdneYkBF3PJkKMvsd0CRyZMICilQaZcdOvjoBu8RKNDLW6WKNDA/ah Pjm3qs6iErAqg5+kmXcONWkBFQbhZ/NVI7z+sBZsvz2SDFypusaYQNt6bPNuTq2yCS4x nefJNSA6jRDwsk3gMa5ckME11fBd7Tre57seVyZL/jzPkJR6SWpOunzbpm+eh3oIstea Cs9ptLIppWva0wzJXggssd6Fw13mlZFtC5s1Q0qhuKi18PQ4IGmJ/yPQRlHbXq1fR2Sk p0zaAk5XK4iNe9iF75aqfB5utd5rw6D8hsttj1O/WT+qA+2V6HnsendYGJ/Sfk3DBuen xjSA== X-Gm-Message-State: AC+VfDwhvfIwVInE37oXtlZu+cMuzvmUlDKy3P9+P9taGqn1AD49E4qK zPFU/sHqD8luZVTimWy8hB8NyQxoqIN5YdGVnw== X-Google-Smtp-Source: ACHHUZ7WRmD6VfCo02eYhPrQlyc6XnpZbLw7oQUgpoj7hXhb698PIwiX79ZF4kn3hNLKX22XHrTVhzRB5PnobIvmJA== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a63:fe05:0:b0:513:9753:46d2 with SMTP id p5-20020a63fe05000000b00513975346d2mr634687pgh.2.1686078262212; Tue, 06 Jun 2023 12:04:22 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:50 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <7827774c13e975d3d1dedc4a4684cb92eac8b548.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 05/19] mm: hugetlb: Allow alloc_hugetlb_folio() to be parametrized by subpool and hstate From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::54a; envelope-from=3NoN_ZAsKCmUDFNHUOHbWQJJRRJOH.FRPTHPX-GHYHOQRQJQX.RUJ@flex--ackerleytng.bounces.google.com; helo=mail-pg1-x54a.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org subpool_inode() and hstate_inode() are hugetlbfs-specific. By allowing subpool and hstate to be specified, hugetlb is further modularized from hugetlbfs. Signed-off-by: Ackerley Tng --- include/linux/hugetlb.h | 3 +++ mm/hugetlb.c | 16 ++++++++++++---- 2 files changed, 15 insertions(+), 4 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 2457d7a21974..14df89d1642c 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -747,6 +747,9 @@ struct huge_bootmem_page { }; int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list); +struct folio *alloc_hugetlb_folio_from_subpool( + struct hugepage_subpool *spool, struct hstate *h, + struct vm_area_struct *vma, unsigned long addr, int avoid_reserve); struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, unsigned long addr, int avoid_reserve); struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid, diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 9da419b930df..99ab4bbdb2ce 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3008,11 +3008,10 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list) return ret; } -struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, - unsigned long addr, int avoid_reserve) +struct folio *alloc_hugetlb_folio_from_subpool( + struct hugepage_subpool *spool, struct hstate *h, + struct vm_area_struct *vma, unsigned long addr, int avoid_reserve) { - struct hugepage_subpool *spool = subpool_vma(vma); - struct hstate *h = hstate_vma(vma); struct folio *folio; long map_chg, map_commit; long gbl_chg; @@ -3139,6 +3138,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, return ERR_PTR(-ENOSPC); } +struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, + unsigned long addr, int avoid_reserve) +{ + struct hugepage_subpool *spool = subpool_vma(vma); + struct hstate *h = hstate_vma(vma); + + return alloc_hugetlb_folio_from_subpool(spool, h, vma, addr, avoid_reserve); +} + int alloc_bootmem_huge_page(struct hstate *h, int nid) __attribute__ ((weak, alias("__alloc_bootmem_huge_page"))); int __alloc_bootmem_huge_page(struct hstate *h, int nid) From patchwork Tue Jun 6 19:03:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269622 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 206F4C77B7A for ; Tue, 6 Jun 2023 19:07:39 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6bym-0001KS-GG; Tue, 06 Jun 2023 15:04:28 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3N4N_ZAsKCmYEGOIVPIcXRKKSSKPI.GSQUIQY-HIZIPRSRKRY.SVK@flex--ackerleytng.bounces.google.com>) id 1q6byl-0001JW-LD for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:27 -0400 Received: from mail-pj1-x1049.google.com ([2607:f8b0:4864:20::1049]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3N4N_ZAsKCmYEGOIVPIcXRKKSSKPI.GSQUIQY-HIZIPRSRKRY.SVK@flex--ackerleytng.bounces.google.com>) id 1q6byj-00021z-UE for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:27 -0400 Received: by mail-pj1-x1049.google.com with SMTP id 98e67ed59e1d1-256a43693e7so2145241a91.2 for ; Tue, 06 Jun 2023 12:04:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078264; x=1688670264; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=MZNbtWHLPp5u2GywkOyP6D11aGDQiwpmebr72+3tZZw=; b=3nPwPewVscUDqVZMn7Uj3o98nrz5j8VLctE94SycAE0CkTh6eROV6trMOeNgO1y4jS yLqfAq9F5z3V/0un8MfG4HdspEU1lRRMGNzpIw+TRUDHmsZbFPuX/+Cikl64LmUYIx66 8bq6oGIyVKNOZEdXnRQamgBHRpoKdYwSVBMUDqfzl7aIO0QNJg3Dh7nLDTiLBxl5+u8s 7Pxbmh7gLIa94u88QAeI5yehSo2cVl1/aF9z6LyF0284EPn+EeLrVrvCYse/GQqVLuDH J/bFMQvA4aC2mtn4McrJir6hXAodHcAaWQLRaE/P7TGgIMKpZ2UbMXusJndkMLSh2bgC CYgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078264; x=1688670264; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=MZNbtWHLPp5u2GywkOyP6D11aGDQiwpmebr72+3tZZw=; b=FhusgGQ9t9xIbSHfY/1zDNsJ1SsVdkRN6JI8lEfTskyQEx3I8tJfp6XTZcK0PzuWFX VkjgHSbGb9/9Dxtwby4eWJq12UDsB3p48odRoWi66ywHE/UQJ79i4cJFOFa0VxbK9PFe J5yLrkzW41zzP/ckfH3VM/f4YHnoBDAb/LpXKtIJXVhrdoARP8k5VGtKlAZaBK4fEYaa xUcegdTwxnWzLboAYlOXMeeWGS5I5NTihoMcabtjU4jggodM+h+Bc5H5h3tyguBcnGXv eE22N5N4NViHNhjLhMs1XIYtPwDwY6bj2Vw0KM8E88cwcP9yuCSRNb/CLR5AfDE/KnkO cr4w== X-Gm-Message-State: AC+VfDyO3myrgH0jADm5Z4q8Buk1nG5k/UpiFw202Z1SjmKz58GoUvAB ONMkgaTkrj3c2F+EAld8GIn5eAcWrNsFPcy7Lg== X-Google-Smtp-Source: ACHHUZ7IGNkoxDfbAlb5rNYiUxqKSeo6I3oYpfv0H1ogSeNyAElarCOglNHY4mpG93xhDZmjnsgqFXX+BQakQxRlBw== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a17:90a:c092:b0:256:b3d0:f2f0 with SMTP id o18-20020a17090ac09200b00256b3d0f2f0mr806359pjs.2.1686078263970; Tue, 06 Jun 2023 12:04:23 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:51 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <69ae008ec4076456078b880575ac310171136ac0.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 06/19] mm: hugetlb: Provide hugetlb_filemap_add_folio() From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::1049; envelope-from=3N4N_ZAsKCmYEGOIVPIcXRKKSSKPI.GSQUIQY-HIZIPRSRKRY.SVK@flex--ackerleytng.bounces.google.com; helo=mail-pj1-x1049.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org hstate_inode() is hugetlbfs-specific, limiting hugetlb_add_to_page_cache() to hugetlbfs. hugetlb_filemap_add_folio() allows hstate to be specified and further separates hugetlb from hugetlbfs. Signed-off-by: Ackerley Tng --- include/linux/hugetlb.h | 2 ++ mm/hugetlb.c | 13 ++++++++++--- 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 14df89d1642c..7d49048c5a2a 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -756,6 +756,8 @@ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid, nodemask_t *nmask, gfp_t gfp_mask); struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma, unsigned long address); +int hugetlb_filemap_add_folio(struct address_space *mapping, struct hstate *h, + struct folio *folio, pgoff_t idx); int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping, pgoff_t idx); void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 99ab4bbdb2ce..d16c6417b90f 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5665,11 +5665,10 @@ static bool hugetlbfs_pagecache_present(struct hstate *h, return present; } -int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping, - pgoff_t idx) +int hugetlb_filemap_add_folio(struct address_space *mapping, struct hstate *h, + struct folio *folio, pgoff_t idx) { struct inode *inode = mapping->host; - struct hstate *h = hstate_inode(inode); int err; __folio_set_locked(folio); @@ -5693,6 +5692,14 @@ int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping return 0; } +int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping, + pgoff_t idx) +{ + struct hstate *h = hstate_inode(mapping->host); + + return hugetlb_filemap_add_folio(mapping, h, folio, idx); +} + static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma, struct address_space *mapping, pgoff_t idx, From patchwork Tue Jun 6 19:03:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269566 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C5903C77B73 for ; Tue, 6 Jun 2023 19:04:57 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6byq-0001Lg-U3; Tue, 06 Jun 2023 15:04:32 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3OYN_ZAsKCmgGIQKXRKeZTMMUUMRK.IUSWKSa-JKbKRTUTMTa.UXM@flex--ackerleytng.bounces.google.com>) id 1q6byo-0001L2-OZ for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:30 -0400 Received: from mail-pg1-x54a.google.com ([2607:f8b0:4864:20::54a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3OYN_ZAsKCmgGIQKXRKeZTMMUUMRK.IUSWKSa-JKbKRTUTMTa.UXM@flex--ackerleytng.bounces.google.com>) id 1q6byl-00022s-Cs for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:29 -0400 Received: by mail-pg1-x54a.google.com with SMTP id 41be03b00d2f7-51f7638a56fso5921704a12.3 for ; Tue, 06 Jun 2023 12:04:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078266; x=1688670266; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=RTPC0AoXX+NIzZnufKQytW8TkFDwziCOrc5ZhwaQeP0=; b=owBmv/2mMPcrWAYYeOz6IufYws8uNBVmJpdSLiIa/Oxif5X7cqv+vateNaoAbLjKN+ 9g8OIvY2fitd8/bF4vWCTRGtTW4cbQGYeFnSk7O/6JmITSXQUE0V5caWCidr9eoCizfR mnHihiIAiXJOu1//OF3fyNTkiIvpgOeBZOYv+VfEA9HuxcdSL1KjLzz1ww4jo+xPlynH cZu+282ONjHqqdAeelyqOrgfiIuIaT7NMZ4iSDTpZ4yPFrLEm9YlMCTiZ2Jg+4dOz9SD Af25fA3ecy4VQVCca8EotY/PKiVwegJai+On/aGLqcq8u80JQ1wIUoTEAC7FSVslSvP6 ocKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078266; x=1688670266; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=RTPC0AoXX+NIzZnufKQytW8TkFDwziCOrc5ZhwaQeP0=; b=e1EdXKqqDLBIG7B52AUmZXix2TB+wslz0HugWozPuSjqn7jybrYG0EP/kuID9hsZ7m Zb07nnd/Xs7eLBQOQ0aUEXD6DkZQlEv8n75bYNC6AVw9CUcFGK2AWkGIqrpbZGbW7eLP omHzDjxtaSOQW1HZkWOxF+Lw7BGp3DIDKDBpDqjyFQfml/lrVe8MXo6JbFSMfAka+Qrl sbqkgZpXCDvXqNSKSlyYQ4uZ8pzBJiBV5mFJQO8vk7dhg72vRBwS1hFYUOsHr6OSDkR2 qv6Y8iU0I/w2zvSb7rDwaqglAwSiz1D+y0DfxzijHpNiBb4Vp2OG+mQyvcZjwtCaCGMJ Iw0A== X-Gm-Message-State: AC+VfDwYwY4Wrsuo+BlRhyrca3AYoN5l3D/neCaEMtxTQoOSOKWVOe5h BFksUh1LIChL3xNyxXYukgrM/exmb/fLR5OYlA== X-Google-Smtp-Source: ACHHUZ7U7Xr5zqg3KoBTJpAqI34Pla/gmFzawbxWzmrT+4L/HPyaurNY1sqxQPPqj80dK4ECUufRqT2mO/+XHSgL7w== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a65:618a:0:b0:542:9fad:de1 with SMTP id c10-20020a65618a000000b005429fad0de1mr614686pgv.12.1686078265785; Tue, 06 Jun 2023 12:04:25 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:52 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <508025e09425a98d52b17cfbdc07340ae05e3e32.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 07/19] mm: hugetlb: Refactor vma_*_reservation functions From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::54a; envelope-from=3OYN_ZAsKCmgGIQKXRKeZTMMUUMRK.IUSWKSa-JKbKRTUTMTa.UXM@flex--ackerleytng.bounces.google.com; helo=mail-pg1-x54a.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org vma_*_reservation functions rely on vma_resv_map(), which assumes on a hugetlbfs concept of the resv_map being stored in a specific field of the inode. This refactor enables vma_*_reservation functions, now renamed resv_map_*_reservation, to be used with non-hugetlbfs filesystems, further decoupling hugetlb from hugetlbfs. Signed-off-by: Ackerley Tng --- mm/hugetlb.c | 184 +++++++++++++++++++++++++++------------------------ 1 file changed, 99 insertions(+), 85 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d16c6417b90f..d943f83d15a9 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2643,89 +2643,81 @@ static void return_unused_surplus_pages(struct hstate *h, /* - * vma_needs_reservation, vma_commit_reservation and vma_end_reservation - * are used by the huge page allocation routines to manage reservations. + * resv_map_needs_reservation, resv_map_commit_reservation and + * resv_map_end_reservation are used by the huge page allocation routines to + * manage reservations. * - * vma_needs_reservation is called to determine if the huge page at addr - * within the vma has an associated reservation. If a reservation is - * needed, the value 1 is returned. The caller is then responsible for - * managing the global reservation and subpool usage counts. After - * the huge page has been allocated, vma_commit_reservation is called - * to add the page to the reservation map. If the page allocation fails, - * the reservation must be ended instead of committed. vma_end_reservation - * is called in such cases. + * resv_map_needs_reservation is called to determine if the huge page at addr + * within the vma has an associated reservation. If a reservation is needed, + * the value 1 is returned. The caller is then responsible for managing the + * global reservation and subpool usage counts. After the huge page has been + * allocated, resv_map_commit_reservation is called to add the page to the + * reservation map. If the page allocation fails, the reservation must be ended + * instead of committed. resv_map_end_reservation is called in such cases. * - * In the normal case, vma_commit_reservation returns the same value - * as the preceding vma_needs_reservation call. The only time this - * is not the case is if a reserve map was changed between calls. It - * is the responsibility of the caller to notice the difference and - * take appropriate action. + * In the normal case, resv_map_commit_reservation returns the same value as the + * preceding resv_map_needs_reservation call. The only time this is not the + * case is if a reserve map was changed between calls. It is the responsibility + * of the caller to notice the difference and take appropriate action. * - * vma_add_reservation is used in error paths where a reservation must - * be restored when a newly allocated huge page must be freed. It is - * to be called after calling vma_needs_reservation to determine if a - * reservation exists. + * resv_map_add_reservation is used in error paths where a reservation must be + * restored when a newly allocated huge page must be freed. It is to be called + * after calling resv_map_needs_reservation to determine if a reservation + * exists. * - * vma_del_reservation is used in error paths where an entry in the reserve - * map was created during huge page allocation and must be removed. It is to - * be called after calling vma_needs_reservation to determine if a reservation + * resv_map_del_reservation is used in error paths where an entry in the reserve + * map was created during huge page allocation and must be removed. It is to be + * called after calling resv_map_needs_reservation to determine if a reservation * exists. */ -enum vma_resv_mode { - VMA_NEEDS_RESV, - VMA_COMMIT_RESV, - VMA_END_RESV, - VMA_ADD_RESV, - VMA_DEL_RESV, +enum resv_map_resv_mode { + RESV_MAP_NEEDS_RESV, + RESV_MAP_COMMIT_RESV, + RESV_MAP_END_RESV, + RESV_MAP_ADD_RESV, + RESV_MAP_DEL_RESV, }; -static long __vma_reservation_common(struct hstate *h, - struct vm_area_struct *vma, unsigned long addr, - enum vma_resv_mode mode) +static long __resv_map_reservation_common(struct resv_map *resv, pgoff_t resv_index, + bool may_be_shared_mapping, + enum resv_map_resv_mode mode) { - struct resv_map *resv; - pgoff_t idx; long ret; long dummy_out_regions_needed; - resv = vma_resv_map(vma); - if (!resv) - return 1; - - idx = vma_hugecache_offset(h, vma, addr); switch (mode) { - case VMA_NEEDS_RESV: - ret = region_chg(resv, idx, idx + 1, &dummy_out_regions_needed); + case RESV_MAP_NEEDS_RESV: + ret = region_chg(resv, resv_index, resv_index + 1, &dummy_out_regions_needed); /* We assume that vma_reservation_* routines always operate on * 1 page, and that adding to resv map a 1 page entry can only * ever require 1 region. */ VM_BUG_ON(dummy_out_regions_needed != 1); break; - case VMA_COMMIT_RESV: - ret = region_add(resv, idx, idx + 1, 1, NULL, NULL); + case RESV_MAP_COMMIT_RESV: + ret = region_add(resv, resv_index, resv_index + 1, 1, NULL, NULL); /* region_add calls of range 1 should never fail. */ VM_BUG_ON(ret < 0); break; - case VMA_END_RESV: - region_abort(resv, idx, idx + 1, 1); + case RESV_MAP_END_RESV: + region_abort(resv, resv_index, resv_index + 1, 1); ret = 0; break; - case VMA_ADD_RESV: - if (vma->vm_flags & VM_MAYSHARE) { - ret = region_add(resv, idx, idx + 1, 1, NULL, NULL); + case RESV_MAP_ADD_RESV: + if (may_be_shared_mapping) { + ret = region_add(resv, resv_index, resv_index + 1, 1, NULL, NULL); /* region_add calls of range 1 should never fail. */ VM_BUG_ON(ret < 0); } else { - region_abort(resv, idx, idx + 1, 1); - ret = region_del(resv, idx, idx + 1); + region_abort(resv, resv_index, resv_index + 1, 1); + ret = region_del(resv, resv_index, resv_index + 1); } break; - case VMA_DEL_RESV: - if (vma->vm_flags & VM_MAYSHARE) { - region_abort(resv, idx, idx + 1, 1); - ret = region_del(resv, idx, idx + 1); + case RESV_MAP_DEL_RESV: + if (may_be_shared_mapping) { + region_abort(resv, resv_index, resv_index + 1, 1); + ret = region_del(resv, resv_index, resv_index + 1); } else { - ret = region_add(resv, idx, idx + 1, 1, NULL, NULL); + ret = region_add(resv, resv_index, resv_index + 1, 1, NULL, NULL); /* region_add calls of range 1 should never fail. */ VM_BUG_ON(ret < 0); } @@ -2734,7 +2726,7 @@ static long __vma_reservation_common(struct hstate *h, BUG(); } - if (vma->vm_flags & VM_MAYSHARE || mode == VMA_DEL_RESV) + if (may_be_shared_mapping || mode == RESV_MAP_DEL_RESV) return ret; /* * We know private mapping must have HPAGE_RESV_OWNER set. @@ -2758,34 +2750,39 @@ static long __vma_reservation_common(struct hstate *h, return ret; } -static long vma_needs_reservation(struct hstate *h, - struct vm_area_struct *vma, unsigned long addr) +static long resv_map_needs_reservation(struct resv_map *resv, pgoff_t resv_index, + bool may_be_shared_mapping) { - return __vma_reservation_common(h, vma, addr, VMA_NEEDS_RESV); + return __resv_map_reservation_common( + resv, resv_index, may_be_shared_mapping, RESV_MAP_NEEDS_RESV); } -static long vma_commit_reservation(struct hstate *h, - struct vm_area_struct *vma, unsigned long addr) +static long resv_map_commit_reservation(struct resv_map *resv, pgoff_t resv_index, + bool may_be_shared_mapping) { - return __vma_reservation_common(h, vma, addr, VMA_COMMIT_RESV); + return __resv_map_reservation_common( + resv, resv_index, may_be_shared_mapping, RESV_MAP_COMMIT_RESV); } -static void vma_end_reservation(struct hstate *h, - struct vm_area_struct *vma, unsigned long addr) +static void resv_map_end_reservation(struct resv_map *resv, pgoff_t resv_index, + bool may_be_shared_mapping) { - (void)__vma_reservation_common(h, vma, addr, VMA_END_RESV); + (void)__resv_map_reservation_common( + resv, resv_index, may_be_shared_mapping, RESV_MAP_END_RESV); } -static long vma_add_reservation(struct hstate *h, - struct vm_area_struct *vma, unsigned long addr) +static long resv_map_add_reservation(struct resv_map *resv, pgoff_t resv_index, + bool may_be_shared_mapping) { - return __vma_reservation_common(h, vma, addr, VMA_ADD_RESV); + return __resv_map_reservation_common( + resv, resv_index, may_be_shared_mapping, RESV_MAP_ADD_RESV); } -static long vma_del_reservation(struct hstate *h, - struct vm_area_struct *vma, unsigned long addr) +static long resv_map_del_reservation(struct resv_map *resv, pgoff_t resv_index, + bool may_be_shared_mapping) { - return __vma_reservation_common(h, vma, addr, VMA_DEL_RESV); + return __resv_map_reservation_common( + resv, resv_index, may_be_shared_mapping, RESV_MAP_DEL_RESV); } /* @@ -2811,7 +2808,12 @@ static long vma_del_reservation(struct hstate *h, void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, unsigned long address, struct folio *folio) { - long rc = vma_needs_reservation(h, vma, address); + long rc; + struct resv_map *resv = vma_resv_map(vma); + pgoff_t resv_index = vma_hugecache_offset(h, vma, address); + bool may_share = vma->vm_flags & VM_MAYSHARE; + + rc = resv_map_needs_reservation(resv, resv_index, may_share); if (folio_test_hugetlb_restore_reserve(folio)) { if (unlikely(rc < 0)) @@ -2828,9 +2830,9 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, */ folio_clear_hugetlb_restore_reserve(folio); else if (rc) - (void)vma_add_reservation(h, vma, address); + (void)resv_map_add_reservation(resv, resv_index, may_share); else - vma_end_reservation(h, vma, address); + resv_map_end_reservation(resv, resv_index, may_share); } else { if (!rc) { /* @@ -2841,7 +2843,7 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, * Remove the entry so that a subsequent allocation * does not consume a reservation. */ - rc = vma_del_reservation(h, vma, address); + rc = resv_map_del_reservation(resv, resv_index, may_share); if (rc < 0) /* * VERY rare out of memory condition. Since @@ -2855,7 +2857,7 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, } else if (rc < 0) { /* * Rare out of memory condition from - * vma_needs_reservation call. Memory allocation is + * resv_map_needs_reservation call. Memory allocation is * only attempted if a new entry is needed. Therefore, * this implies there is not an entry in the * reserve map. @@ -2877,7 +2879,7 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, /* * No reservation present, do nothing */ - vma_end_reservation(h, vma, address); + resv_map_end_reservation(resv, resv_index, may_share); } } @@ -3019,13 +3021,17 @@ struct folio *alloc_hugetlb_folio_from_subpool( struct hugetlb_cgroup *h_cg = NULL; bool deferred_reserve; + struct resv_map *resv = vma_resv_map(vma); + pgoff_t resv_index = vma_hugecache_offset(h, vma, addr); + bool may_share = vma->vm_flags & VM_MAYSHARE; + idx = hstate_index(h); /* * Examine the region/reserve map to determine if the process * has a reservation for the page to be allocated. A return * code of zero indicates a reservation exists (no change). */ - map_chg = gbl_chg = vma_needs_reservation(h, vma, addr); + map_chg = gbl_chg = resv_map_needs_reservation(resv, resv_index, may_share); if (map_chg < 0) return ERR_PTR(-ENOMEM); @@ -3039,7 +3045,7 @@ struct folio *alloc_hugetlb_folio_from_subpool( if (map_chg || avoid_reserve) { gbl_chg = hugepage_subpool_get_pages(spool, 1); if (gbl_chg < 0) { - vma_end_reservation(h, vma, addr); + resv_map_end_reservation(resv, resv_index, may_share); return ERR_PTR(-ENOSPC); } @@ -3104,11 +3110,11 @@ struct folio *alloc_hugetlb_folio_from_subpool( hugetlb_set_folio_subpool(folio, spool); - map_commit = vma_commit_reservation(h, vma, addr); + map_commit = resv_map_commit_reservation(resv, resv_index, may_share); if (unlikely(map_chg > map_commit)) { /* * The page was added to the reservation map between - * vma_needs_reservation and vma_commit_reservation. + * resv_map_needs_reservation and resv_map_commit_reservation. * This indicates a race with hugetlb_reserve_pages. * Adjust for the subpool count incremented above AND * in hugetlb_reserve_pages for the same page. Also, @@ -3134,7 +3140,7 @@ struct folio *alloc_hugetlb_folio_from_subpool( out_subpool_put: if (map_chg || avoid_reserve) hugepage_subpool_put_pages(spool, 1); - vma_end_reservation(h, vma, addr); + resv_map_end_reservation(resv, resv_index, may_share); return ERR_PTR(-ENOSPC); } @@ -5901,12 +5907,16 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, * the spinlock. */ if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) { - if (vma_needs_reservation(h, vma, haddr) < 0) { + struct resv_map *resv = vma_resv_map(vma); + pgoff_t resv_index = vma_hugecache_offset(h, vma, address); + bool may_share = vma->vm_flags & VM_MAYSHARE; + + if (resv_map_needs_reservation(resv, resv_index, may_share) < 0) { ret = VM_FAULT_OOM; goto backout_unlocked; } /* Just decrements count, does not deallocate */ - vma_end_reservation(h, vma, haddr); + resv_map_end_reservation(resv, resv_index, may_share); } ptl = huge_pte_lock(h, mm, ptep); @@ -6070,12 +6080,16 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, */ if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) && !(vma->vm_flags & VM_MAYSHARE) && !huge_pte_write(entry)) { - if (vma_needs_reservation(h, vma, haddr) < 0) { + struct resv_map *resv = vma_resv_map(vma); + pgoff_t resv_index = vma_hugecache_offset(h, vma, address); + bool may_share = vma->vm_flags & VM_MAYSHARE; + + if (resv_map_needs_reservation(resv, resv_index, may_share) < 0) { ret = VM_FAULT_OOM; goto out_mutex; } /* Just decrements count, does not deallocate */ - vma_end_reservation(h, vma, haddr); + resv_map_end_reservation(resv, resv_index, may_share); pagecache_folio = filemap_lock_folio(mapping, idx); } From patchwork Tue Jun 6 19:03:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269570 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E00DCC7EE2A for ; Tue, 6 Jun 2023 19:04:57 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6byq-0001Lh-UD; Tue, 06 Jun 2023 15:04:32 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3O4N_ZAsKCmoIKSMZTMgbVOOWWOTM.KWUYMUc-LMdMTVWVOVc.WZO@flex--ackerleytng.bounces.google.com>) id 1q6byp-0001LQ-Ch for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:31 -0400 Received: from mail-pf1-x449.google.com ([2607:f8b0:4864:20::449]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3O4N_ZAsKCmoIKSMZTMgbVOOWWOTM.KWUYMUc-LMdMTVWVOVc.WZO@flex--ackerleytng.bounces.google.com>) id 1q6byn-00023a-GR for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:31 -0400 Received: by mail-pf1-x449.google.com with SMTP id d2e1a72fcca58-65a971d7337so1772956b3a.1 for ; Tue, 06 Jun 2023 12:04:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078267; x=1688670267; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=q0LlAXxza1jA09p0tGZBn0yslODgyOL+oHXEFWpIcjw=; b=oNCwvb9aRpGDjzemliRPAV2RbEjUkgcqMTcFw35zrX9Wk6VNZIJTDcZ8+sohLR6ypL FaZ8YypjTtdq4RVqmfVtwq8W8DKoRJdnwKkTjuJm/fygYago9102lZaBqLBb3VLlSbNA 8+GKs0dCEhmYGJeMyAKWj2FCcFo2qfmKEi+Bfw7yT1VoVQRRdAWcJACFBwEDVx5EFgYY ssLEqzao5SUtD6qqnRzkJD0PmTNAv31SFuhQy7OBTGaI6F+cg3TLNZhoEHYY01Vr6fPj YBc8jpY1SOo3QPJvqg80DDEnv97O24LbqrhfCC1OmOHsPqRNG8fKpuVSD4Hro3kbXfW6 x8MA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078267; x=1688670267; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=q0LlAXxza1jA09p0tGZBn0yslODgyOL+oHXEFWpIcjw=; b=bN2lbtbSkygZtFRoeHOz+PKMkJqZVYlicNy5ml0AnExZ5NmEh8Dy5DQUkPj8yyuo5k cto99IcIMS3bHNSFNyf0XcH7tH1dVu1CknPlzY+Hn+nK95EuZ8VD/lCVxkeG5+YHIXY7 l1AEDvIIYjeX7dTDIwYyD7qsE5W3l13w2rj6LSZtT99JiyxaS9yQWmmzBpIStR7iZIUa 37lDaf/OI00L8U+1PYbeOBaWYRyprlfjU+KiSfqjeQO2xbtgVko3Md1axSZfnhDJOl9N p6juc4nHcQLQauby3dvaMQyAjA95YxMc4lq1mW/Dh6wUY/XP7Z7PF9qtO8txvkJ4raXH 9oLw== X-Gm-Message-State: AC+VfDzxDMarmpFbfVsDn8mgM3fP1yXvwKFUiOjyHcftx+1iVpHDHSnS OePDwQ1YixMEaAdgw4Y0SWNXcyL58pLCa00m4A== X-Google-Smtp-Source: ACHHUZ6/geo43pUv13kPM8dbgitYn16Pb8nr3LytYQllA2bz6ABBAmVaVTp99t2le/ojeTQ9xVl065tEgGqpA1Ormw== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a05:6a00:188d:b0:64f:cdb0:64a8 with SMTP id x13-20020a056a00188d00b0064fcdb064a8mr1304775pfh.3.1686078267446; Tue, 06 Jun 2023 12:04:27 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:53 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: Subject: [RFC PATCH 08/19] mm: hugetlb: Refactor restore_reserve_on_error From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::449; envelope-from=3O4N_ZAsKCmoIKSMZTMgbVOOWWOTM.KWUYMUc-LMdMTVWVOVc.WZO@flex--ackerleytng.bounces.google.com; helo=mail-pf1-x449.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Refactor restore_reserve_on_error to allow resv_map to be passed in. vma_resv_map() assumes the use of hugetlbfs in the way it retrieves the resv_map from the vma and inode. Introduce restore_reserve_on_error_vma() which retains original functionality to simplify refactoring for now. Signed-off-by: Ackerley Tng --- fs/hugetlbfs/inode.c | 2 +- include/linux/hugetlb.h | 6 ++++-- mm/hugetlb.c | 37 +++++++++++++++++++++---------------- 3 files changed, 26 insertions(+), 19 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 0fc49b6252e4..44e6ee9a856d 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -868,7 +868,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, __folio_mark_uptodate(folio); error = hugetlb_add_to_page_cache(folio, mapping, index); if (unlikely(error)) { - restore_reserve_on_error(h, &pseudo_vma, addr, folio); + restore_reserve_on_error_vma(h, &pseudo_vma, addr, folio); folio_put(folio); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 7d49048c5a2a..02a2766d89a4 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -760,8 +760,10 @@ int hugetlb_filemap_add_folio(struct address_space *mapping, struct hstate *h, struct folio *folio, pgoff_t idx); int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping, pgoff_t idx); -void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, - unsigned long address, struct folio *folio); +void restore_reserve_on_error(struct resv_map *resv, pgoff_t resv_index, + bool may_share, struct folio *folio); +void restore_reserve_on_error_vma(struct hstate *h, struct vm_area_struct *vma, + unsigned long address, struct folio *folio); /* arch callback */ int __init __alloc_bootmem_huge_page(struct hstate *h, int nid); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d943f83d15a9..4675f9efeba4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2805,15 +2805,10 @@ static long resv_map_del_reservation(struct resv_map *resv, pgoff_t resv_index, * * In case 2, simply undo reserve map modifications done by alloc_hugetlb_folio. */ -void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, - unsigned long address, struct folio *folio) +void restore_reserve_on_error(struct resv_map *resv, pgoff_t resv_index, + bool may_share, struct folio *folio) { - long rc; - struct resv_map *resv = vma_resv_map(vma); - pgoff_t resv_index = vma_hugecache_offset(h, vma, address); - bool may_share = vma->vm_flags & VM_MAYSHARE; - - rc = resv_map_needs_reservation(resv, resv_index, may_share); + long rc = resv_map_needs_reservation(resv, resv_index, may_share); if (folio_test_hugetlb_restore_reserve(folio)) { if (unlikely(rc < 0)) @@ -2865,7 +2860,7 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, * For shared mappings, no entry in the map indicates * no reservation. We are done. */ - if (!(vma->vm_flags & VM_MAYSHARE)) + if (!may_share) /* * For private mappings, no entry indicates * a reservation is present. Since we can @@ -2883,6 +2878,16 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, } } +void restore_reserve_on_error_vma(struct hstate *h, struct vm_area_struct *vma, + unsigned long address, struct folio *folio) +{ + struct resv_map *resv = vma_resv_map(vma); + pgoff_t resv_index = vma_hugecache_offset(h, vma, address); + bool may_share = vma->vm_flags & VM_MAYSHARE; + + restore_reserve_on_error(resv, resv_index, may_share, folio); +} + /* * alloc_and_dissolve_hugetlb_folio - Allocate a new folio and dissolve * the old one @@ -5109,8 +5114,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); entry = huge_ptep_get(src_pte); if (!pte_same(src_pte_old, entry)) { - restore_reserve_on_error(h, dst_vma, addr, - new_folio); + restore_reserve_on_error_vma(h, dst_vma, addr, + new_folio); folio_put(new_folio); /* huge_ptep of dst_pte won't change as in child */ goto again; @@ -5642,7 +5647,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, * unshare) */ if (new_folio != page_folio(old_page)) - restore_reserve_on_error(h, vma, haddr, new_folio); + restore_reserve_on_error_vma(h, vma, haddr, new_folio); folio_put(new_folio); out_release_old: put_page(old_page); @@ -5860,7 +5865,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, * to the page cache. So it's safe to call * restore_reserve_on_error() here. */ - restore_reserve_on_error(h, vma, haddr, folio); + restore_reserve_on_error_vma(h, vma, haddr, folio); folio_put(folio); goto out; } @@ -5965,7 +5970,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, spin_unlock(ptl); backout_unlocked: if (new_folio && !new_pagecache_folio) - restore_reserve_on_error(h, vma, haddr, folio); + restore_reserve_on_error_vma(h, vma, haddr, folio); folio_unlock(folio); folio_put(folio); @@ -6232,7 +6237,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, /* Free the allocated folio which may have * consumed a reservation. */ - restore_reserve_on_error(h, dst_vma, dst_addr, folio); + restore_reserve_on_error_vma(h, dst_vma, dst_addr, folio); folio_put(folio); /* Allocate a temporary folio to hold the copied @@ -6361,7 +6366,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, folio_unlock(folio); out_release_nounlock: if (!folio_in_pagecache) - restore_reserve_on_error(h, dst_vma, dst_addr, folio); + restore_reserve_on_error_vma(h, dst_vma, dst_addr, folio); folio_put(folio); goto out; } From patchwork Tue Jun 6 19:03:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269614 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 54182C77B7A for ; Tue, 6 Jun 2023 19:06:19 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6bys-0001MS-LD; Tue, 06 Jun 2023 15:04:34 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3PYN_ZAsKCmwKMUObVOidXQQYYQVO.MYWaOWe-NOfOVXYXQXe.YbQ@flex--ackerleytng.bounces.google.com>) id 1q6byr-0001Lo-Cr for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:33 -0400 Received: from mail-pg1-x54a.google.com ([2607:f8b0:4864:20::54a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3PYN_ZAsKCmwKMUObVOidXQQYYQVO.MYWaOWe-NOfOVXYXQXe.YbQ@flex--ackerleytng.bounces.google.com>) id 1q6byp-000243-OE for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:33 -0400 Received: by mail-pg1-x54a.google.com with SMTP id 41be03b00d2f7-5428d1915acso2871646a12.0 for ; Tue, 06 Jun 2023 12:04:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078270; x=1688670270; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=QY81mipd5eg6c/BuU1bdNHCWJQ+u0Nbd1U9TPL2kLvI=; b=KOWrQMhz2uWLqPOLL/0JQqPmlrC3dCiI9oJNy8GCgLIowS+OzS7WKaPGY9vPm3PtAT fUuRJYwgpjWgfKigP2vr98VC/4aqX3GYpvsp4hPnsGOFwNN2fFFSeRFXL9FC8QZZeOsv TXZBuwWg2rM6bvxVXZ2UCQuJ6ff8v2+LFRqOgjtb865dVZ2l7fY2f7Z/OtzPXYHKjbfN ZRjrOmzbbZGa3KF/56lwJLd6R/hiuFa+vyvCC7nno3amrl0TRMEIewtSKtG/z8kueDPR Li+slKiOskei/vBhvsQJtHNmBIeIcarnb3mQ6zRjd56sBUc1JOzEZw6pSH3bcClc+mm/ tubw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078270; x=1688670270; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=QY81mipd5eg6c/BuU1bdNHCWJQ+u0Nbd1U9TPL2kLvI=; b=VbGaf4JkQDfr4iIjn7YTnykLvW4xuStsfLR08wpUKwM9IeoSN+CbqJEKKp0N4j74Cv Cf0h4OosLXOTNhGfRyWxbYMv8KtiwbBRlWTe96bcJ9QO3vFb9Dt4u4gAYvo0oJVXw+fg lGNMTGM5YXa8UUzbn5MgGGxIii9g15st50qb9i1R/7IjiRsC8Dw+FC7lLzP6Liav+Ljl keG/3f0+uXJLSVvEbbHnw9lGiLUF+kfkLoP7V73+Z9S/B1O7LLNaczKlZB/TyPk5OtRu K2cimdeKS3BCwNg2+wr1HutjiWr/TJvnV2lbPev/HKW/E7IQtB72lNSEk+1NlVZDYiQF I6Og== X-Gm-Message-State: AC+VfDxF2Rr0yv5QqvrtItlqvn9QqwElF7407KSNQsPRR6QzrHi2dQoA t/kWful8gxIJn0fDxA7cbjPVfdTOP0nmArouVQ== X-Google-Smtp-Source: ACHHUZ7uGZgeuxGQjEekaNfEPd3cHpOSpYga7qIjxZtHDS9vRTE59C+39iUxIiNEiIcn3iYrIGeQIhYQVNbQFyY1wQ== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a63:4848:0:b0:534:7672:433e with SMTP id x8-20020a634848000000b005347672433emr763304pgk.3.1686078269556; Tue, 06 Jun 2023 12:04:29 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:54 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <7937abfd3f2d071820a1bcb84e05bf48e38e2e5b.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 09/19] mm: hugetlb: Use restore_reserve_on_error directly in filesystems From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::54a; envelope-from=3PYN_ZAsKCmwKMUObVOidXQQYYQVO.MYWaOWe-NOfOVXYXQXe.YbQ@flex--ackerleytng.bounces.google.com; helo=mail-pg1-x54a.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Expose inode_resv_map() so that hugetlbfs can access its own resv_map. Hide restore_reserve_on_error_vma(), that function is now only used within mm/hugetlb.c. Signed-off-by: Ackerley Tng --- fs/hugetlbfs/inode.c | 2 +- include/linux/hugetlb.h | 21 +++++++++++++++++++-- mm/hugetlb.c | 13 ------------- 3 files changed, 20 insertions(+), 16 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 44e6ee9a856d..53f6a421499d 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -868,7 +868,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, __folio_mark_uptodate(folio); error = hugetlb_add_to_page_cache(folio, mapping, index); if (unlikely(error)) { - restore_reserve_on_error_vma(h, &pseudo_vma, addr, folio); + restore_reserve_on_error(inode_resv_map(inode), index, true, folio); folio_put(folio); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 02a2766d89a4..5fe9643826d7 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -568,6 +568,20 @@ static inline struct hugepage_subpool *subpool_inode(struct inode *inode) return HUGETLBFS_SB(inode->i_sb)->spool; } +static inline struct resv_map *inode_resv_map(struct inode *inode) +{ + /* + * At inode evict time, i_mapping may not point to the original + * address space within the inode. This original address space + * contains the pointer to the resv_map. So, always use the + * address space embedded within the inode. + * The VERY common case is inode->mapping == &inode->i_data but, + * this may not be true for device special inodes. + */ + return (struct resv_map *)(&inode->i_data)->private_data; +} + + #else /* !CONFIG_HUGETLBFS */ #define is_file_hugepages(file) false @@ -588,6 +602,11 @@ static inline struct hugepage_subpool *subpool_inode(struct inode *inode) return NULL; } +static inline struct resv_map *inode_resv_map(struct inode *inode) +{ + return NULL; +} + #endif /* !CONFIG_HUGETLBFS */ #ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA @@ -762,8 +781,6 @@ int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping pgoff_t idx); void restore_reserve_on_error(struct resv_map *resv, pgoff_t resv_index, bool may_share, struct folio *folio); -void restore_reserve_on_error_vma(struct hstate *h, struct vm_area_struct *vma, - unsigned long address, struct folio *folio); /* arch callback */ int __init __alloc_bootmem_huge_page(struct hstate *h, int nid); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 4675f9efeba4..540634aec181 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1091,19 +1091,6 @@ void resv_map_release(struct kref *ref) kfree(resv_map); } -static inline struct resv_map *inode_resv_map(struct inode *inode) -{ - /* - * At inode evict time, i_mapping may not point to the original - * address space within the inode. This original address space - * contains the pointer to the resv_map. So, always use the - * address space embedded within the inode. - * The VERY common case is inode->mapping == &inode->i_data but, - * this may not be true for device special inodes. - */ - return (struct resv_map *)(&inode->i_data)->private_data; -} - static struct resv_map *vma_resv_map(struct vm_area_struct *vma) { VM_BUG_ON_VMA(!is_vm_hugetlb_page(vma), vma); From patchwork Tue Jun 6 19:03:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269613 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1BB54C77B73 for ; Tue, 6 Jun 2023 19:06:11 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6byv-0001NQ-9J; Tue, 06 Jun 2023 15:04:37 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3P4N_ZAsKCm4MOWQdXQkfZSSaaSXQ.OaYcQYg-PQhQXZaZSZg.adS@flex--ackerleytng.bounces.google.com>) id 1q6bys-0001MK-D8 for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:34 -0400 Received: from mail-yb1-xb4a.google.com ([2607:f8b0:4864:20::b4a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3P4N_ZAsKCm4MOWQdXQkfZSSaaSXQ.OaYcQYg-PQhQXZaZSZg.adS@flex--ackerleytng.bounces.google.com>) id 1q6byq-00024R-TX for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:34 -0400 Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-bacfa4ef059so10375518276.2 for ; Tue, 06 Jun 2023 12:04:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078271; x=1688670271; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=QP9zRN1PPC9/vVggheoZtkj7ew+Av1V49kv617fnYR8=; b=FI4N69NL3RScUYfhkbRLrL4c1EfS/desGI2Shr1/96TXRRDn4geHvyU1N7eGgqKJyG GWBPzDYB1BfYFhv6zCt0EhdFu5tldM3yzA5NvQB4/zCb+hcK/bM/QuqRiIfDvi5+hI04 SbvBvKqQxgCQVjt+cTCPYL6N1+wfniCAiJKkEW+sWbUf69X9qSMHub3UvTd9MQDzXMi4 1B4c0zvORWB8HqA5+0cdcj+spglhu6jaXUuf69LPtiQvLZve3RcWtPEQxixl3Ypwmw86 K8LGygCgyEMNtUVq9AiwfPbvahkLarTMGurJs3foOYOqn4CdQy93rRAZp4edz17ORY67 OYlQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078271; x=1688670271; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=QP9zRN1PPC9/vVggheoZtkj7ew+Av1V49kv617fnYR8=; b=Max4jZuP+HtsAuc22Gca54WdLz6XtNTEucV1K9Ub6mwOgbo9eitfJzVTXd+PYH2StZ kmcmSpklwnr5bG+mRF+8RVMwXKJx3z9niQcHJF9wbFOXiJoU5nFobRh+nJpx+7iWb4vW 1a0g0Qyjcteag2tvyHj2RcvekraHwPt5buVy68RCBDwckUbvAFtS9OR6l0r3td4fUgpN sdP1q1TO4TvKGGFUhWv9QYDnmmyPCPkYnbJhdnSF6RLjD+PomRQKS2+DKhAd+pxRFjud vwTkvEbGW06mg0C/hpVjVsRA/NJRedfCXHUOAukHL21eXQH+x7gPEspqYbUtTyvkrfhj s9RQ== X-Gm-Message-State: AC+VfDz9GhCOB1Rjy7tLl4hrp9Pm0lNkTCVbLxLIn23Azial56MY+kFl TJxbEWfpU6hOQFCN57BEU0RIQvkpi6kHX0mwCQ== X-Google-Smtp-Source: ACHHUZ76bqpjC35XKXFBahxA9u7+2eFkt14S7ZQHJlZjw7W0ReYy7o2ytYkzhnc/yAz55ZaZBmAlcbBYdbpzzQGaSA== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a25:105:0:b0:bab:a276:caac with SMTP id 5-20020a250105000000b00baba276caacmr1716574ybb.3.1686078271536; Tue, 06 Jun 2023 12:04:31 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:55 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <382ee70df7b65c365a1eab1223f84aecc0c5be10.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 10/19] mm: hugetlb: Parametrize alloc_hugetlb_folio_from_subpool() by resv_map From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::b4a; envelope-from=3P4N_ZAsKCm4MOWQdXQkfZSSaaSXQ.OaYcQYg-PQhQXZaZSZg.adS@flex--ackerleytng.bounces.google.com; helo=mail-yb1-xb4a.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Parametrize alloc_hugetlb_folio_from_subpool() by resv_map to remove the use of vma_resv_map() and decouple hugetlb with hugetlbfs. Signed-off-by: Ackerley Tng --- include/linux/hugetlb.h | 2 +- mm/hugetlb.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 5fe9643826d7..d564802ace4b 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -767,7 +767,7 @@ struct huge_bootmem_page { int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list); struct folio *alloc_hugetlb_folio_from_subpool( - struct hugepage_subpool *spool, struct hstate *h, + struct hugepage_subpool *spool, struct hstate *h, struct resv_map *resv, struct vm_area_struct *vma, unsigned long addr, int avoid_reserve); struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, unsigned long addr, int avoid_reserve); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 540634aec181..aebdd8c63439 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3003,7 +3003,7 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list) } struct folio *alloc_hugetlb_folio_from_subpool( - struct hugepage_subpool *spool, struct hstate *h, + struct hugepage_subpool *spool, struct hstate *h, struct resv_map *resv, struct vm_area_struct *vma, unsigned long addr, int avoid_reserve) { struct folio *folio; @@ -3013,7 +3013,6 @@ struct folio *alloc_hugetlb_folio_from_subpool( struct hugetlb_cgroup *h_cg = NULL; bool deferred_reserve; - struct resv_map *resv = vma_resv_map(vma); pgoff_t resv_index = vma_hugecache_offset(h, vma, addr); bool may_share = vma->vm_flags & VM_MAYSHARE; @@ -3141,8 +3140,9 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, { struct hugepage_subpool *spool = subpool_vma(vma); struct hstate *h = hstate_vma(vma); + struct resv_map *resv = vma_resv_map(vma); - return alloc_hugetlb_folio_from_subpool(spool, h, vma, addr, avoid_reserve); + return alloc_hugetlb_folio_from_subpool(spool, h, resv, vma, addr, avoid_reserve); } int alloc_bootmem_huge_page(struct hstate *h, int nid) From patchwork Tue Jun 6 19:03:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269621 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DE2F9C77B73 for ; Tue, 6 Jun 2023 19:07:10 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6byy-0001OK-34; Tue, 06 Jun 2023 15:04:40 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3QYN_ZAsKCnAOQYSfZSmhbUUccUZS.QcaeSai-RSjSZbcbUbi.cfU@flex--ackerleytng.bounces.google.com>) id 1q6byx-0001Nq-3q for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:39 -0400 Received: from mail-pg1-x549.google.com ([2607:f8b0:4864:20::549]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3QYN_ZAsKCnAOQYSfZSmhbUUccUZS.QcaeSai-RSjSZbcbUbi.cfU@flex--ackerleytng.bounces.google.com>) id 1q6byu-00025j-EK for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:38 -0400 Received: by mail-pg1-x549.google.com with SMTP id 41be03b00d2f7-5439aafb633so2784933a12.2 for ; Tue, 06 Jun 2023 12:04:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078273; x=1688670273; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=DUcXaEC48I4Bm1WwWPPkFcaYKFbZrdo62sI7Vn9jtKM=; b=EsMzcu7MCofZV1M6N8qLTVGfCobRb2m9UBd1k4rqImHaaV90eE+7yVgqGJTlHLVTGg WYLYzf9nuh7wT2wl0ong7Pm+JdEwy0XrWnw7A67dAqqNLmKKthrxFKAZNZCPEmg+Gm9l 1/jfZjJbtssIps71Hn10bsoNM3+P1vWpJqtgmuCPP7gyVh3snIw/tKY/Poj3vjuiFMIm hWDohY5CuxRIZJ1XCd6YRMkjU3OFa1a+IZrlt5R6w1j+nUqvdL8ryozxcFEC3Inu+RpX MrblT0sw+GiuzKbFJnN8KErIkyQh7K372xpyIXAqppgkeoG3kQADStwWWL0FGczDjczb yaGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078273; x=1688670273; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=DUcXaEC48I4Bm1WwWPPkFcaYKFbZrdo62sI7Vn9jtKM=; b=YlxVJ4j36sB6spzIiq2dVksCPNsAmhyfcUFbgpS0lFqs1+KNAkREeooiDRHHD2OjXD p4GYVPT1tLnowOKqMDWUM4MSuIGhpgQ/SKSuGT/rbtJ0vWfY2WGSWF2gCF15Mwse9OW4 DcWiQn+f2NvVfM7nZ3W7pNoP4QbQsC0w4h9+/FJDUYediUClTHjKGxSfa/14s//b+YrH rStyGMNexzuhcoi0sUuhbr24O2FSgjyMhk9NrEhYpBfJ56s8IcKiRu9vmA/AafHxn2pG wr5ZbcOUaDYfCti9JCq6OzF8C7cCVFZ3vmtkcpIM68lnFTXrWF+GpZkkg0JRzIg0C0jm 56oA== X-Gm-Message-State: AC+VfDxGFnJ8aKt4axOYGFYlC0Npv13RkubymmGoDBWi1u8OJfDo1CkQ Yy3IGgcwt/Tp/TMqHYwGDjvFkQ9rXxke1ueUqQ== X-Google-Smtp-Source: ACHHUZ4v54k8mWqyJRdVfnADxQ7LxuS0BkTdcogjdbP6mmP0fUtChdx6VWMt3kG2rI14jmGLU8WGRs/GoPhwfnEqcg== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a65:5c8b:0:b0:53f:b396:6f32 with SMTP id a11-20020a655c8b000000b0053fb3966f32mr628734pgt.3.1686078273321; Tue, 06 Jun 2023 12:04:33 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:56 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <1d0337d32f40b781f9b7509cb40448b81bde6b00.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 11/19] mm: hugetlb: Parametrize hugetlb functions by resv_map From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::549; envelope-from=3QYN_ZAsKCnAOQYSfZSmhbUUccUZS.QcaeSai-RSjSZbcbUbi.cfU@flex--ackerleytng.bounces.google.com; helo=mail-pg1-x549.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Parametrize remove_mapping_hugepages() and hugetlb_unreserve_pages() by resv_map to remove the use of inode_resv_map() and decouple hugetlb with hugetlbfs. Signed-off-by: Ackerley Tng --- fs/hugetlbfs/inode.c | 16 ++++++++++------ include/linux/hugetlb.h | 6 ++++-- mm/hugetlb.c | 4 ++-- 3 files changed, 16 insertions(+), 10 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 53f6a421499d..a7791b1390a6 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -560,8 +560,8 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, */ static bool remove_mapping_single_folio( struct address_space *mapping, struct folio *folio, pgoff_t index, - struct hstate *h, struct hugepage_subpool *spool, struct inode *inode, - bool truncate_op) + struct hstate *h, struct hugepage_subpool *spool, struct resv_map *resv_map, + struct inode *inode, bool truncate_op) { bool ret = false; @@ -586,7 +586,8 @@ static bool remove_mapping_single_folio( hugetlb_delete_from_page_cache(folio); ret = true; if (!truncate_op) { - if (unlikely(hugetlb_unreserve_pages(h, spool, inode, index, index + 1, 1))) + if (unlikely(hugetlb_unreserve_pages(h, spool, resv_map, + inode, index, index + 1, 1))) hugetlb_fix_reserve_counts(h, spool); } @@ -623,6 +624,7 @@ static bool remove_mapping_single_folio( */ void remove_mapping_hugepages(struct address_space *mapping, struct hstate *h, struct hugepage_subpool *spool, + struct resv_map *resv_map, struct inode *inode, loff_t lstart, loff_t lend) { const pgoff_t start = lstart >> huge_page_shift(h); @@ -647,7 +649,7 @@ void remove_mapping_hugepages(struct address_space *mapping, * Remove folio that was part of folio_batch. */ if (remove_mapping_single_folio(mapping, folio, index, - h, spool, inode, truncate_op)) + h, spool, resv_map, inode, truncate_op)) freed++; mutex_unlock(&hugetlb_fault_mutex_table[hash]); @@ -657,7 +659,8 @@ void remove_mapping_hugepages(struct address_space *mapping, } if (truncate_op) - (void)hugetlb_unreserve_pages(h, spool, inode, start, LONG_MAX, freed); + (void)hugetlb_unreserve_pages(h, spool, resv_map, inode, + start, LONG_MAX, freed); } void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) @@ -665,8 +668,9 @@ void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) struct address_space *mapping = &inode->i_data; struct hstate *h = hstate_inode(inode); struct hugepage_subpool *spool = subpool_inode(inode); + struct resv_map *resv_map = inode_resv_map(inode); - return remove_mapping_hugepages(mapping, h, spool, inode, lstart, lend); + return remove_mapping_hugepages(mapping, h, spool, resv_map, inode, lstart, lend); } static void hugetlbfs_evict_inode(struct inode *inode) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index d564802ace4b..af04588a5afe 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -172,7 +172,8 @@ bool hugetlb_reserve_pages(struct hstate *h, struct hugepage_subpool *spool, struct vm_area_struct *vma, vm_flags_t vm_flags); long hugetlb_unreserve_pages(struct hstate *h, struct hugepage_subpool *spool, - struct inode *inode, long start, long end, long freed); + struct resv_map *resv_map, struct inode *inode, + long start, long end, long freed); bool isolate_hugetlb(struct folio *folio, struct list_head *list); int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison); int get_huge_page_for_hwpoison(unsigned long pfn, int flags, @@ -263,6 +264,7 @@ void hugetlb_zero_partial_page(struct hstate *h, struct address_space *mapping, void remove_mapping_hugepages(struct address_space *mapping, struct hstate *h, struct hugepage_subpool *spool, + struct resv_map *resv_map, struct inode *inode, loff_t lstart, loff_t lend); void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend); @@ -479,7 +481,7 @@ static inline void hugetlb_zero_partial_page( static inline void remove_mapping_hugepages( struct address_space *mapping, struct hstate *h, struct hugepage_subpool *spool, - struct inode *inode, loff_t lstart, loff_t lend) {} + struct resv_map *resv_map, struct inode *inode, loff_t lstart, loff_t lend) {} static inline void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) {} #endif /* !CONFIG_HUGETLB_PAGE */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index aebdd8c63439..a1cbda457aa7 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6954,9 +6954,9 @@ bool hugetlb_reserve_pages(struct hstate *h, struct hugepage_subpool *spool, * Returns 0 on success. */ long hugetlb_unreserve_pages(struct hstate *h, struct hugepage_subpool *spool, - struct inode *inode, long start, long end, long freed) + struct resv_map *resv_map, struct inode *inode, + long start, long end, long freed) { - struct resv_map *resv_map = inode_resv_map(inode); long chg = 0; long gbl_reserve; From patchwork Tue Jun 6 19:03:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269617 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6640DC77B7A for ; Tue, 6 Jun 2023 19:06:37 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6byy-0001OX-Mi; Tue, 06 Jun 2023 15:04:40 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3Q4N_ZAsKCnIQSaUhbUojdWWeeWbU.SecgUck-TUlUbdedWdk.ehW@flex--ackerleytng.bounces.google.com>) id 1q6byx-0001Np-3e for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:39 -0400 Received: from mail-pf1-x449.google.com ([2607:f8b0:4864:20::449]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3Q4N_ZAsKCnIQSaUhbUojdWWeeWbU.SecgUck-TUlUbdedWdk.ehW@flex--ackerleytng.bounces.google.com>) id 1q6byv-00026U-8E for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:38 -0400 Received: by mail-pf1-x449.google.com with SMTP id d2e1a72fcca58-653a1cfb819so1498403b3a.0 for ; Tue, 06 Jun 2023 12:04:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078275; x=1688670275; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=gCnAmjEfrZv4hWergm9cCKqGSfxeJstUIVXgvQlsrbM=; b=0RfjN0vYiusk5ioCQS1nAlCQ9SwNbpNkrAGoCc3DzbyiTfogeskfW/k35AUHJjbjvh qcmNficgsPHY7IzjLRZ7Fx7yHknnSVBialc9Lm2HDy+G56wJLuQdJfrCE1L/whTYztSX 0nMD1VG7QI0IeJ7NlJ0RqSFpSkllQWKP+8SH5M4Tworj/X9kHQ6JcXzt+NcdmrYuQcSD ftcLroNPDkUfGiCVt5bzuOWxk/1vZL8CVOd3YhjEG4x4yYR8uQvEHMIVWqF9xF36YlZ+ oP2GU9HlO/y1x/wbpPSXD25KiOaXofjmCzoeXIdkiuAzfcG5mWunKS1qCnW27bnXJIzu pIkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078275; x=1688670275; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=gCnAmjEfrZv4hWergm9cCKqGSfxeJstUIVXgvQlsrbM=; b=bTVHY0wNV1XxM1ojr/4heQ1QAg9fQK9ALK4onz9cKOXl/sP0S1ugmHeN6CqyCSuWsI YkFD9fjHjXOZWObx6SDjFjDpr0V3Dx6JM2OUgeobw/+HeOKf8j3EtiU4CtrCeUDZm3ZM r4H1rNrFkUXQ2WVMQW6lEnG+tjnfc1arH32HAqJ2P92BfCf84Yw9M0Pf7n5IfV3/dWb/ EcO+HA6a1XBPNk9t0M/FmCwdGRVhRnqB5y62hwQ3atE7t1VtWp82yWxdDlZ6LzsC6+vq 4LIru7n3CtpBtdwWnN0J+sNNZY/Hq+oVmyh6G2NznqFMyKcvvGuDHJQP/dJ7o3pRobng gtsg== X-Gm-Message-State: AC+VfDxEp699CGL16hLQuccAP79fKyqHarctrXVTI9PAQZRntOI5ALQ5 BxVOo4QL2GkdvW7AOAg5wUPx41NNur5u4uDLdA== X-Google-Smtp-Source: ACHHUZ7oX0XJdw6EGaS/GrZQZU2JbZA6pR7i2na5qTVNAmby/QbfNP6VXG9VMvls4TqYrSTNJjMknJRBHdMlJ61Amg== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a05:6a00:148d:b0:63b:234e:d641 with SMTP id v13-20020a056a00148d00b0063b234ed641mr1369654pfu.4.1686078275306; Tue, 06 Jun 2023 12:04:35 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:57 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <0c1144b9c5cd620cd0acf7ee033fef8d311b97ba.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 12/19] mm: truncate: Expose preparation steps for truncate_inode_pages_final From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::449; envelope-from=3Q4N_ZAsKCnIQSaUhbUojdWWeeWbU.SecgUck-TUlUbdedWdk.ehW@flex--ackerleytng.bounces.google.com; helo=mail-pf1-x449.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org This will allow preparation steps to be shared Signed-off-by: Ackerley Tng --- include/linux/mm.h | 1 + mm/truncate.c | 24 ++++++++++++++---------- 2 files changed, 15 insertions(+), 10 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 1f79667824eb..7a8f6b810de0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3053,6 +3053,7 @@ extern unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info); extern void truncate_inode_pages(struct address_space *, loff_t); extern void truncate_inode_pages_range(struct address_space *, loff_t lstart, loff_t lend); +extern void truncate_inode_pages_final_prepare(struct address_space *mapping); extern void truncate_inode_pages_final(struct address_space *); /* generic vm_area_ops exported for stackable file systems */ diff --git a/mm/truncate.c b/mm/truncate.c index 7b4ea4c4a46b..4a7ae87e03b5 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -449,16 +449,7 @@ void truncate_inode_pages(struct address_space *mapping, loff_t lstart) } EXPORT_SYMBOL(truncate_inode_pages); -/** - * truncate_inode_pages_final - truncate *all* pages before inode dies - * @mapping: mapping to truncate - * - * Called under (and serialized by) inode->i_rwsem. - * - * Filesystems have to use this in the .evict_inode path to inform the - * VM that this is the final truncate and the inode is going away. - */ -void truncate_inode_pages_final(struct address_space *mapping) +void truncate_inode_pages_final_prepare(struct address_space *mapping) { /* * Page reclaim can not participate in regular inode lifetime @@ -479,7 +470,20 @@ void truncate_inode_pages_final(struct address_space *mapping) xa_lock_irq(&mapping->i_pages); xa_unlock_irq(&mapping->i_pages); } +} +/** + * truncate_inode_pages_final - truncate *all* pages before inode dies + * @mapping: mapping to truncate + * + * Called under (and serialized by) inode->i_rwsem. + * + * Filesystems have to use this in the .evict_inode path to inform the + * VM that this is the final truncate and the inode is going away. + */ +void truncate_inode_pages_final(struct address_space *mapping) +{ + truncate_inode_pages_final_prepare(mapping); truncate_inode_pages(mapping, 0); } EXPORT_SYMBOL(truncate_inode_pages_final); From patchwork Tue Jun 6 19:03:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269615 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6C0F7C7EE2A for ; Tue, 6 Jun 2023 19:06:19 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6byz-0001Ov-IK; Tue, 06 Jun 2023 15:04:41 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3RYN_ZAsKCnQSUcWjdWqlfYYggYdW.UgeiWem-VWnWdfgfYfm.gjY@flex--ackerleytng.bounces.google.com>) id 1q6byy-0001OW-KV for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:40 -0400 Received: from mail-pj1-x1049.google.com ([2607:f8b0:4864:20::1049]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3RYN_ZAsKCnQSUcWjdWqlfYYggYdW.UgeiWem-VWnWdfgfYfm.gjY@flex--ackerleytng.bounces.google.com>) id 1q6byw-000276-U4 for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:40 -0400 Received: by mail-pj1-x1049.google.com with SMTP id 98e67ed59e1d1-25665d2a250so5868042a91.0 for ; Tue, 06 Jun 2023 12:04:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078277; x=1688670277; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=l7m/+H3bbKiFYeplsxMS6SzLQH+bOrcqP3X6Nh5dkpY=; b=14V+ikDqPeP4Yv1OdmN8nd7QZ5MMjeGKWKaRSYFOZag9l1JWEXWB4ucnpLMnABlXZA EKyjA/Hep4b5XPLE3jo+/oAtPajUrqdw+KeK/h9os1d0usuctIgWW9M+esmzYCfBXl4x ASFKNceH4Dvr96zHhOZ0uM2fLtsPSrNxdz17wPPKIRFX8xmJyoeEp2adnxvZEKqN8X4Y OM4Rksx4vK290PDLeLyuR9aPGnO/Zd4sMVHFMKGC9zk+GBNtZUBayI8BLZ7Yas0LD8pw GLTAE9YptousowvGxGw0V5tbYjLuLpB0CEXyROZYUG+NBDy7fuPXlo1tE0nXFLN+WEBW wc/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078277; x=1688670277; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=l7m/+H3bbKiFYeplsxMS6SzLQH+bOrcqP3X6Nh5dkpY=; b=ZSe2ujfxBrBowDNGdxPZ7dLBSOxRIJVwJPP3j98ZBQUFMsE9TzK7PLFUpEYQr2GjJZ zwG6DBUnHaq25oybxcsOYN+tqlQpmIDHmcpFYGzCRDLg0A+8+V7IK4Sm2U8Af4efjAtI Oi6c5uCIu7QzlyVVvllTz2l9IB0tO8PAog89gcKUF57dz45AvLvLtyXcSfepcfOLtJAK HPDMgxDt12/utXugGc9Xq+X9nW/BtSQjHpmbu+evN/LqnQI00f2kV1QS/E0h31PEc7+J E9khG85i2qc727Gyie922KUMgIe7k3GV2Toe5GTNBsZ7dWLXBWxE5/sk8Na8Mu55DhIM pojA== X-Gm-Message-State: AC+VfDzqAE/iqkv/ogkaBaDjH5Jrg2jy/+QzUabUylQLqhXELJbNptJ1 g58tGqUbG44rbFsNhjWBA5kYsUSFNibWQE5gQA== X-Google-Smtp-Source: ACHHUZ400PPpzSqOQyBrlXwnbK9H57gIHhF9scqTSjoAqyx8OcXSiJ7wdR2icZ3OG98VTDSuHd/HQ93iA5Sxt1cqNg== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a17:90b:3003:b0:253:4808:7587 with SMTP id hg3-20020a17090b300300b0025348087587mr772798pjb.7.1686078277257; Tue, 06 Jun 2023 12:04:37 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:58 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <67168b07e8d4a0c714fce5f030671b376d8ca001.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 13/19] KVM: guest_mem: Refactor kvm_gmem fd creation to be in layers From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::1049; envelope-from=3RYN_ZAsKCnQSUcWjdWqlfYYggYdW.UgeiWem-VWnWdfgfYfm.gjY@flex--ackerleytng.bounces.google.com; helo=mail-pj1-x1049.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org First create a gmem inode, then create a gmem file using the inode, then install the file into an fd. Creating the file in layers separates inode concepts (struct kvm_gmem) from file concepts and makes cleaning up in stages neater. Signed-off-by: Ackerley Tng --- virt/kvm/guest_mem.c | 86 +++++++++++++++++++++++++------------------- 1 file changed, 50 insertions(+), 36 deletions(-) diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c index 8708139822d3..2f69ef666871 100644 --- a/virt/kvm/guest_mem.c +++ b/virt/kvm/guest_mem.c @@ -375,41 +375,27 @@ static const struct inode_operations kvm_gmem_iops = { .setattr = kvm_gmem_setattr, }; -static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags, - struct vfsmount *mnt) +static struct inode *kvm_gmem_create_inode(struct kvm *kvm, loff_t size, u64 flags, + struct vfsmount *mnt) { + int err; + struct inode *inode; + struct kvm_gmem *gmem; const char *anon_name = "[kvm-gmem]"; const struct qstr qname = QSTR_INIT(anon_name, strlen(anon_name)); - struct kvm_gmem *gmem; - struct inode *inode; - struct file *file; - int fd, err; - - fd = get_unused_fd_flags(0); - if (fd < 0) - return fd; inode = alloc_anon_inode(mnt->mnt_sb); - if (IS_ERR(inode)) { - err = PTR_ERR(inode); - goto err_fd; - } + if (IS_ERR(inode)) + return inode; err = security_inode_init_security_anon(inode, &qname, NULL); if (err) goto err_inode; - file = alloc_file_pseudo(inode, mnt, "kvm-gmem", O_RDWR, &kvm_gmem_fops); - if (IS_ERR(file)) { - err = PTR_ERR(file); - goto err_inode; - } - + err = -ENOMEM; gmem = kzalloc(sizeof(*gmem), GFP_KERNEL); - if (!gmem) { - err = -ENOMEM; - goto err_file; - } + if (!gmem) + goto err_inode; xa_init(&gmem->bindings); @@ -426,24 +412,41 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags, mapping_set_large_folios(inode->i_mapping); mapping_set_unevictable(inode->i_mapping); - file->f_flags |= O_LARGEFILE; - file->f_mapping = inode->i_mapping; - file->private_data = gmem; - - fd_install(fd, file); - return fd; + return inode; -err_file: - fput(file); err_inode: iput(inode); -err_fd: - put_unused_fd(fd); - return err; + return ERR_PTR(err); +} + + +static struct file *kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags, + struct vfsmount *mnt) +{ + struct file *file; + struct inode *inode; + + inode = kvm_gmem_create_inode(kvm, size, flags, mnt); + if (IS_ERR(inode)) + return ERR_CAST(inode); + + file = alloc_file_pseudo(inode, mnt, "kvm-gmem", O_RDWR, &kvm_gmem_fops); + if (IS_ERR(file)) { + iput(inode); + return file; + } + + file->f_flags |= O_LARGEFILE; + file->f_mapping = inode->i_mapping; + file->private_data = inode->i_mapping->private_data; + + return file; } int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *gmem) { + int fd; + struct file *file; loff_t size = gmem->size; u64 flags = gmem->flags; @@ -462,7 +465,18 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *gmem) #endif } - return __kvm_gmem_create(kvm, size, flags, kvm_gmem_mnt); + fd = get_unused_fd_flags(0); + if (fd < 0) + return fd; + + file = kvm_gmem_create_file(kvm, size, flags, kvm_gmem_mnt); + if (IS_ERR(file)) { + put_unused_fd(fd); + return PTR_ERR(file); + } + + fd_install(fd, file); + return fd; } int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, From patchwork Tue Jun 6 19:03:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269619 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E295DC7EE2A for ; Tue, 6 Jun 2023 19:06:51 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6bz2-0001Q5-1i; Tue, 06 Jun 2023 15:04:44 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3R4N_ZAsKCnYUWeYlfYsnhaaiiafY.WigkYgo-XYpYfhihaho.ila@flex--ackerleytng.bounces.google.com>) id 1q6bz0-0001Pr-OQ for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:42 -0400 Received: from mail-pl1-x649.google.com ([2607:f8b0:4864:20::649]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3R4N_ZAsKCnYUWeYlfYsnhaaiiafY.WigkYgo-XYpYfhihaho.ila@flex--ackerleytng.bounces.google.com>) id 1q6byy-00028D-TL for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:42 -0400 Received: by mail-pl1-x649.google.com with SMTP id d9443c01a7336-1b24203563fso2139725ad.2 for ; Tue, 06 Jun 2023 12:04:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078279; x=1688670279; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=wXM9JnRppggpXADZsSU7oZSsG9y4s1gBipK1U/D+HWI=; b=VWeSODpWaBqW4mQpMFylMZsGjqz9x2cJWtAJkF8pA6jBSLPJ/ks6MKmLjF0le7g30t JnmedBVXzjwth3hzxEtFKbLpgTPpii8AIYI3pjk/s5jbK19NEFa6oqTQ13vB92hVQiL8 ViOM5skTjTy0akJXm7D62mzzCwjuX5DC11svxqniq47zqTKpG8FGJ6kwksqVcEgjbsdr Q2K/qR6ANqCdLydEwlvpleG5wsmCc5RNcdR6s83XxomLHgCDxLDz6X0Nkjd7UZ2zKFsM UtKOTh+fe/f2a9gAwgpw1N4/+Q8OfnwiS90nL6ITis3W2ib2nbVxkWVf59u885V77XDE HbzA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078279; x=1688670279; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=wXM9JnRppggpXADZsSU7oZSsG9y4s1gBipK1U/D+HWI=; b=HmXUMV0j1T9F/KiVt0pHmglPq9F4x9E/dBFIMW6WUQOz1fD21wfoHkjgwWb1zCuomT ivhx1FAaV1UMu5LtWBjh65H2kAH7Lll2YZXjTOA3+vEYfgu9gimR07ArIadLzqe4PdjP vmb25yFQNL6LybLbO+wzuairg7ERd+HrUdGlPA9wSsQTUwly5gj6QFGgOfnVGAK7CLtM GDsd6KP0+N9kUjk1EYeCBnRZCwCPl3XftL3qbbuUURg1D8fmmSj3NDMGxHKtwfVscwtN m3ZbAdXJNdNn52v27U3qiaDPxs5lliMN/mHwTLSyxwnPa72tZ2X9XmTEQh5Hgik6ed+n r4nw== X-Gm-Message-State: AC+VfDzcaRG2JnqXdsNlwz1LNUSmq+0u8Jgr7kFPDBcd+lre857WvTfP F73mANL3cHW7lhPbfyWTJtoUyU8u1uNv1Pa7qQ== X-Google-Smtp-Source: ACHHUZ7ZdtfDieT/Oervi6LGRC4PY5HZ9dIjkRC61ZAb5O1fokMpY3zuDqGBxtoVMfmrEF+ZlDZD/kzUef5/lnoKEw== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a17:902:7c03:b0:1b2:690:23ff with SMTP id x3-20020a1709027c0300b001b2069023ffmr991531pll.11.1686078279112; Tue, 06 Jun 2023 12:04:39 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:59 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <44ac051fab6315161905949caefed78381b6c5fe.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 14/19] KVM: guest_mem: Refactor cleanup to separate inode and file cleanup From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::649; envelope-from=3R4N_ZAsKCnYUWeYlfYsnhaaiiafY.WigkYgo-XYpYfhihaho.ila@flex--ackerleytng.bounces.google.com; helo=mail-pl1-x649.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Cleanup in kvm_gmem_release() should be the reverse of kvm_gmem_create_file(). Cleanup in kvm_gmem_evict_inode() should be the reverse of kvm_gmem_create_inode(). Signed-off-by: Ackerley Tng --- virt/kvm/guest_mem.c | 105 +++++++++++++++++++++++++++++-------------- 1 file changed, 71 insertions(+), 34 deletions(-) diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c index 2f69ef666871..13253af40be6 100644 --- a/virt/kvm/guest_mem.c +++ b/virt/kvm/guest_mem.c @@ -247,42 +247,13 @@ static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset, static int kvm_gmem_release(struct inode *inode, struct file *file) { - struct kvm_gmem *gmem = inode->i_mapping->private_data; - struct kvm_memory_slot *slot; - struct kvm *kvm = gmem->kvm; - unsigned long index; - /* - * Prevent concurrent attempts to *unbind* a memslot. This is the last - * reference to the file and thus no new bindings can be created, but - * deferencing the slot for existing bindings needs to be protected - * against memslot updates, specifically so that unbind doesn't race - * and free the memslot (kvm_gmem_get_file() will return NULL). + * This is called when the last reference to the file is released. Only + * clean up file-related stuff. struct kvm_gmem is also referred to in + * the inode, so clean that up in kvm_gmem_evict_inode(). */ - mutex_lock(&kvm->slots_lock); - - xa_for_each(&gmem->bindings, index, slot) - rcu_assign_pointer(slot->gmem.file, NULL); - - synchronize_rcu(); - - /* - * All in-flight operations are gone and new bindings can be created. - * Free the backing memory, and more importantly, zap all SPTEs that - * pointed at this file. - */ - kvm_gmem_invalidate_begin(kvm, gmem, 0, -1ul); - truncate_inode_pages_final(file->f_mapping); - kvm_gmem_invalidate_end(kvm, gmem, 0, -1ul); - - mutex_unlock(&kvm->slots_lock); - - WARN_ON_ONCE(!(mapping_empty(file->f_mapping))); - - xa_destroy(&gmem->bindings); - kfree(gmem); - - kvm_put_kvm(kvm); + file->f_mapping = NULL; + file->private_data = NULL; return 0; } @@ -603,11 +574,77 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot, } EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn); +static void kvm_gmem_evict_inode(struct inode *inode) +{ + struct kvm_gmem *gmem = inode->i_mapping->private_data; + struct kvm_memory_slot *slot; + struct kvm *kvm; + unsigned long index; + + /* + * If iput() was called before inode is completely set up due to some + * error in kvm_gmem_create_inode(), gmem will be NULL. + */ + if (!gmem) + goto basic_cleanup; + + kvm = gmem->kvm; + + /* + * Prevent concurrent attempts to *unbind* a memslot. This is the last + * reference to the file and thus no new bindings can be created, but + * deferencing the slot for existing bindings needs to be protected + * against memslot updates, specifically so that unbind doesn't race + * and free the memslot (kvm_gmem_get_file() will return NULL). + */ + mutex_lock(&kvm->slots_lock); + + xa_for_each(&gmem->bindings, index, slot) + rcu_assign_pointer(slot->gmem.file, NULL); + + synchronize_rcu(); + + /* + * All in-flight operations are gone and new bindings can be created. + * Free the backing memory, and more importantly, zap all SPTEs that + * pointed at this file. + */ + kvm_gmem_invalidate_begin(kvm, gmem, 0, -1ul); + truncate_inode_pages_final(inode->i_mapping); + kvm_gmem_invalidate_end(kvm, gmem, 0, -1ul); + + mutex_unlock(&kvm->slots_lock); + + WARN_ON_ONCE(!(mapping_empty(inode->i_mapping))); + + xa_destroy(&gmem->bindings); + kfree(gmem); + + kvm_put_kvm(kvm); + +basic_cleanup: + clear_inode(inode); +} + +static const struct super_operations kvm_gmem_super_operations = { + /* + * TODO update statfs handler for kvm_gmem. What should the statfs + * handler return? + */ + .statfs = simple_statfs, + .evict_inode = kvm_gmem_evict_inode, +}; + static int kvm_gmem_init_fs_context(struct fs_context *fc) { + struct pseudo_fs_context *ctx; + if (!init_pseudo(fc, GUEST_MEMORY_MAGIC)) return -ENOMEM; + ctx = fc->fs_private; + ctx->ops = &kvm_gmem_super_operations; + return 0; } From patchwork Tue Jun 6 19:04:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269611 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B5F83C7EE2A for ; Tue, 6 Jun 2023 19:06:01 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6bz4-0001RI-8g; Tue, 06 Jun 2023 15:04:46 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3SIN_ZAsKCncVXfZmgZtoibbjjbgZ.XjhlZhp-YZqZgijibip.jmb@flex--ackerleytng.bounces.google.com>) id 1q6bz2-0001Q6-5T for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:44 -0400 Received: from mail-pl1-x649.google.com ([2607:f8b0:4864:20::649]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3SIN_ZAsKCncVXfZmgZtoibbjjbgZ.XjhlZhp-YZqZgijibip.jmb@flex--ackerleytng.bounces.google.com>) id 1q6bz0-000298-6Q for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:43 -0400 Received: by mail-pl1-x649.google.com with SMTP id d9443c01a7336-1b03f9dfd52so25771935ad.3 for ; Tue, 06 Jun 2023 12:04:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078280; x=1688670280; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=qPbJS0NeJm0ebjWuWAGTg4zHB+jUkimTjx0FXzxIc84=; b=08oQhpNB5bPsCTiYZb6zzITcl3NrY7Nrj0gHP7CL4wEm89iCZHqEqZvXMnagv03rkn GPexWIBMA+li5aiwDX4ihlFm6cVZOeax1+AHpQmOU2YD/QwBli/TC1j7FXVthdfotrbj ohAd8tgwk7CZ9llZkCvOzoa+H0uBPFBXEcWSK2TcUhtWGacdRDhynTXSQrFRtxSj03ll +4JVjB3yWPlWCPldFPbqjqq8D3jeR9j4nRL0TBHcuQBOxSh7kt2ZiDSE77C+EmED3XTt Cx6sMOv8frrdr7pYnwu4yVAynavbLFmrX0Dakhf26NBqNaaItWBILzjwUpL7K2OiHx+L mttg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078280; x=1688670280; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=qPbJS0NeJm0ebjWuWAGTg4zHB+jUkimTjx0FXzxIc84=; b=YcPKwBxLJYQOuYBqX6c93+FmwIVvJN5XMcpIknPtIpqV07gLFNDVO1shlhMdZIUZK3 MutOHmtaEdPc8cB4liCP7nvD++mtHGT2X6QoKzc4nYmfOtToXeJDakaTm6jPcXGqQLuj kTaT2xOe8awRSbp3F+KMN4ipE68CDbRjIjMb/mZRs8CooDRHRyKly+jrDe5iphOJIJC2 8THik8P19gZmeENOVqUnWmtL3QuvNYRoWbt6Y1xmi1eBXeEnN+z2q/iL/Y6rCG8HneY2 SzKjfTHwz+CKpPSfe/lZ8cjJRn9pgSGpI1nggKP9StsIF5Yd+b0GGBXn+VOr9MSXOc3I YyxA== X-Gm-Message-State: AC+VfDz22BN5TdWdK5103PFa3Tw2pSG6HyVPfYtul/bpXmcv0f3MRs8P Korbymk6+1agcWhcKhasJ8j8dZPgjlueItX0eA== X-Google-Smtp-Source: ACHHUZ7+W5XWqHy7WGLlye5fNuolaabE941Ydb/Msxa478YJYlgHnPlrM8+Gzs9sRjg5WLNJlGVtgTPDm/CmJ9Gc2A== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a17:902:da8c:b0:1b0:4b1d:26e1 with SMTP id j12-20020a170902da8c00b001b04b1d26e1mr956600plx.8.1686078280564; Tue, 06 Jun 2023 12:04:40 -0700 (PDT) Date: Tue, 6 Jun 2023 19:04:00 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: Subject: [RFC PATCH 15/19] KVM: guest_mem: hugetlb: initialization and cleanup From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::649; envelope-from=3SIN_ZAsKCncVXfZmgZtoibbjjbgZ.XjhlZhp-YZqZgijibip.jmb@flex--ackerleytng.bounces.google.com; helo=mail-pl1-x649.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org First stage of hugetlb support: add initialization and cleanup routines Signed-off-by: Ackerley Tng --- include/uapi/linux/kvm.h | 25 ++++++++++++ virt/kvm/guest_mem.c | 88 +++++++++++++++++++++++++++++++++++++--- 2 files changed, 108 insertions(+), 5 deletions(-) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 0fa665e8862a..1df0c802c29f 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -13,6 +13,7 @@ #include #include #include +#include #define KVM_API_VERSION 12 @@ -2280,6 +2281,30 @@ struct kvm_memory_attributes { #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd) #define KVM_GUEST_MEMFD_HUGE_PMD (1ULL << 0) +#define KVM_GUEST_MEMFD_HUGETLB (1ULL << 1) + +/* + * Huge page size encoding when KVM_GUEST_MEMFD_HUGETLB is specified, and a huge + * page size other than the default is desired. See hugetlb_encode.h. All + * known huge page size encodings are provided here. It is the responsibility + * of the application to know which sizes are supported on the running system. + * See mmap(2) man page for details. + */ +#define KVM_GUEST_MEMFD_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT +#define KVM_GUEST_MEMFD_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK + +#define KVM_GUEST_MEMFD_HUGE_64KB HUGETLB_FLAG_ENCODE_64KB +#define KVM_GUEST_MEMFD_HUGE_512KB HUGETLB_FLAG_ENCODE_512KB +#define KVM_GUEST_MEMFD_HUGE_1MB HUGETLB_FLAG_ENCODE_1MB +#define KVM_GUEST_MEMFD_HUGE_2MB HUGETLB_FLAG_ENCODE_2MB +#define KVM_GUEST_MEMFD_HUGE_8MB HUGETLB_FLAG_ENCODE_8MB +#define KVM_GUEST_MEMFD_HUGE_16MB HUGETLB_FLAG_ENCODE_16MB +#define KVM_GUEST_MEMFD_HUGE_32MB HUGETLB_FLAG_ENCODE_32MB +#define KVM_GUEST_MEMFD_HUGE_256MB HUGETLB_FLAG_ENCODE_256MB +#define KVM_GUEST_MEMFD_HUGE_512MB HUGETLB_FLAG_ENCODE_512MB +#define KVM_GUEST_MEMFD_HUGE_1GB HUGETLB_FLAG_ENCODE_1GB +#define KVM_GUEST_MEMFD_HUGE_2GB HUGETLB_FLAG_ENCODE_2GB +#define KVM_GUEST_MEMFD_HUGE_16GB HUGETLB_FLAG_ENCODE_16GB struct kvm_create_guest_memfd { __u64 size; diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c index 13253af40be6..b533143e2878 100644 --- a/virt/kvm/guest_mem.c +++ b/virt/kvm/guest_mem.c @@ -19,6 +19,7 @@ #include #include #include +#include #include @@ -30,6 +31,11 @@ struct kvm_gmem { struct kvm *kvm; u64 flags; struct xarray bindings; + struct { + struct hstate *h; + struct hugepage_subpool *spool; + struct resv_map *resv_map; + } hugetlb; }; static loff_t kvm_gmem_get_size(struct file *file) @@ -346,6 +352,46 @@ static const struct inode_operations kvm_gmem_iops = { .setattr = kvm_gmem_setattr, }; +static int kvm_gmem_hugetlb_setup(struct inode *inode, struct kvm_gmem *gmem, + loff_t size, u64 flags) +{ + int page_size_log; + int hstate_idx; + long hpages; + struct resv_map *resv_map; + struct hugepage_subpool *spool; + struct hstate *h; + + page_size_log = (flags >> KVM_GUEST_MEMFD_HUGE_SHIFT) & KVM_GUEST_MEMFD_HUGE_MASK; + hstate_idx = get_hstate_idx(page_size_log); + if (hstate_idx < 0) + return -ENOENT; + + h = &hstates[hstate_idx]; + /* Round up to accommodate size requests that don't align with huge pages */ + hpages = round_up(size, huge_page_size(h)) >> huge_page_shift(h); + spool = hugepage_new_subpool(h, hpages, hpages); + if (!spool) + goto out; + + resv_map = resv_map_alloc(); + if (!resv_map) + goto out_subpool; + + inode->i_blkbits = huge_page_shift(h); + + gmem->hugetlb.h = h; + gmem->hugetlb.spool = spool; + gmem->hugetlb.resv_map = resv_map; + + return 0; + +out_subpool: + kfree(spool); +out: + return -ENOMEM; +} + static struct inode *kvm_gmem_create_inode(struct kvm *kvm, loff_t size, u64 flags, struct vfsmount *mnt) { @@ -368,6 +414,12 @@ static struct inode *kvm_gmem_create_inode(struct kvm *kvm, loff_t size, u64 fla if (!gmem) goto err_inode; + if (flags & KVM_GUEST_MEMFD_HUGETLB) { + err = kvm_gmem_hugetlb_setup(inode, gmem, size, flags); + if (err) + goto err_gmem; + } + xa_init(&gmem->bindings); kvm_get_kvm(kvm); @@ -385,6 +437,8 @@ static struct inode *kvm_gmem_create_inode(struct kvm *kvm, loff_t size, u64 fla return inode; +err_gmem: + kfree(gmem); err_inode: iput(inode); return ERR_PTR(err); @@ -414,6 +468,8 @@ static struct file *kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags return file; } +#define KVM_GUEST_MEMFD_ALL_FLAGS (KVM_GUEST_MEMFD_HUGE_PMD | KVM_GUEST_MEMFD_HUGETLB) + int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *gmem) { int fd; @@ -424,8 +480,15 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *gmem) if (size < 0 || !PAGE_ALIGNED(size)) return -EINVAL; - if (flags & ~KVM_GUEST_MEMFD_HUGE_PMD) - return -EINVAL; + if (!(flags & KVM_GUEST_MEMFD_HUGETLB)) { + if (flags & ~(unsigned int)KVM_GUEST_MEMFD_ALL_FLAGS) + return -EINVAL; + } else { + /* Allow huge page size encoding in flags. */ + if (flags & ~(unsigned int)(KVM_GUEST_MEMFD_ALL_FLAGS | + (KVM_GUEST_MEMFD_HUGE_MASK << KVM_GUEST_MEMFD_HUGE_SHIFT))) + return -EINVAL; + } if (flags & KVM_GUEST_MEMFD_HUGE_PMD) { #ifdef CONFIG_TRANSPARENT_HUGEPAGE @@ -610,7 +673,17 @@ static void kvm_gmem_evict_inode(struct inode *inode) * pointed at this file. */ kvm_gmem_invalidate_begin(kvm, gmem, 0, -1ul); - truncate_inode_pages_final(inode->i_mapping); + if (gmem->flags & KVM_GUEST_MEMFD_HUGETLB) { + truncate_inode_pages_final_prepare(inode->i_mapping); + remove_mapping_hugepages( + inode->i_mapping, gmem->hugetlb.h, gmem->hugetlb.spool, + gmem->hugetlb.resv_map, inode, 0, LLONG_MAX); + + resv_map_release(&gmem->hugetlb.resv_map->refs); + hugepage_put_subpool(gmem->hugetlb.spool); + } else { + truncate_inode_pages_final(inode->i_mapping); + } kvm_gmem_invalidate_end(kvm, gmem, 0, -1ul); mutex_unlock(&kvm->slots_lock); @@ -688,10 +761,15 @@ bool kvm_gmem_check_alignment(const struct kvm_userspace_memory_region2 *mem) { size_t page_size; - if (mem->flags & KVM_GUEST_MEMFD_HUGE_PMD) + if (mem->flags & KVM_GUEST_MEMFD_HUGETLB) { + size_t page_size_log = ((mem->flags >> KVM_GUEST_MEMFD_HUGE_SHIFT) + & KVM_GUEST_MEMFD_HUGE_MASK); + page_size = 1UL << page_size_log; + } else if (mem->flags & KVM_GUEST_MEMFD_HUGE_PMD) { page_size = HPAGE_PMD_SIZE; - else + } else { page_size = PAGE_SIZE; + } return (IS_ALIGNED(mem->gmem_offset, page_size) && IS_ALIGNED(mem->memory_size, page_size)); From patchwork Tue Jun 6 19:04:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269568 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E3646C7EE31 for ; Tue, 6 Jun 2023 19:04:57 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6bz6-0001SN-Lr; Tue, 06 Jun 2023 15:04:48 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3SoN_ZAsKCnkXZhboibvqkddlldib.Zljnbjr-absbiklkdkr.lod@flex--ackerleytng.bounces.google.com>) id 1q6bz3-0001R0-Lo for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:45 -0400 Received: from mail-yb1-xb49.google.com ([2607:f8b0:4864:20::b49]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3SoN_ZAsKCnkXZhboibvqkddlldib.Zljnbjr-absbiklkdkr.lod@flex--ackerleytng.bounces.google.com>) id 1q6bz1-0002B3-JN for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:45 -0400 Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-bb39316a68eso1899791276.0 for ; Tue, 06 Jun 2023 12:04:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078282; x=1688670282; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ZBX6a15Tnsz/uKq26LcenmelaNrPaeGwC3t0ZYHlu0o=; b=SiIGZBUhZs629JysZEAyyBg7ttu9GLTQayTGdSZGve4dLO6zhBZcJhFcd1G73mhuek LHQGWYqLVU6XGceDTMr/+ObLmxrXdmeyPE4htH53qamVc0jx3GxeXvlfVB9JCraAIS1o KEzaHjtn2Qo3RoZRS3XAY/vwbMPt0gdnui/cnheq8vpMkjsxIO1Je1R+Or+Zif0wezPw uUYILrxrg8kwyk8/xo82WMddoKWumnm5vCXiz84rf/1WA32vat2RPpMWIr80UC3bv5gu +n3kSoBT8FJFvccnZi21mVuxGSb5VrWJMuxYUlyl2Etedh0yvhauw6W7NlVmyqj5TDoC ZVSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078282; x=1688670282; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ZBX6a15Tnsz/uKq26LcenmelaNrPaeGwC3t0ZYHlu0o=; b=aPA2/5/R15t5LFG/BzSV7JiBEKl+4Rx9Hd6pDHzt78iTu0iY+uZBC6/pxgHLRttF0K Ayx0NrbouVcVA93LzLUMGoNTV7GybUOtEPeI7sieBay+UBCzk/yM3vudK7neJG/TFpWV u1blJTEQV9VSg+fXZCII/rBkZCnok6QRQFvfdiu8QuNiN3pCD3qttugVMUzMQ7Ws7hcb LSgDTh8NVXsaAU5yxZWYsI/IeRJmhSC5aKfi6Om0NLFVIRuKBJaLhpNPZKg9V45cWAb4 KXHQQbknVq5kyyc5Yp5mY1CTcwUpl1O/iNm2s1oxSjDUosR1K8qWp4AonKsVD9Z+Ztin XDPg== X-Gm-Message-State: AC+VfDzcwRmTciX0ZKADQbwo3oqR/Eu6LoSKYapP/ag4/bU6cDcfZYII XzUIaO079pg8yo5g/qw4n0vsTD3rHrigyPxoMQ== X-Google-Smtp-Source: ACHHUZ5Cpne/EIdT1I7Li9OeDmgR0RpPGqTGa8YIhJkObOQifrmAjwQv0QQeKIDO8N9Vq8SkzWw7xg09f24itua+cw== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a25:2446:0:b0:b9a:703d:e650 with SMTP id k67-20020a252446000000b00b9a703de650mr1068675ybk.7.1686078282407; Tue, 06 Jun 2023 12:04:42 -0700 (PDT) Date: Tue, 6 Jun 2023 19:04:01 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <18e518695854cc7243866d7b1be2fbbb3aa87c71.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 16/19] KVM: guest_mem: hugetlb: allocate and truncate from hugetlb From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::b49; envelope-from=3SoN_ZAsKCnkXZhboibvqkddlldib.Zljnbjr-absbiklkdkr.lod@flex--ackerleytng.bounces.google.com; helo=mail-yb1-xb49.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Introduce kvm_gmem_hugetlb_get_folio(), then update kvm_gmem_allocate() and kvm_gmem_truncate() to use hugetlb functions. Signed-off-by: Ackerley Tng --- virt/kvm/guest_mem.c | 215 +++++++++++++++++++++++++++++++++++++------ 1 file changed, 188 insertions(+), 27 deletions(-) diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c index b533143e2878..6271621f6b73 100644 --- a/virt/kvm/guest_mem.c +++ b/virt/kvm/guest_mem.c @@ -43,6 +43,95 @@ static loff_t kvm_gmem_get_size(struct file *file) return i_size_read(file_inode(file)); } +static struct folio *kvm_gmem_hugetlb_alloc_and_cache_folio( + struct file *file, pgoff_t hindex) +{ + int err; + struct folio *folio; + struct kvm_gmem *gmem; + struct hstate *h; + struct resv_map *resv_map; + unsigned long offset; + struct vm_area_struct pseudo_vma; + + gmem = file->private_data; + h = gmem->hugetlb.h; + resv_map = gmem->hugetlb.resv_map; + offset = hindex << huge_page_shift(h); + + vma_init(&pseudo_vma, NULL); + vm_flags_init(&pseudo_vma, VM_HUGETLB | VM_MAYSHARE | VM_SHARED); + /* vma infrastructure is dependent on vm_file being set */ + pseudo_vma.vm_file = file; + + /* TODO setup NUMA policy. Meanwhile, fallback to get_task_policy(). */ + pseudo_vma.vm_policy = NULL; + folio = alloc_hugetlb_folio_from_subpool( + gmem->hugetlb.spool, h, resv_map, &pseudo_vma, offset, 0); + /* Remember to take and drop refcount from vm_policy */ + if (IS_ERR(folio)) + return folio; + + /* + * FIXME: Skip clearing pages when trusted firmware will do it when + * assigning memory to the guest. + */ + clear_huge_page(&folio->page, offset, pages_per_huge_page(h)); + __folio_mark_uptodate(folio); + err = hugetlb_filemap_add_folio(file->f_mapping, h, folio, hindex); + if (unlikely(err)) { + restore_reserve_on_error(resv_map, hindex, true, folio); + folio_put(folio); + folio = ERR_PTR(err); + } + + return folio; +} + +/** + * Gets a hugetlb folio, from @file, at @index (in terms of PAGE_SIZE) within + * the file. + * + * The returned folio will be in @file's page cache, and locked. + */ +static struct folio *kvm_gmem_hugetlb_get_folio(struct file *file, pgoff_t index) +{ + struct folio *folio; + u32 hash; + /* hindex is in terms of huge_page_size(h) and not PAGE_SIZE */ + pgoff_t hindex; + struct kvm_gmem *gmem; + struct hstate *h; + struct address_space *mapping; + + gmem = file->private_data; + h = gmem->hugetlb.h; + hindex = index >> huge_page_order(h); + + mapping = file->f_mapping; + hash = hugetlb_fault_mutex_hash(mapping, hindex); + mutex_lock(&hugetlb_fault_mutex_table[hash]); + + rcu_read_lock(); + folio = filemap_lock_folio(mapping, hindex); + rcu_read_unlock(); + if (folio) + goto folio_valid; + + folio = kvm_gmem_hugetlb_alloc_and_cache_folio(file, hindex); + /* + * TODO Perhaps the interface of kvm_gmem_get_folio should change to better + * report errors + */ + if (IS_ERR(folio)) + folio = NULL; + +folio_valid: + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + + return folio; +} + static struct folio *kvm_gmem_get_huge_folio(struct file *file, pgoff_t index) { #ifdef CONFIG_TRANSPARENT_HUGEPAGE @@ -74,36 +163,56 @@ static struct folio *kvm_gmem_get_huge_folio(struct file *file, pgoff_t index) #endif } +/** + * Gets a folio, from @file, at @index (in terms of PAGE_SIZE) within the file. + * + * The returned folio will be in @file's page cache and locked. + */ static struct folio *kvm_gmem_get_folio(struct file *file, pgoff_t index) { struct folio *folio; + struct kvm_gmem *gmem = file->private_data; - folio = kvm_gmem_get_huge_folio(file, index); - if (!folio) { - folio = filemap_grab_folio(file->f_mapping, index); + if (gmem->flags & KVM_GUEST_MEMFD_HUGETLB) { + folio = kvm_gmem_hugetlb_get_folio(file, index); + + /* hugetlb gmem does not fall back to non-hugetlb pages */ if (!folio) return NULL; - } - /* - * TODO: Confirm this won't zero in-use pages, and skip clearing pages - * when trusted firmware will do it when assigning memory to the guest. - */ - if (!folio_test_uptodate(folio)) { - unsigned long nr_pages = folio_nr_pages(folio); - unsigned long i; + /* + * Don't need to clear pages because + * kvm_gmem_hugetlb_alloc_and_cache_folio() already clears pages + * when allocating + */ + } else { + folio = kvm_gmem_get_huge_folio(file, index); + if (!folio) { + folio = filemap_grab_folio(file->f_mapping, index); + if (!folio) + return NULL; + } - for (i = 0; i < nr_pages; i++) - clear_highpage(folio_page(folio, i)); - } + /* + * TODO: Confirm this won't zero in-use pages, and skip clearing pages + * when trusted firmware will do it when assigning memory to the guest. + */ + if (!folio_test_uptodate(folio)) { + unsigned long nr_pages = folio_nr_pages(folio); + unsigned long i; - /* - * filemap_grab_folio() uses FGP_ACCESSED, which already called - * folio_mark_accessed(), so we clear it. - * TODO: Should we instead be clearing this when truncating? - * TODO: maybe don't use FGP_ACCESSED at all and call __filemap_get_folio directly. - */ - folio_clear_referenced(folio); + for (i = 0; i < nr_pages; i++) + clear_highpage(folio_page(folio, i)); + } + + /* + * filemap_grab_folio() uses FGP_ACCESSED, which already called + * folio_mark_accessed(), so we clear it. + * TODO: Should we instead be clearing this when truncating? + * TODO: maybe don't use FGP_ACCESSED at all and call __filemap_get_folio directly. + */ + folio_clear_referenced(folio); + } /* * Indicate that this folio matches the backing store (in this case, has @@ -156,6 +265,44 @@ static void kvm_gmem_invalidate_end(struct kvm *kvm, struct kvm_gmem *gmem, KVM_MMU_UNLOCK(kvm); } +static void kvm_gmem_hugetlb_truncate_range(struct inode *inode, + loff_t offset, loff_t len) +{ + loff_t hsize; + loff_t full_hpage_start; + loff_t full_hpage_end; + struct kvm_gmem *gmem; + struct hstate *h; + struct address_space *mapping; + + mapping = inode->i_mapping; + gmem = mapping->private_data; + h = gmem->hugetlb.h; + hsize = huge_page_size(h); + full_hpage_start = round_up(offset, hsize); + full_hpage_end = round_down(offset + len, hsize); + + /* If range starts before first full page, zero partial page. */ + if (offset < full_hpage_start) { + hugetlb_zero_partial_page( + h, mapping, offset, min(offset + len, full_hpage_start)); + } + + /* Remove full pages from the file. */ + if (full_hpage_end > full_hpage_start) { + remove_mapping_hugepages(mapping, h, gmem->hugetlb.spool, + gmem->hugetlb.resv_map, inode, + full_hpage_start, full_hpage_end); + } + + + /* If range extends beyond last full page, zero partial page. */ + if ((offset + len) > full_hpage_end && (offset + len) > full_hpage_start) { + hugetlb_zero_partial_page( + h, mapping, full_hpage_end, offset + len); + } +} + static long kvm_gmem_punch_hole(struct file *file, loff_t offset, loff_t len) { struct kvm_gmem *gmem = file->private_data; @@ -171,7 +318,10 @@ static long kvm_gmem_punch_hole(struct file *file, loff_t offset, loff_t len) kvm_gmem_invalidate_begin(kvm, gmem, start, end); - truncate_inode_pages_range(file->f_mapping, offset, offset + len - 1); + if (gmem->flags & KVM_GUEST_MEMFD_HUGETLB) + kvm_gmem_hugetlb_truncate_range(file_inode(file), offset, len); + else + truncate_inode_pages_range(file->f_mapping, offset, offset + len - 1); kvm_gmem_invalidate_end(kvm, gmem, start, end); @@ -183,6 +333,7 @@ static long kvm_gmem_punch_hole(struct file *file, loff_t offset, loff_t len) static long kvm_gmem_allocate(struct file *file, loff_t offset, loff_t len) { struct address_space *mapping = file->f_mapping; + struct kvm_gmem *gmem = file->private_data; pgoff_t start, index, end; int r; @@ -192,9 +343,14 @@ static long kvm_gmem_allocate(struct file *file, loff_t offset, loff_t len) filemap_invalidate_lock_shared(mapping); - start = offset >> PAGE_SHIFT; - /* Align so that at least 1 page is allocated */ - end = ALIGN(offset + len, PAGE_SIZE) >> PAGE_SHIFT; + if (gmem->flags & KVM_GUEST_MEMFD_HUGETLB) { + start = offset >> huge_page_shift(gmem->hugetlb.h); + end = ALIGN(offset + len, huge_page_size(gmem->hugetlb.h)) >> PAGE_SHIFT; + } else { + start = offset >> PAGE_SHIFT; + /* Align so that at least 1 page is allocated */ + end = ALIGN(offset + len, PAGE_SIZE) >> PAGE_SHIFT; + } r = 0; for (index = start; index < end; ) { @@ -211,7 +367,7 @@ static long kvm_gmem_allocate(struct file *file, loff_t offset, loff_t len) break; } - index = folio_next_index(folio); + index += folio_nr_pages(folio); folio_unlock(folio); folio_put(folio); @@ -625,7 +781,12 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot, return -ENOMEM; } - page = folio_file_page(folio, index); + /* + * folio_file_page() always returns the head page for hugetlb + * folios. Reimplement to get the page within this folio, even for + * hugetlb pages. + */ + page = folio_page(folio, index & (folio_nr_pages(folio) - 1)); *pfn = page_to_pfn(page); *order = thp_order(compound_head(page)); From patchwork Tue Jun 6 19:04:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269618 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B2EB3C77B73 for ; Tue, 6 Jun 2023 19:06:41 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6bz8-0001Sx-1p; Tue, 06 Jun 2023 15:04:50 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3TIN_ZAsKCnsZbjdqkdxsmffnnfkd.bnlpdlt-cdudkmnmfmt.nqf@flex--ackerleytng.bounces.google.com>) id 1q6bz6-0001S3-Cm for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:48 -0400 Received: from mail-yb1-xb4a.google.com ([2607:f8b0:4864:20::b4a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3TIN_ZAsKCnsZbjdqkdxsmffnnfkd.bnlpdlt-cdudkmnmfmt.nqf@flex--ackerleytng.bounces.google.com>) id 1q6bz3-0001vd-Qj for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:48 -0400 Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-ba8337ade1cso10279202276.2 for ; Tue, 06 Jun 2023 12:04:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078284; x=1688670284; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=xVPr1dIpnxklN0GMh0WV40gxjr+XB/duCXqFm/4Eedk=; b=OpWI8RPhFgIw5MoV+RS2cmQpFpGPXsQqq6sU1K61/owOmBB1Ql0+w7FJ3nZldnf8p9 +qYBb0NtRqwQu8dWDMwOUTG0ATZVJvom27ARS7xgWW5TfboiScGSMfZ8N45qzVKLAY2H 4CU726nNQYrYfpWC/qzLp0r9J8us1SCC2lEqZrYJr9OdHyFq/TUwds9iZNdlTcobAHRr ZZALUwW7mEujbc7yxykj2/g4xPWE/5QaSH6KTAlpAQ02/2PJYX27kEua6tRoLiCjwPDV pyCH4FFtF/i8mbhld01B2RuTsSf6teTynYWziBlTcigs4W2oRa0ZqtYf/T+WFi+IMehY DIyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078284; x=1688670284; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xVPr1dIpnxklN0GMh0WV40gxjr+XB/duCXqFm/4Eedk=; b=cGQYkNGPfAvgOezWmSPTT3cosUhp8S5cm6tpMmyvfyL5K0fBgyApJR+vdyTle0FWlh HBqod58WntLLAPDltuDjegZFVW16A8lViNTjQK4oqxsagPEzsLdHtCOnCpJUqhAObYlZ RO2n+hutiwHXqoaclFRB4AxIzlxHZQ91W6YbmdcYYjrZeFpqdysEiuojs6k+2jwobFUc 2Pq1FRBCjrZOb/0Q0IdGhLwlVT0HkmcmxKlPpmfP+TgG2DQkGtbWwjKI/Z4CqQYtxFC8 52uEQOjcCjMGCAtBwTUI612bAvxGkgI5QW16pH+QbKE9dQVStWRg2ovvhv8zy+vo0K/n X+/A== X-Gm-Message-State: AC+VfDwPXYHT0CXUn3AE7JoxkixfPgrC1dX5OZqaEnsHBOgPwESQ6zqA b4AXlRy+OM/mw6N9AnNutpypZtv1EK5fsTeP/A== X-Google-Smtp-Source: ACHHUZ6iw5MaJvx6Biay/guXDPe54wqyOEjmtoFNQ8eDgpYM8NsdWofxvbuGdqaNO+FVNrUBL1ysZvJNR4qrxfEOBQ== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a05:6902:72d:b0:ba1:d0:7f7c with SMTP id l13-20020a056902072d00b00ba100d07f7cmr1128681ybt.2.1686078284464; Tue, 06 Jun 2023 12:04:44 -0700 (PDT) Date: Tue, 6 Jun 2023 19:04:02 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <2b26bcc8b10f8a11e6405d4cea5f1235e82e83c9.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 17/19] KVM: selftests: Add basic selftests for hugetlbfs-backed guest_mem From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::b4a; envelope-from=3TIN_ZAsKCnsZbjdqkdxsmffnnfkd.bnlpdlt-cdudkmnmfmt.nqf@flex--ackerleytng.bounces.google.com; helo=mail-yb1-xb4a.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Add tests for 2MB and 1GB page sizes. Signed-off-by: Ackerley Tng --- .../testing/selftests/kvm/guest_memfd_test.c | 33 ++++++++++++++----- 1 file changed, 24 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c index 059b33cdecec..6e24631119c6 100644 --- a/tools/testing/selftests/kvm/guest_memfd_test.c +++ b/tools/testing/selftests/kvm/guest_memfd_test.c @@ -90,20 +90,14 @@ static void test_fallocate(int fd, size_t page_size, size_t total_size) TEST_ASSERT(!ret, "fallocate to restore punched hole should succeed"); } - -int main(int argc, char *argv[]) +void test_guest_mem(struct kvm_vm *vm, uint32_t flags, size_t page_size) { - size_t page_size; - size_t total_size; int fd; - struct kvm_vm *vm; + size_t total_size; - page_size = getpagesize(); total_size = page_size * 4; - vm = vm_create_barebones(); - - fd = vm_create_guest_memfd(vm, total_size, 0); + fd = vm_create_guest_memfd(vm, total_size, flags); test_file_read_write(fd); test_mmap(fd, page_size); @@ -112,3 +106,24 @@ int main(int argc, char *argv[]) close(fd); } + +int main(int argc, char *argv[]) +{ + struct kvm_vm *vm = vm_create_barebones(); + + printf("Test guest mem 4K\n"); + test_guest_mem(vm, 0, getpagesize()); + printf(" PASSED\n"); + + printf("Test guest mem hugetlb 2M\n"); + test_guest_mem( + vm, KVM_GUEST_MEMFD_HUGETLB | KVM_GUEST_MEMFD_HUGE_2MB, 2UL << 20); + printf(" PASSED\n"); + + printf("Test guest mem hugetlb 1G\n"); + test_guest_mem( + vm, KVM_GUEST_MEMFD_HUGETLB | KVM_GUEST_MEMFD_HUGE_1GB, 1UL << 30); + printf(" PASSED\n"); + + return 0; +} From patchwork Tue Jun 6 19:04:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269572 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 786F9C77B73 for ; Tue, 6 Jun 2023 19:05:33 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6bzD-0001Ty-51; Tue, 06 Jun 2023 15:04:55 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3ToN_ZAsKCn0bdlfsmfzuohhpphmf.dpnrfnv-efwfmopohov.psh@flex--ackerleytng.bounces.google.com>) id 1q6bz9-0001TI-Au for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:51 -0400 Received: from mail-yb1-xb4a.google.com ([2607:f8b0:4864:20::b4a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3ToN_ZAsKCn0bdlfsmfzuohhpphmf.dpnrfnv-efwfmopohov.psh@flex--ackerleytng.bounces.google.com>) id 1q6bz7-0002Cl-Kf for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:51 -0400 Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-bacfa4eefcbso13845349276.1 for ; Tue, 06 Jun 2023 12:04:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078286; x=1688670286; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=o3QY+GjjNY/B/LKbvEsBmmion/NMqJeuEYcwlsx1w+c=; b=ga7yY2wFSjsTCtNYT2NMM1yjmokiEXDaCTuXNhL4QFIjeFOGWXP9m4xRNkNarZKOBt me9uyDk1llgo74NsZn/+85xOGgk1JymFHYuOx2vpupSEs8KBznt7C4GK7pW1At3io3MV omuLB1jzJaLc4Zg0yimBb4CzacnWix8fdVSxMBPt8YeYd8UktVq4LjjSoaTbCwhO+eXy EZa5FIg//RusZW7f2NgtTOFoqGNu6khof2qQQ8vk0r6gz2il2XRm/VN/dp/gxWzndj0K LOfFsFQnLTJrS+CT3R61MXxoPNTsaAUuUv0u7TZgtSvuJTPRds1QGGcjrMoDqzJSrIHy 9Thg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078286; x=1688670286; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=o3QY+GjjNY/B/LKbvEsBmmion/NMqJeuEYcwlsx1w+c=; b=R93d5g6f5iAKyqpF8Klp65t+rm0lmwjnPW0CsnwbcR3tBhwOsOk/0BZGbM0PzV7EYQ OYUZQzDti9eJvT728XdFGoVkTIWYF3Iu+ONN0nGk7CqpipH0qMuukITfnTI358BUY5gf n5sJI85pMj6IGY6pWTy5z/euXlck7cMFSZF5weE5VFHj9/PTGo2OuO6tqV9a5uF306KO iDhTSsgqtLcWIdraHYS2kvJuUTv5AozuXoIHAeCnj2NXZLfiqPMOShBU8wsu4sMrpu/x 35iRzkaB/Ww6N8TAiCwi55+xd4ayb4LcMTtFsEeLMf8E/rey13GaG50hDSnfNjlH7Z8v SpyQ== X-Gm-Message-State: AC+VfDzx+f8mfJ6wNviDLTL20W4mvTwFLJ1wd+Nhw63BZQ3JI3R9CPUj e/rtJ0c+3vVjzixx6Bjskvc15/tLBYivafR06g== X-Google-Smtp-Source: ACHHUZ7zyWkKPX7nW+eqyXqGuqnjfZMxWtQ7sqNr3g3ix//f4WlCF2yUeIoJ+N8uNLRa6EBYQO+4YPcXyNwMiHRgqA== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a25:aea3:0:b0:bb3:9b99:f3f5 with SMTP id b35-20020a25aea3000000b00bb39b99f3f5mr1433391ybj.4.1686078286504; Tue, 06 Jun 2023 12:04:46 -0700 (PDT) Date: Tue, 6 Jun 2023 19:04:03 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <5f0d27ce06c03761974264bd8a890614ea7ecb32.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 18/19] KVM: selftests: Support various types of backing sources for private memory From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::b4a; envelope-from=3ToN_ZAsKCn0bdlfsmfzuohhpphmf.dpnrfnv-efwfmopohov.psh@flex--ackerleytng.bounces.google.com; helo=mail-yb1-xb4a.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Adds support for various type of backing sources for private memory (in the sense of confidential computing), similar to the backing sources available for shared memory. Signed-off-by: Ackerley Tng --- .../testing/selftests/kvm/include/test_util.h | 14 ++++ tools/testing/selftests/kvm/lib/test_util.c | 74 +++++++++++++++++++ 2 files changed, 88 insertions(+) diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h index a6e9f215ce70..899ea15ca8a9 100644 --- a/tools/testing/selftests/kvm/include/test_util.h +++ b/tools/testing/selftests/kvm/include/test_util.h @@ -122,6 +122,16 @@ struct vm_mem_backing_src_alias { uint32_t flag; }; +enum vm_pmem_backing_src_type { + VM_PMEM_SRC_GMEM, + VM_PMEM_SRC_HUGETLB, /* Use kernel default page size for hugetlb pages */ + VM_PMEM_SRC_HUGETLB_2MB, + VM_PMEM_SRC_HUGETLB_1GB, + NUM_PMEM_SRC_TYPES, +}; + +#define DEFAULT_VM_PMEM_SRC VM_PMEM_SRC_GMEM + #define MIN_RUN_DELAY_NS 200000UL bool thp_configured(void); @@ -132,6 +142,10 @@ size_t get_backing_src_pagesz(uint32_t i); bool is_backing_src_hugetlb(uint32_t i); void backing_src_help(const char *flag); enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name); +void pmem_backing_src_help(const char *flag); +enum vm_pmem_backing_src_type parse_pmem_backing_src_type(const char *type_name); +const struct vm_mem_backing_src_alias *vm_pmem_backing_src_alias(uint32_t i); +size_t get_pmem_backing_src_pagesz(uint32_t i); long get_run_delay(void); /* diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index b772193f6c18..62efb7b8ba51 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -287,6 +288,34 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i) return &aliases[i]; } +const struct vm_mem_backing_src_alias *vm_pmem_backing_src_alias(uint32_t i) +{ + static const struct vm_mem_backing_src_alias aliases[] = { + [VM_PMEM_SRC_GMEM] = { + .name = "pmem_gmem", + .flag = 0, + }, + [VM_PMEM_SRC_HUGETLB] = { + .name = "pmem_hugetlb", + .flag = KVM_GUEST_MEMFD_HUGETLB, + }, + [VM_PMEM_SRC_HUGETLB_2MB] = { + .name = "pmem_hugetlb_2mb", + .flag = KVM_GUEST_MEMFD_HUGETLB | KVM_GUEST_MEMFD_HUGE_2MB, + }, + [VM_PMEM_SRC_HUGETLB_1GB] = { + .name = "pmem_hugetlb_1gb", + .flag = KVM_GUEST_MEMFD_HUGETLB | KVM_GUEST_MEMFD_HUGE_1GB, + }, + }; + _Static_assert(ARRAY_SIZE(aliases) == NUM_PMEM_SRC_TYPES, + "Missing new backing private mem src types?"); + + TEST_ASSERT(i < NUM_PMEM_SRC_TYPES, "Private mem backing src type ID %d too big", i); + + return &aliases[i]; +} + #define MAP_HUGE_PAGE_SIZE(x) (1ULL << ((x >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK)) size_t get_backing_src_pagesz(uint32_t i) @@ -307,6 +336,20 @@ size_t get_backing_src_pagesz(uint32_t i) } } +size_t get_pmem_backing_src_pagesz(uint32_t i) +{ + uint32_t flag = vm_pmem_backing_src_alias(i)->flag; + + switch (i) { + case VM_PMEM_SRC_GMEM: + return getpagesize(); + case VM_PMEM_SRC_HUGETLB: + return get_def_hugetlb_pagesz(); + default: + return MAP_HUGE_PAGE_SIZE(flag); + } +} + bool is_backing_src_hugetlb(uint32_t i) { return !!(vm_mem_backing_src_alias(i)->flag & MAP_HUGETLB); @@ -343,6 +386,37 @@ enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name) return -1; } +static void print_available_pmem_backing_src_types(const char *prefix) +{ + int i; + + printf("%sAvailable private mem backing src types:\n", prefix); + + for (i = 0; i < NUM_PMEM_SRC_TYPES; i++) + printf("%s %s\n", prefix, vm_pmem_backing_src_alias(i)->name); +} + +void pmem_backing_src_help(const char *flag) +{ + printf(" %s: specify the type of memory that should be used to\n" + " back guest private memory. (default: %s)\n", + flag, vm_pmem_backing_src_alias(DEFAULT_VM_MEM_SRC)->name); + print_available_pmem_backing_src_types(" "); +} + +enum vm_pmem_backing_src_type parse_pmem_backing_src_type(const char *type_name) +{ + int i; + + for (i = 0; i < NUM_SRC_TYPES; i++) + if (!strcmp(type_name, vm_pmem_backing_src_alias(i)->name)) + return i; + + print_available_pmem_backing_src_types(""); + TEST_FAIL("Unknown private mem backing src type: %s", type_name); + return -1; +} + long get_run_delay(void) { char path[64]; From patchwork Tue Jun 6 19:04:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269571 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1CBFAC7EE43 for ; Tue, 6 Jun 2023 19:05:13 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1q6bzG-0001YH-TV; Tue, 06 Jun 2023 15:05:00 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3UIN_ZAsKCn8dfnhuoh1wqjjrrjoh.frpthpx-ghyhoqrqjqx.ruj@flex--ackerleytng.bounces.google.com>) id 1q6bzA-0001Tn-S7 for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:52 -0400 Received: from mail-pf1-x44a.google.com ([2607:f8b0:4864:20::44a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3UIN_ZAsKCn8dfnhuoh1wqjjrrjoh.frpthpx-ghyhoqrqjqx.ruj@flex--ackerleytng.bounces.google.com>) id 1q6bz8-0002Db-5n for qemu-devel@nongnu.org; Tue, 06 Jun 2023 15:04:52 -0400 Received: by mail-pf1-x44a.google.com with SMTP id d2e1a72fcca58-653843401d5so4464848b3a.3 for ; Tue, 06 Jun 2023 12:04:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078288; x=1688670288; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=CCEiTTKrsebxDoXfSRZP/CN+WTuYdz3sOeWxyKUUyPY=; b=KFM+d/JKzgYvN5OR6kU44o+ZYxe7Q/H7MNTOnP/z0cxW3//uRTJeZxZ8dFp8Jit4Gf YquLlWVy9xCdFnBGp+gPFhLtGo2w3F50cPqXWf8JFYlZ/OGxawmSMJJ/jIE184rG6FHp ixTL8P/2ovgE/QdCx/NLmyiW9xL5hbY3itkmDOuPi4r2D/Bp6z5cypKETkTA5IN3CsDs 1vhuCHFDhZuON8z8NNnBNoq7He9ZNsDe7Xmjzq9wK+9M2GgIoXw9C4JhNu7KegBFAMS3 MULic01j2ol2uU6vWbv87Xaf10SomC0FL9lq9TfMGxJn5DnDmpEo+KEOdrVPsgYMpseZ BMRw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078288; x=1688670288; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=CCEiTTKrsebxDoXfSRZP/CN+WTuYdz3sOeWxyKUUyPY=; b=HAv617qtRx60Ff+QVKHrcFAZoUAYHARndLzZraj9tMuAtxhPaNgtevex71JRsftJDx doTU5nug+l4Un4tDx2vSL6Y6jXWYS/2yUo80O2UQFLnFaPSWpTnV0lGxWzLWFnQHMmut XdaZ0Mj44bbakoHNR3q1YRoGqb1bftm619Mu1ItIInVlNrRtviHJArYaez2OiI4FIxal bimQGQ3RUT0RvnTNQ0Q1OtJUSPBkERabb3gMfUnFODWb1qA918My33VVcg3cdjmAcdJu JneqlbOD7LitQ+2y1ctOVGwJrtqAAVKHstBY+O8K03kYOZXHL5Pxl1agsJwa1BantzdX Z/jw== X-Gm-Message-State: AC+VfDwC/i1ydGftcFYmUXZRneMK3cDMM3Iyx6el48cgJuWsTS3MJ706 VWA70HEAoleowPrb0S376c9EOPcBw7JY0zvSnw== X-Google-Smtp-Source: ACHHUZ5UZgVYlUWSYfljNriuMM/SFMypamtYGG6ajWbl9pU1XKFK+BBOu36l6aAPASb6U7Gvlg5sOoRl6fqEbUEr5Q== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a05:6a00:13a6:b0:657:f26e:b025 with SMTP id t38-20020a056a0013a600b00657f26eb025mr1318192pfg.6.1686078288430; Tue, 06 Jun 2023 12:04:48 -0700 (PDT) Date: Tue, 6 Jun 2023 19:04:04 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <3ae2d02c45c5fb91b490b1674165c733efb871d6.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 19/19] KVM: selftests: Update test for various private memory backing source types From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Received-SPF: pass client-ip=2607:f8b0:4864:20::44a; envelope-from=3UIN_ZAsKCn8dfnhuoh1wqjjrrjoh.frpthpx-ghyhoqrqjqx.ruj@flex--ackerleytng.bounces.google.com; helo=mail-pf1-x44a.google.com X-Spam_score_int: -95 X-Spam_score: -9.6 X-Spam_bar: --------- X-Spam_report: (-9.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Update private_mem_conversions_test for various private memory backing source types Signed-off-by: Ackerley Tng --- .../kvm/x86_64/private_mem_conversions_test.c | 38 ++++++++++++++----- 1 file changed, 28 insertions(+), 10 deletions(-) diff --git a/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c index 6a353cf64f52..27a7e5099b7b 100644 --- a/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c +++ b/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c @@ -240,14 +240,15 @@ static void *__test_mem_conversions(void *__vcpu) } } -static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t nr_vcpus, - uint32_t nr_memslots) +static void test_mem_conversions(enum vm_mem_backing_src_type src_type, + enum vm_pmem_backing_src_type pmem_src_type, + uint32_t nr_vcpus, uint32_t nr_memslots) { - const size_t memfd_size = PER_CPU_DATA_SIZE * nr_vcpus; struct kvm_vcpu *vcpus[KVM_MAX_VCPUS]; pthread_t threads[KVM_MAX_VCPUS]; struct kvm_vm *vm; int memfd, i, r; + size_t pmem_aligned_size, memfd_size; size_t test_unit_size; const struct vm_shape shape = { @@ -270,21 +271,32 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t * Allocate enough memory so that each vCPU's chunk of memory can be * naturally aligned with respect to the size of the backing store. */ - test_unit_size = align_up(PER_CPU_DATA_SIZE, get_backing_src_pagesz(src_type)); + test_unit_size = align_up(PER_CPU_DATA_SIZE, + max(get_backing_src_pagesz(src_type), + get_pmem_backing_src_pagesz(pmem_src_type))); } - memfd = vm_create_guest_memfd(vm, memfd_size, 0); + pmem_aligned_size = PER_CPU_DATA_SIZE; + if (nr_memslots > 1) { + pmem_aligned_size = align_up(PER_CPU_DATA_SIZE, + get_pmem_backing_src_pagesz(pmem_src_type)); + } + + memfd_size = pmem_aligned_size * nr_vcpus; + memfd = vm_create_guest_memfd(vm, memfd_size, + vm_pmem_backing_src_alias(pmem_src_type)->flag); for (i = 0; i < nr_memslots; i++) { uint64_t gpa = BASE_DATA_GPA + i * test_unit_size; - uint64_t npages = PER_CPU_DATA_SIZE / vm->page_size; + uint64_t npages = pmem_aligned_size / vm->page_size; /* Make sure the memslot is large enough for all the test units */ if (nr_memslots == 1) npages *= nr_vcpus; + /* Offsets must be aligned to private mem's page size */ vm_mem_add(vm, src_type, gpa, BASE_DATA_SLOT + i, npages, - KVM_MEM_PRIVATE, memfd, PER_CPU_DATA_SIZE * i); + KVM_MEM_PRIVATE, memfd, pmem_aligned_size * i); } for (i = 0; i < nr_vcpus; i++) { @@ -324,10 +336,12 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t static void usage(const char *cmd) { puts(""); - printf("usage: %s [-h] [-m] [-s mem_type] [-n nr_vcpus]\n", cmd); + printf("usage: %s [-h] [-m] [-s mem_type] [-p pmem_type] [-n nr_vcpus]\n", cmd); puts(""); backing_src_help("-s"); puts(""); + pmem_backing_src_help("-p"); + puts(""); puts(" -n: specify the number of vcpus (default: 1)"); puts(""); puts(" -m: use multiple memslots (default: 1)"); @@ -337,6 +351,7 @@ static void usage(const char *cmd) int main(int argc, char *argv[]) { enum vm_mem_backing_src_type src_type = DEFAULT_VM_MEM_SRC; + enum vm_pmem_backing_src_type pmem_src_type = DEFAULT_VM_PMEM_SRC; bool use_multiple_memslots = false; uint32_t nr_vcpus = 1; uint32_t nr_memslots; @@ -345,11 +360,14 @@ int main(int argc, char *argv[]) TEST_REQUIRE(kvm_has_cap(KVM_CAP_EXIT_HYPERCALL)); TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_PROTECTED_VM)); - while ((opt = getopt(argc, argv, "hms:n:")) != -1) { + while ((opt = getopt(argc, argv, "hms:p:n:")) != -1) { switch (opt) { case 's': src_type = parse_backing_src_type(optarg); break; + case 'p': + pmem_src_type = parse_pmem_backing_src_type(optarg); + break; case 'n': nr_vcpus = atoi_positive("nr_vcpus", optarg); break; @@ -365,7 +383,7 @@ int main(int argc, char *argv[]) nr_memslots = use_multiple_memslots ? nr_vcpus : 1; - test_mem_conversions(src_type, nr_vcpus, nr_memslots); + test_mem_conversions(src_type, pmem_src_type, nr_vcpus, nr_memslots); return 0; }