From patchwork Tue Jun 6 19:03:46 2023
X-Patchwork-Submitter: Ackerley Tng
X-Patchwork-Id: 13269575
From: Ackerley Tng
Date: Tue, 6 Jun 2023 19:03:46 +0000
Subject: [RFC PATCH 01/19] mm: hugetlb: Expose get_hstate_idx()
Expose get_hstate_idx() so it can be used from KVM's guest_mem code.

Signed-off-by: Ackerley Tng
---
 fs/hugetlbfs/inode.c    |  9 ---------
 include/linux/hugetlb.h | 14 ++++++++++++++
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9062da6da567..406d7366cf3e 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1560,15 +1560,6 @@ static int can_do_hugetlb_shm(void)
 	return capable(CAP_IPC_LOCK) || in_group_p(shm_group);
 }
 
-static int get_hstate_idx(int page_size_log)
-{
-	struct hstate *h = hstate_sizelog(page_size_log);
-
-	if (!h)
-		return -1;
-	return hstate_index(h);
-}
-
 /*
  * Note that size should be aligned to proper hugepage size in caller side,
  * otherwise hugetlb_reserve_pages reserves one less hugepages than intended.
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 7c977d234aba..37c2edf7beea 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -876,6 +876,15 @@ static inline int hstate_index(struct hstate *h)
 	return h - hstates;
 }
 
+static inline int get_hstate_idx(int page_size_log)
+{
+	struct hstate *h = hstate_sizelog(page_size_log);
+
+	if (!h)
+		return -1;
+	return hstate_index(h);
+}
+
 extern int dissolve_free_huge_page(struct page *page);
 extern int dissolve_free_huge_pages(unsigned long start_pfn,
 				    unsigned long end_pfn);
@@ -1142,6 +1151,11 @@ static inline int hstate_index(struct hstate *h)
 	return 0;
 }
 
+static inline int get_hstate_idx(int page_size_log)
+{
+	return 0;
+}
+
 static inline int dissolve_free_huge_page(struct page *page)
 {
 	return 0;
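[Editorial note: a minimal sketch of how a caller outside hugetlbfs (for example a
guest_mem-style user) could consume the newly exported helper. The function name
guest_mem_hstate() is purely illustrative and is not part of this series.]

	/* Translate a log2 page size into the matching hstate (sketch only). */
	static struct hstate *guest_mem_hstate(int page_size_log)
	{
		int idx = get_hstate_idx(page_size_log);

		if (idx < 0)
			return NULL;	/* no hstate configured for this page size */

		return &hstates[idx];	/* hstates[] is already exported by hugetlb */
	}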
From patchwork Tue Jun 6 19:03:47 2023
X-Patchwork-Submitter: Ackerley Tng
X-Patchwork-Id: 13269576
From: Ackerley Tng
Date: Tue, 6 Jun 2023 19:03:47 +0000
Message-ID: <97c312c8c0b56218454d546a540a3ea2e2a825e2.1686077275.git.ackerleytng@google.com>
Subject: [RFC PATCH 02/19] mm: hugetlb: Move and expose hugetlbfs_zero_partial_page

Zeroing of pages is generalizable to hugetlb and is not specific to
hugetlbfs.

Rename hugetlbfs_zero_partial_page => hugetlb_zero_partial_page, move
it to mm/hugetlb.c and expose it in linux/hugetlb.h.
Signed-off-by: Ackerley Tng
---
 fs/hugetlbfs/inode.c    | 27 ++-------------------------
 include/linux/hugetlb.h |  6 ++++++
 mm/hugetlb.c            | 22 ++++++++++++++++++++++
 3 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 406d7366cf3e..3dab50d3ed88 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -688,29 +688,6 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	remove_inode_hugepages(inode, offset, LLONG_MAX);
 }
 
-static void hugetlbfs_zero_partial_page(struct hstate *h,
-					struct address_space *mapping,
-					loff_t start,
-					loff_t end)
-{
-	pgoff_t idx = start >> huge_page_shift(h);
-	struct folio *folio;
-
-	folio = filemap_lock_folio(mapping, idx);
-	if (!folio)
-		return;
-
-	start = start & ~huge_page_mask(h);
-	end = end & ~huge_page_mask(h);
-	if (!end)
-		end = huge_page_size(h);
-
-	folio_zero_segment(folio, (size_t)start, (size_t)end);
-
-	folio_unlock(folio);
-	folio_put(folio);
-}
-
 static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 {
 	struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
@@ -737,7 +714,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 
 		/* If range starts before first full page, zero partial page. */
 		if (offset < hole_start)
-			hugetlbfs_zero_partial_page(h, mapping,
+			hugetlb_zero_partial_page(h, mapping,
 					offset, min(offset + len, hole_start));
 
 		/* Unmap users of full pages in the hole. */
@@ -750,7 +727,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 
 		/* If range extends beyond last full page, zero partial page. */
 		if ((offset + len) > hole_end && (offset + len) > hole_start)
-			hugetlbfs_zero_partial_page(h, mapping,
+			hugetlb_zero_partial_page(h, mapping,
 					hole_end, offset + len);
 
 		i_mmap_unlock_write(mapping);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 37c2edf7beea..023293ceec25 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -256,6 +256,9 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
 bool is_hugetlb_entry_migration(pte_t pte);
 void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
 
+void hugetlb_zero_partial_page(struct hstate *h, struct address_space *mapping,
+			       loff_t start, loff_t end);
+
 #else /* !CONFIG_HUGETLB_PAGE */
 
 static inline void hugetlb_dup_vma_private(struct vm_area_struct *vma)
@@ -464,6 +467,9 @@ static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
 
 static inline void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) { }
 
+static inline void hugetlb_zero_partial_page(
+	struct hstate *h, struct address_space *mapping, loff_t start, loff_t end) {}
+
 #endif /* !CONFIG_HUGETLB_PAGE */
 
 /*
  * hugepages at page global directory.  If arch support
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 07abcb6eb203..9c9262833b4f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7407,6 +7407,28 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
 			ALIGN_DOWN(vma->vm_end, PUD_SIZE));
 }
 
+void hugetlb_zero_partial_page(struct hstate *h,
+			       struct address_space *mapping,
+			       loff_t start, loff_t end)
+{
+	pgoff_t idx = start >> huge_page_shift(h);
+	struct folio *folio;
+
+	folio = filemap_lock_folio(mapping, idx);
+	if (!folio)
+		return;
+
+	start = start & ~huge_page_mask(h);
+	end = end & ~huge_page_mask(h);
+	if (!end)
+		end = huge_page_size(h);
+
+	folio_zero_segment(folio, (size_t)start, (size_t)end);
+
+	folio_unlock(folio);
+	folio_put(folio);
+}
+
 #ifdef CONFIG_CMA
 static bool cma_reserve_called __initdata;
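[Editorial note: a hedged sketch of how a non-hugetlbfs backing store might use the
newly exported helper when punching a hole. It simply mirrors the hugetlbfs_punch_hole()
logic shown in the diff above; the enclosing function name and the assumption that the
caller already holds the appropriate locks are illustrative, not part of this series.]

	/* Zero the partial pages at both edges of a punched hole (sketch only). */
	static void sketch_zero_hole_edges(struct hstate *h,
					   struct address_space *mapping,
					   loff_t offset, loff_t len)
	{
		loff_t hole_start = round_up(offset, huge_page_size(h));
		loff_t hole_end = round_down(offset + len, huge_page_size(h));

		if (offset < hole_start)
			hugetlb_zero_partial_page(h, mapping, offset,
						  min(offset + len, hole_start));

		if (offset + len > hole_end && offset + len > hole_start)
			hugetlb_zero_partial_page(h, mapping, hole_end,
						  offset + len);
	}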
From patchwork Tue Jun 6 19:03:48 2023
X-Patchwork-Submitter: Ackerley Tng
X-Patchwork-Id: 13269577
From: Ackerley Tng
Date: Tue, 6 Jun 2023 19:03:48 +0000
Message-ID: <0ae157ec9e196f353ecf9036dbffdc295c994817.1686077275.git.ackerleytng@google.com>
Subject: [RFC PATCH 03/19] mm: hugetlb: Expose remove_inode_hugepages

TODO may want to move this to hugetlb

Signed-off-by: Ackerley Tng
---
 fs/hugetlbfs/inode.c    | 3 +--
 include/linux/hugetlb.h | 4 ++++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3dab50d3ed88..4f25df31ae80 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -611,8 +611,7 @@ static bool remove_inode_single_folio(struct hstate *h, struct inode *inode,
  * Note: If the passed end of range value is beyond the end of file, but
  * not LLONG_MAX this routine still performs a hole punch operation.
  */
-static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
-				   loff_t lend)
+void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend)
 {
 	struct hstate *h = hstate_inode(inode);
 	struct address_space *mapping = &inode->i_data;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 023293ceec25..1483020b412b 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -259,6 +259,8 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
 void hugetlb_zero_partial_page(struct hstate *h, struct address_space *mapping,
 			       loff_t start, loff_t end);
 
+void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend);
+
 #else /* !CONFIG_HUGETLB_PAGE */
 
 static inline void hugetlb_dup_vma_private(struct vm_area_struct *vma)
@@ -470,6 +472,8 @@ static inline void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) { }
 static inline void hugetlb_zero_partial_page(
 	struct hstate *h, struct address_space *mapping, loff_t start, loff_t end) {}
 
+static inline void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) {}
+
 #endif /* !CONFIG_HUGETLB_PAGE */
 
 /*
  * hugepages at page global directory.  If arch support
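[Editorial note: the two ways this now-exported routine is driven, mirroring the existing
hugetlbfs callers (truncation passes LLONG_MAX as the end of range, hole punch passes a
bounded range). Shown only as a usage reminder; the variable names are illustrative.]

	/* Truncate everything from 'offset' to EOF (sketch). */
	remove_inode_hugepages(inode, offset, LLONG_MAX);

	/* Punch a hole covering only [hole_start, hole_end) (sketch). */
	remove_inode_hugepages(inode, hole_start, hole_end);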
From patchwork Tue Jun 6 19:03:49 2023
X-Patchwork-Submitter: Ackerley Tng
X-Patchwork-Id: 13269580
From: Ackerley Tng
Date: Tue, 6 Jun 2023 19:03:49 +0000
Subject: [RFC PATCH 04/19] mm: hugetlb: Decouple hstate, subpool from inode
hstate and subpool being retrievable from inode via hstate_inode() and
subpool_inode() respectively is a hugetlbfs concept.

hugetlb should be agnostic of hugetlbfs and hugetlb accounting functions
should accept hstate (required) and subpool (can be NULL) independently
of inode.

inode is still a parameter for these accounting functions since the
inode's block counts need to be updated during accounting.

The inode's resv_map will also still need to be updated if not NULL.

Signed-off-by: Ackerley Tng
---
 fs/hugetlbfs/inode.c    | 59 ++++++++++++++++++++++++++++-------------
 include/linux/hugetlb.h | 32 +++++++++++++++++-----
 mm/hugetlb.c            | 49 ++++++++++++++++++++--------------
 3 files changed, 95 insertions(+), 45 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 4f25df31ae80..0fc49b6252e4 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -164,7 +164,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	file_accessed(file);
 
 	ret = -ENOMEM;
-	if (!hugetlb_reserve_pages(inode,
+	if (!hugetlb_reserve_pages(h, subpool_inode(inode), inode,
 				vma->vm_pgoff >> huge_page_order(h),
 				len >> huge_page_shift(h), vma,
 				vma->vm_flags))
@@ -550,14 +550,18 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
 	}
 }
 
-/*
+/**
+ * Remove folio from page_cache and userspace mappings. Also unreserves pages,
+ * updating hstate @h, subpool @spool (if not NULL), @inode block info and
+ * @inode's resv_map (if not NULL).
+ *
  * Called with hugetlb fault mutex held.
  * Returns true if page was actually removed, false otherwise.
  */
-static bool remove_inode_single_folio(struct hstate *h, struct inode *inode,
-					struct address_space *mapping,
-					struct folio *folio, pgoff_t index,
-					bool truncate_op)
+static bool remove_mapping_single_folio(
+	struct address_space *mapping, struct folio *folio, pgoff_t index,
+	struct hstate *h, struct hugepage_subpool *spool, struct inode *inode,
+	bool truncate_op)
 {
 	bool ret = false;
 
@@ -582,9 +586,8 @@ static bool remove_inode_single_folio(struct hstate *h, struct inode *inode,
 		hugetlb_delete_from_page_cache(folio);
 		ret = true;
 		if (!truncate_op) {
-			if (unlikely(hugetlb_unreserve_pages(inode, index,
-							index + 1, 1)))
-				hugetlb_fix_reserve_counts(inode);
+			if (unlikely(hugetlb_unreserve_pages(h, spool, inode, index, index + 1, 1)))
+				hugetlb_fix_reserve_counts(h, spool);
 		}
 
 	folio_unlock(folio);
@@ -592,7 +595,14 @@ static bool remove_inode_single_folio(struct hstate *h, struct inode *inode,
 }
 
 /*
- * remove_inode_hugepages handles two distinct cases: truncation and hole
+ * Also updates reservations in: + * + hstate @h (required) + * + subpool @spool (can be NULL) + * + resv_map in @inode (can be NULL) + * and updates blocks in @inode (required) + * + * remove_mapping_hugepages handles two distinct cases: truncation and hole * punch. There are subtle differences in operation for each case. * * truncation is indicated by end of range being LLONG_MAX @@ -611,10 +621,10 @@ static bool remove_inode_single_folio(struct hstate *h, struct inode *inode, * Note: If the passed end of range value is beyond the end of file, but * not LLONG_MAX this routine still performs a hole punch operation. */ -void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) +void remove_mapping_hugepages(struct address_space *mapping, + struct hstate *h, struct hugepage_subpool *spool, + struct inode *inode, loff_t lstart, loff_t lend) { - struct hstate *h = hstate_inode(inode); - struct address_space *mapping = &inode->i_data; const pgoff_t start = lstart >> huge_page_shift(h); const pgoff_t end = lend >> huge_page_shift(h); struct folio_batch fbatch; @@ -636,8 +646,8 @@ void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) /* * Remove folio that was part of folio_batch. */ - if (remove_inode_single_folio(h, inode, mapping, folio, - index, truncate_op)) + if (remove_mapping_single_folio(mapping, folio, index, + h, spool, inode, truncate_op)) freed++; mutex_unlock(&hugetlb_fault_mutex_table[hash]); @@ -647,7 +657,16 @@ void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) } if (truncate_op) - (void)hugetlb_unreserve_pages(inode, start, LONG_MAX, freed); + (void)hugetlb_unreserve_pages(h, spool, inode, start, LONG_MAX, freed); +} + +void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) +{ + struct address_space *mapping = &inode->i_data; + struct hstate *h = hstate_inode(inode); + struct hugepage_subpool *spool = subpool_inode(inode); + + return remove_mapping_hugepages(mapping, h, spool, inode, lstart, lend); } static void hugetlbfs_evict_inode(struct inode *inode) @@ -1548,6 +1567,7 @@ struct file *hugetlb_file_setup(const char *name, size_t size, struct vfsmount *mnt; int hstate_idx; struct file *file; + struct hstate *h; hstate_idx = get_hstate_idx(page_size_log); if (hstate_idx < 0) @@ -1578,9 +1598,10 @@ struct file *hugetlb_file_setup(const char *name, size_t size, inode->i_size = size; clear_nlink(inode); - if (!hugetlb_reserve_pages(inode, 0, - size >> huge_page_shift(hstate_inode(inode)), NULL, - acctflag)) + h = hstate_inode(inode); + if (!hugetlb_reserve_pages(h, subpool_inode(inode), inode, 0, + size >> huge_page_shift(h), NULL, + acctflag)) file = ERR_PTR(-ENOMEM); else file = alloc_file_pseudo(inode, mnt, name, O_RDWR, diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 1483020b412b..2457d7a21974 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -166,11 +166,13 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, struct page **pagep, bool wp_copy); #endif /* CONFIG_USERFAULTFD */ -bool hugetlb_reserve_pages(struct inode *inode, long from, long to, - struct vm_area_struct *vma, - vm_flags_t vm_flags); -long hugetlb_unreserve_pages(struct inode *inode, long start, long end, - long freed); +bool hugetlb_reserve_pages(struct hstate *h, struct hugepage_subpool *spool, + struct inode *inode, + long from, long to, + struct vm_area_struct *vma, + vm_flags_t vm_flags); +long hugetlb_unreserve_pages(struct hstate *h, struct 
hugepage_subpool *spool, + struct inode *inode, long start, long end, long freed); bool isolate_hugetlb(struct folio *folio, struct list_head *list); int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison); int get_huge_page_for_hwpoison(unsigned long pfn, int flags, @@ -178,7 +180,7 @@ int get_huge_page_for_hwpoison(unsigned long pfn, int flags, void folio_putback_active_hugetlb(struct folio *folio); void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int reason); void free_huge_page(struct page *page); -void hugetlb_fix_reserve_counts(struct inode *inode); +void hugetlb_fix_reserve_counts(struct hstate *h, struct hugepage_subpool *spool); extern struct mutex *hugetlb_fault_mutex_table; u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx); @@ -259,6 +261,9 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma); void hugetlb_zero_partial_page(struct hstate *h, struct address_space *mapping, loff_t start, loff_t end); +void remove_mapping_hugepages(struct address_space *mapping, + struct hstate *h, struct hugepage_subpool *spool, + struct inode *inode, loff_t lstart, loff_t lend); void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend); #else /* !CONFIG_HUGETLB_PAGE */ @@ -472,6 +477,9 @@ static inline void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) { } static inline void hugetlb_zero_partial_page( struct hstate *h, struct address_space *mapping, loff_t start, loff_t end) {} +static inline void remove_mapping_hugepages( + struct address_space *mapping, struct hstate *h, struct hugepage_subpool *spool, + struct inode *inode, loff_t lstart, loff_t lend) {} static inline void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) {} #endif /* !CONFIG_HUGETLB_PAGE */ @@ -554,6 +562,12 @@ static inline struct hstate *hstate_inode(struct inode *i) { return HUGETLBFS_SB(i->i_sb)->hstate; } + +static inline struct hugepage_subpool *subpool_inode(struct inode *inode) +{ + return HUGETLBFS_SB(inode->i_sb)->spool; +} + #else /* !CONFIG_HUGETLBFS */ #define is_file_hugepages(file) false @@ -568,6 +582,12 @@ static inline struct hstate *hstate_inode(struct inode *i) { return NULL; } + +static inline struct hugepage_subpool *subpool_inode(struct inode *inode) +{ + return NULL; +} + #endif /* !CONFIG_HUGETLBFS */ #ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 9c9262833b4f..9da419b930df 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -247,11 +247,6 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool, return ret; } -static inline struct hugepage_subpool *subpool_inode(struct inode *inode) -{ - return HUGETLBFS_SB(inode->i_sb)->spool; -} - static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma) { return subpool_inode(file_inode(vma->vm_file)); @@ -898,16 +893,13 @@ static long region_del(struct resv_map *resv, long f, long t) * appear as a "reserved" entry instead of simply dangling with incorrect * counts. 
*/ -void hugetlb_fix_reserve_counts(struct inode *inode) +void hugetlb_fix_reserve_counts(struct hstate *h, struct hugepage_subpool *spool) { - struct hugepage_subpool *spool = subpool_inode(inode); long rsv_adjust; bool reserved = false; rsv_adjust = hugepage_subpool_get_pages(spool, 1); if (rsv_adjust > 0) { - struct hstate *h = hstate_inode(inode); - if (!hugetlb_acct_memory(h, 1)) reserved = true; } else if (!rsv_adjust) { @@ -6762,15 +6754,22 @@ long hugetlb_change_protection(struct vm_area_struct *vma, return pages > 0 ? (pages << h->order) : pages; } -/* Return true if reservation was successful, false otherwise. */ -bool hugetlb_reserve_pages(struct inode *inode, - long from, long to, - struct vm_area_struct *vma, - vm_flags_t vm_flags) +/** + * Reserves pages between vma indices @from and @to by handling accounting in: + * + hstate @h (required) + * + subpool @spool (can be NULL) + * + @inode (required if @vma is NULL) + * + * Will setup resv_map in @vma if necessary. + * Return true if reservation was successful, false otherwise. + */ +bool hugetlb_reserve_pages(struct hstate *h, struct hugepage_subpool *spool, + struct inode *inode, + long from, long to, + struct vm_area_struct *vma, + vm_flags_t vm_flags) { long chg = -1, add = -1; - struct hstate *h = hstate_inode(inode); - struct hugepage_subpool *spool = subpool_inode(inode); struct resv_map *resv_map; struct hugetlb_cgroup *h_cg = NULL; long gbl_reserve, regions_needed = 0; @@ -6921,13 +6920,23 @@ bool hugetlb_reserve_pages(struct inode *inode, return false; } -long hugetlb_unreserve_pages(struct inode *inode, long start, long end, - long freed) +/** + * Unreserves pages between vma indices @start and @end by handling accounting + * in: + * + hstate @h (required) + * + subpool @spool (can be NULL) + * + @inode (required) + * + resv_map in @inode (can be NULL) + * + * @freed is the number of pages freed, for updating inode->i_blocks. + * + * Returns 0 on success. 
+ */
+long hugetlb_unreserve_pages(struct hstate *h, struct hugepage_subpool *spool,
+			     struct inode *inode, long start, long end, long freed)
 {
-	struct hstate *h = hstate_inode(inode);
 	struct resv_map *resv_map = inode_resv_map(inode);
 	long chg = 0;
-	struct hugepage_subpool *spool = subpool_inode(inode);
 	long gbl_reserve;
 
 	/*
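[Editorial note: to make the reworked calling convention concrete, a hedged sketch of a
caller that is not hugetlbfs driving the accounting helpers with an explicit hstate and no
subpool. The function name sketch_reserve_range() and its error handling are illustrative
assumptions, not part of this patch.]

	/* Reserve, then release, a range of huge pages (sketch only). */
	static long sketch_reserve_range(struct hstate *h, struct inode *inode,
					 long npages)
	{
		/* hstate is required; the subpool argument may now be NULL */
		if (!hugetlb_reserve_pages(h, NULL, inode, 0, npages, NULL, 0))
			return -ENOMEM;

		/* ... use the reservation ... */

		/* returns 0 on success, per the new kernel-doc above */
		return hugetlb_unreserve_pages(h, NULL, inode, 0, npages, 0);
	}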
From patchwork Tue Jun 6 19:03:50 2023
X-Patchwork-Submitter: Ackerley Tng
X-Patchwork-Id: 13269581
From: Ackerley Tng
Date: Tue, 6 Jun 2023 19:03:50 +0000
Message-ID: <7827774c13e975d3d1dedc4a4684cb92eac8b548.1686077275.git.ackerleytng@google.com>
Subject: [RFC PATCH 05/19] mm: hugetlb: Allow alloc_hugetlb_folio() to be parametrized by
 subpool and hstate

subpool_inode() and hstate_inode() are hugetlbfs-specific. By allowing
subpool and hstate to be specified, hugetlb is further modularized from
hugetlbfs.

Signed-off-by: Ackerley Tng
---
 include/linux/hugetlb.h |  3 +++
 mm/hugetlb.c            | 16 ++++++++++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2457d7a21974..14df89d1642c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -747,6 +747,9 @@ struct huge_bootmem_page {
 };
 
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
+struct folio *alloc_hugetlb_folio_from_subpool(
+	struct hugepage_subpool *spool, struct hstate *h,
+	struct vm_area_struct *vma, unsigned long addr, int avoid_reserve);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9da419b930df..99ab4bbdb2ce 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3008,11 +3008,10 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
 	return ret;
 }
 
-struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
-				    unsigned long addr, int avoid_reserve)
+struct folio *alloc_hugetlb_folio_from_subpool(
+	struct hugepage_subpool *spool, struct hstate *h,
+	struct vm_area_struct *vma, unsigned long addr, int avoid_reserve)
 {
-	struct hugepage_subpool *spool = subpool_vma(vma);
-	struct hstate *h = hstate_vma(vma);
 	struct folio *folio;
 	long map_chg, map_commit;
 	long gbl_chg;
@@ -3139,6 +3138,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	return ERR_PTR(-ENOSPC);
 }
 
+struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
+				  unsigned long addr, int avoid_reserve)
+{
+	struct hugepage_subpool *spool = subpool_vma(vma);
+	struct hstate *h = hstate_vma(vma);
+
+	return alloc_hugetlb_folio_from_subpool(spool, h, vma, addr, avoid_reserve);
+}
+
 int alloc_bootmem_huge_page(struct hstate *h, int nid)
 	__attribute__ ((weak, alias("__alloc_bootmem_huge_page")));
 int __alloc_bootmem_huge_page(struct hstate *h, int nid)
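[Editorial note: a minimal usage sketch of the new entry point, assuming the caller
already tracks its own subpool and hstate instead of deriving them from a hugetlbfs vma.
The surrounding variables (spool, h, vma, addr) are assumed to exist in the caller.]

	/* Allocate against an explicitly chosen subpool/hstate (sketch only). */
	folio = alloc_hugetlb_folio_from_subpool(spool, h, vma, addr, 0);
	if (IS_ERR(folio))
		return PTR_ERR(folio);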
From patchwork Tue Jun 6 19:03:51 2023
X-Patchwork-Submitter: Ackerley Tng
X-Patchwork-Id: 13269578
From: Ackerley Tng
Date: Tue, 6 Jun 2023 19:03:51 +0000
Message-ID: <69ae008ec4076456078b880575ac310171136ac0.1686077275.git.ackerleytng@google.com>
Subject: [RFC PATCH 06/19] mm: hugetlb: Provide hugetlb_filemap_add_folio()
hstate_inode() is hugetlbfs-specific, limiting hugetlb_add_to_page_cache()
to hugetlbfs.

hugetlb_filemap_add_folio() allows hstate to be specified and further
separates hugetlb from hugetlbfs.

Signed-off-by: Ackerley Tng
---
 include/linux/hugetlb.h |  2 ++
 mm/hugetlb.c            | 13 ++++++++++---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 14df89d1642c..7d49048c5a2a 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -756,6 +756,8 @@ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask, gfp_t gfp_mask);
 struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma,
 				unsigned long address);
+int hugetlb_filemap_add_folio(struct address_space *mapping, struct hstate *h,
+			      struct folio *folio, pgoff_t idx);
 int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
 			pgoff_t idx);
 void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 99ab4bbdb2ce..d16c6417b90f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5665,11 +5665,10 @@ static bool hugetlbfs_pagecache_present(struct hstate *h,
 	return present;
 }
 
-int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
-			   pgoff_t idx)
+int hugetlb_filemap_add_folio(struct address_space *mapping, struct hstate *h,
+			      struct folio *folio, pgoff_t idx)
 {
 	struct inode *inode = mapping->host;
-	struct hstate *h = hstate_inode(inode);
 	int err;
 
 	__folio_set_locked(folio);
@@ -5693,6 +5692,14 @@ int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping
 	return 0;
 }
 
+int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
+			      pgoff_t idx)
+{
+	struct hstate *h = hstate_inode(mapping->host);
+
+	return hugetlb_filemap_add_folio(mapping, h, folio, idx);
+}
+
 static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 						  struct address_space *mapping,
 						  pgoff_t idx,
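[Editorial note: a hedged sketch of calling the new helper from a caller that tracks its
own hstate. Whether the caller should drop its folio reference on failure is an assumption
here; this is not prescribed by the patch.]

	/* Insert a freshly allocated folio into the page cache (sketch only). */
	err = hugetlb_filemap_add_folio(mapping, h, folio, index);
	if (err)
		folio_put(folio);	/* illustrative error handling */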
From patchwork Tue Jun 6 19:03:52 2023
X-Patchwork-Submitter: Ackerley Tng
X-Patchwork-Id: 13269579
From: Ackerley Tng
Date: Tue, 6 Jun 2023 19:03:52 +0000
Message-ID: <508025e09425a98d52b17cfbdc07340ae05e3e32.1686077275.git.ackerleytng@google.com>
Subject: [RFC PATCH 07/19] mm: hugetlb: Refactor vma_*_reservation functions

vma_*_reservation functions rely on
vma_resv_map(), which assumes on a hugetlbfs concept of the resv_map being stored in a specific field of the inode. This refactor enables vma_*_reservation functions, now renamed resv_map_*_reservation, to be used with non-hugetlbfs filesystems, further decoupling hugetlb from hugetlbfs. Signed-off-by: Ackerley Tng --- mm/hugetlb.c | 184 +++++++++++++++++++++++++++------------------------ 1 file changed, 99 insertions(+), 85 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d16c6417b90f..d943f83d15a9 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2643,89 +2643,81 @@ static void return_unused_surplus_pages(struct hstate *h, /* - * vma_needs_reservation, vma_commit_reservation and vma_end_reservation - * are used by the huge page allocation routines to manage reservations. + * resv_map_needs_reservation, resv_map_commit_reservation and + * resv_map_end_reservation are used by the huge page allocation routines to + * manage reservations. * - * vma_needs_reservation is called to determine if the huge page at addr - * within the vma has an associated reservation. If a reservation is - * needed, the value 1 is returned. The caller is then responsible for - * managing the global reservation and subpool usage counts. After - * the huge page has been allocated, vma_commit_reservation is called - * to add the page to the reservation map. If the page allocation fails, - * the reservation must be ended instead of committed. vma_end_reservation - * is called in such cases. + * resv_map_needs_reservation is called to determine if the huge page at addr + * within the vma has an associated reservation. If a reservation is needed, + * the value 1 is returned. The caller is then responsible for managing the + * global reservation and subpool usage counts. After the huge page has been + * allocated, resv_map_commit_reservation is called to add the page to the + * reservation map. If the page allocation fails, the reservation must be ended + * instead of committed. resv_map_end_reservation is called in such cases. * - * In the normal case, vma_commit_reservation returns the same value - * as the preceding vma_needs_reservation call. The only time this - * is not the case is if a reserve map was changed between calls. It - * is the responsibility of the caller to notice the difference and - * take appropriate action. + * In the normal case, resv_map_commit_reservation returns the same value as the + * preceding resv_map_needs_reservation call. The only time this is not the + * case is if a reserve map was changed between calls. It is the responsibility + * of the caller to notice the difference and take appropriate action. * - * vma_add_reservation is used in error paths where a reservation must - * be restored when a newly allocated huge page must be freed. It is - * to be called after calling vma_needs_reservation to determine if a - * reservation exists. + * resv_map_add_reservation is used in error paths where a reservation must be + * restored when a newly allocated huge page must be freed. It is to be called + * after calling resv_map_needs_reservation to determine if a reservation + * exists. * - * vma_del_reservation is used in error paths where an entry in the reserve - * map was created during huge page allocation and must be removed. It is to - * be called after calling vma_needs_reservation to determine if a reservation + * resv_map_del_reservation is used in error paths where an entry in the reserve + * map was created during huge page allocation and must be removed. 
It is to be + * called after calling resv_map_needs_reservation to determine if a reservation * exists. */ -enum vma_resv_mode { - VMA_NEEDS_RESV, - VMA_COMMIT_RESV, - VMA_END_RESV, - VMA_ADD_RESV, - VMA_DEL_RESV, +enum resv_map_resv_mode { + RESV_MAP_NEEDS_RESV, + RESV_MAP_COMMIT_RESV, + RESV_MAP_END_RESV, + RESV_MAP_ADD_RESV, + RESV_MAP_DEL_RESV, }; -static long __vma_reservation_common(struct hstate *h, - struct vm_area_struct *vma, unsigned long addr, - enum vma_resv_mode mode) +static long __resv_map_reservation_common(struct resv_map *resv, pgoff_t resv_index, + bool may_be_shared_mapping, + enum resv_map_resv_mode mode) { - struct resv_map *resv; - pgoff_t idx; long ret; long dummy_out_regions_needed; - resv = vma_resv_map(vma); - if (!resv) - return 1; - - idx = vma_hugecache_offset(h, vma, addr); switch (mode) { - case VMA_NEEDS_RESV: - ret = region_chg(resv, idx, idx + 1, &dummy_out_regions_needed); + case RESV_MAP_NEEDS_RESV: + ret = region_chg(resv, resv_index, resv_index + 1, &dummy_out_regions_needed); /* We assume that vma_reservation_* routines always operate on * 1 page, and that adding to resv map a 1 page entry can only * ever require 1 region. */ VM_BUG_ON(dummy_out_regions_needed != 1); break; - case VMA_COMMIT_RESV: - ret = region_add(resv, idx, idx + 1, 1, NULL, NULL); + case RESV_MAP_COMMIT_RESV: + ret = region_add(resv, resv_index, resv_index + 1, 1, NULL, NULL); /* region_add calls of range 1 should never fail. */ VM_BUG_ON(ret < 0); break; - case VMA_END_RESV: - region_abort(resv, idx, idx + 1, 1); + case RESV_MAP_END_RESV: + region_abort(resv, resv_index, resv_index + 1, 1); ret = 0; break; - case VMA_ADD_RESV: - if (vma->vm_flags & VM_MAYSHARE) { - ret = region_add(resv, idx, idx + 1, 1, NULL, NULL); + case RESV_MAP_ADD_RESV: + if (may_be_shared_mapping) { + ret = region_add(resv, resv_index, resv_index + 1, 1, NULL, NULL); /* region_add calls of range 1 should never fail. */ VM_BUG_ON(ret < 0); } else { - region_abort(resv, idx, idx + 1, 1); - ret = region_del(resv, idx, idx + 1); + region_abort(resv, resv_index, resv_index + 1, 1); + ret = region_del(resv, resv_index, resv_index + 1); } break; - case VMA_DEL_RESV: - if (vma->vm_flags & VM_MAYSHARE) { - region_abort(resv, idx, idx + 1, 1); - ret = region_del(resv, idx, idx + 1); + case RESV_MAP_DEL_RESV: + if (may_be_shared_mapping) { + region_abort(resv, resv_index, resv_index + 1, 1); + ret = region_del(resv, resv_index, resv_index + 1); } else { - ret = region_add(resv, idx, idx + 1, 1, NULL, NULL); + ret = region_add(resv, resv_index, resv_index + 1, 1, NULL, NULL); /* region_add calls of range 1 should never fail. */ VM_BUG_ON(ret < 0); } @@ -2734,7 +2726,7 @@ static long __vma_reservation_common(struct hstate *h, BUG(); } - if (vma->vm_flags & VM_MAYSHARE || mode == VMA_DEL_RESV) + if (may_be_shared_mapping || mode == RESV_MAP_DEL_RESV) return ret; /* * We know private mapping must have HPAGE_RESV_OWNER set. 
@@ -2758,34 +2750,39 @@ static long __vma_reservation_common(struct hstate *h, return ret; } -static long vma_needs_reservation(struct hstate *h, - struct vm_area_struct *vma, unsigned long addr) +static long resv_map_needs_reservation(struct resv_map *resv, pgoff_t resv_index, + bool may_be_shared_mapping) { - return __vma_reservation_common(h, vma, addr, VMA_NEEDS_RESV); + return __resv_map_reservation_common( + resv, resv_index, may_be_shared_mapping, RESV_MAP_NEEDS_RESV); } -static long vma_commit_reservation(struct hstate *h, - struct vm_area_struct *vma, unsigned long addr) +static long resv_map_commit_reservation(struct resv_map *resv, pgoff_t resv_index, + bool may_be_shared_mapping) { - return __vma_reservation_common(h, vma, addr, VMA_COMMIT_RESV); + return __resv_map_reservation_common( + resv, resv_index, may_be_shared_mapping, RESV_MAP_COMMIT_RESV); } -static void vma_end_reservation(struct hstate *h, - struct vm_area_struct *vma, unsigned long addr) +static void resv_map_end_reservation(struct resv_map *resv, pgoff_t resv_index, + bool may_be_shared_mapping) { - (void)__vma_reservation_common(h, vma, addr, VMA_END_RESV); + (void)__resv_map_reservation_common( + resv, resv_index, may_be_shared_mapping, RESV_MAP_END_RESV); } -static long vma_add_reservation(struct hstate *h, - struct vm_area_struct *vma, unsigned long addr) +static long resv_map_add_reservation(struct resv_map *resv, pgoff_t resv_index, + bool may_be_shared_mapping) { - return __vma_reservation_common(h, vma, addr, VMA_ADD_RESV); + return __resv_map_reservation_common( + resv, resv_index, may_be_shared_mapping, RESV_MAP_ADD_RESV); } -static long vma_del_reservation(struct hstate *h, - struct vm_area_struct *vma, unsigned long addr) +static long resv_map_del_reservation(struct resv_map *resv, pgoff_t resv_index, + bool may_be_shared_mapping) { - return __vma_reservation_common(h, vma, addr, VMA_DEL_RESV); + return __resv_map_reservation_common( + resv, resv_index, may_be_shared_mapping, RESV_MAP_DEL_RESV); } /* @@ -2811,7 +2808,12 @@ static long vma_del_reservation(struct hstate *h, void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, unsigned long address, struct folio *folio) { - long rc = vma_needs_reservation(h, vma, address); + long rc; + struct resv_map *resv = vma_resv_map(vma); + pgoff_t resv_index = vma_hugecache_offset(h, vma, address); + bool may_share = vma->vm_flags & VM_MAYSHARE; + + rc = resv_map_needs_reservation(resv, resv_index, may_share); if (folio_test_hugetlb_restore_reserve(folio)) { if (unlikely(rc < 0)) @@ -2828,9 +2830,9 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, */ folio_clear_hugetlb_restore_reserve(folio); else if (rc) - (void)vma_add_reservation(h, vma, address); + (void)resv_map_add_reservation(resv, resv_index, may_share); else - vma_end_reservation(h, vma, address); + resv_map_end_reservation(resv, resv_index, may_share); } else { if (!rc) { /* @@ -2841,7 +2843,7 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, * Remove the entry so that a subsequent allocation * does not consume a reservation. */ - rc = vma_del_reservation(h, vma, address); + rc = resv_map_del_reservation(resv, resv_index, may_share); if (rc < 0) /* * VERY rare out of memory condition. Since @@ -2855,7 +2857,7 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, } else if (rc < 0) { /* * Rare out of memory condition from - * vma_needs_reservation call. 
Memory allocation is + * resv_map_needs_reservation call. Memory allocation is * only attempted if a new entry is needed. Therefore, * this implies there is not an entry in the * reserve map. @@ -2877,7 +2879,7 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, /* * No reservation present, do nothing */ - vma_end_reservation(h, vma, address); + resv_map_end_reservation(resv, resv_index, may_share); } } @@ -3019,13 +3021,17 @@ struct folio *alloc_hugetlb_folio_from_subpool( struct hugetlb_cgroup *h_cg = NULL; bool deferred_reserve; + struct resv_map *resv = vma_resv_map(vma); + pgoff_t resv_index = vma_hugecache_offset(h, vma, addr); + bool may_share = vma->vm_flags & VM_MAYSHARE; + idx = hstate_index(h); /* * Examine the region/reserve map to determine if the process * has a reservation for the page to be allocated. A return * code of zero indicates a reservation exists (no change). */ - map_chg = gbl_chg = vma_needs_reservation(h, vma, addr); + map_chg = gbl_chg = resv_map_needs_reservation(resv, resv_index, may_share); if (map_chg < 0) return ERR_PTR(-ENOMEM); @@ -3039,7 +3045,7 @@ struct folio *alloc_hugetlb_folio_from_subpool( if (map_chg || avoid_reserve) { gbl_chg = hugepage_subpool_get_pages(spool, 1); if (gbl_chg < 0) { - vma_end_reservation(h, vma, addr); + resv_map_end_reservation(resv, resv_index, may_share); return ERR_PTR(-ENOSPC); } @@ -3104,11 +3110,11 @@ struct folio *alloc_hugetlb_folio_from_subpool( hugetlb_set_folio_subpool(folio, spool); - map_commit = vma_commit_reservation(h, vma, addr); + map_commit = resv_map_commit_reservation(resv, resv_index, may_share); if (unlikely(map_chg > map_commit)) { /* * The page was added to the reservation map between - * vma_needs_reservation and vma_commit_reservation. + * resv_map_needs_reservation and resv_map_commit_reservation. * This indicates a race with hugetlb_reserve_pages. * Adjust for the subpool count incremented above AND * in hugetlb_reserve_pages for the same page. Also, @@ -3134,7 +3140,7 @@ struct folio *alloc_hugetlb_folio_from_subpool( out_subpool_put: if (map_chg || avoid_reserve) hugepage_subpool_put_pages(spool, 1); - vma_end_reservation(h, vma, addr); + resv_map_end_reservation(resv, resv_index, may_share); return ERR_PTR(-ENOSPC); } @@ -5901,12 +5907,16 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, * the spinlock. 
*/ if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) { - if (vma_needs_reservation(h, vma, haddr) < 0) { + struct resv_map *resv = vma_resv_map(vma); + pgoff_t resv_index = vma_hugecache_offset(h, vma, address); + bool may_share = vma->vm_flags & VM_MAYSHARE; + + if (resv_map_needs_reservation(resv, resv_index, may_share) < 0) { ret = VM_FAULT_OOM; goto backout_unlocked; } /* Just decrements count, does not deallocate */ - vma_end_reservation(h, vma, haddr); + resv_map_end_reservation(resv, resv_index, may_share); } ptl = huge_pte_lock(h, mm, ptep); @@ -6070,12 +6080,16 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, */ if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) && !(vma->vm_flags & VM_MAYSHARE) && !huge_pte_write(entry)) { - if (vma_needs_reservation(h, vma, haddr) < 0) { + struct resv_map *resv = vma_resv_map(vma); + pgoff_t resv_index = vma_hugecache_offset(h, vma, address); + bool may_share = vma->vm_flags & VM_MAYSHARE; + + if (resv_map_needs_reservation(resv, resv_index, may_share) < 0) { ret = VM_FAULT_OOM; goto out_mutex; } /* Just decrements count, does not deallocate */ - vma_end_reservation(h, vma, haddr); + resv_map_end_reservation(resv, resv_index, may_share); pagecache_folio = filemap_lock_folio(mapping, idx); } From patchwork Tue Jun 6 19:03:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269582 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD19DC7EE39 for ; Tue, 6 Jun 2023 19:05:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238853AbjFFTFI (ORCPT ); Tue, 6 Jun 2023 15:05:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38944 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239276AbjFFTEc (ORCPT ); Tue, 6 Jun 2023 15:04:32 -0400 Received: from mail-pf1-x44a.google.com (mail-pf1-x44a.google.com [IPv6:2607:f8b0:4864:20::44a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 34CA71993 for ; Tue, 6 Jun 2023 12:04:28 -0700 (PDT) Received: by mail-pf1-x44a.google.com with SMTP id d2e1a72fcca58-653a5de0478so2662453b3a.0 for ; Tue, 06 Jun 2023 12:04:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078267; x=1688670267; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=q0LlAXxza1jA09p0tGZBn0yslODgyOL+oHXEFWpIcjw=; b=oNCwvb9aRpGDjzemliRPAV2RbEjUkgcqMTcFw35zrX9Wk6VNZIJTDcZ8+sohLR6ypL FaZ8YypjTtdq4RVqmfVtwq8W8DKoRJdnwKkTjuJm/fygYago9102lZaBqLBb3VLlSbNA 8+GKs0dCEhmYGJeMyAKWj2FCcFo2qfmKEi+Bfw7yT1VoVQRRdAWcJACFBwEDVx5EFgYY ssLEqzao5SUtD6qqnRzkJD0PmTNAv31SFuhQy7OBTGaI6F+cg3TLNZhoEHYY01Vr6fPj YBc8jpY1SOo3QPJvqg80DDEnv97O24LbqrhfCC1OmOHsPqRNG8fKpuVSD4Hro3kbXfW6 x8MA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078267; x=1688670267; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=q0LlAXxza1jA09p0tGZBn0yslODgyOL+oHXEFWpIcjw=; b=kgcOVICvOAoJJ1lafnqqmaN5Dng9LmkFEiBH9W77AcZLcEGl/gmvi6XaIEu0Kh0UOv 60qcMnqRQgH3uXPnoV0ESIcoF7Lwv2daCDPxc0I84s9jJElll3ekKPeOfujSkl72OHZR 
tcwRZmPzNPgYKJ+QIDbFaBMmXUYSoqvvu5vG7WvX4Vcly4it8a5PYHWx+ZUEPXz/ieNV 45OXvt0hyCyonrNzsg89FjvjTR5iZsPxygWALhdC0D07yPz1LRD0uAmCmklaTRouAurZ UftDrXkUyFl4FqnH2Gc+TNL9rjaSYUR1gf9fYxBAUGWe6jbuEusOXanTysdgxY43P1My X6jA== X-Gm-Message-State: AC+VfDxIxyR4ij3sQ6qp0jfYKNWXgAy+k2qt1Z6SbxGW4SY8QrinVDY2 n1+rlDQNKJcJ7w9a9F+ubaILWfqctYpnnfm8bg== X-Google-Smtp-Source: ACHHUZ6/geo43pUv13kPM8dbgitYn16Pb8nr3LytYQllA2bz6ABBAmVaVTp99t2le/ojeTQ9xVl065tEgGqpA1Ormw== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a05:6a00:188d:b0:64f:cdb0:64a8 with SMTP id x13-20020a056a00188d00b0064fcdb064a8mr1304775pfh.3.1686078267446; Tue, 06 Jun 2023 12:04:27 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:53 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: Subject: [RFC PATCH 08/19] mm: hugetlb: Refactor restore_reserve_on_error From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Refactor restore_reserve_on_error to allow resv_map to be passed in. vma_resv_map() assumes the use of hugetlbfs in the way it retrieves the resv_map from the vma and inode. Introduce restore_reserve_on_error_vma() which retains original functionality to simplify refactoring for now. 
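For illustration only (not part of the diff): after this change there are two entry points. Existing VMA-based callers keep their behaviour via the new wrapper, while a caller that only holds a resv_map can undo a reservation without a VMA. The resv/index/folio names below are placeholders, not code from this series.

	/* VMA-based call sites are simply switched to the wrapper: */
	restore_reserve_on_error_vma(h, vma, addr, folio);

	/*
	 * A caller that manages its own reservation map (no VMA available)
	 * can now call the core helper directly; 'true' means the mapping
	 * is treated as shareable (VM_MAYSHARE-like).
	 */
	restore_reserve_on_error(resv, index, true, folio);
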
Signed-off-by: Ackerley Tng --- fs/hugetlbfs/inode.c | 2 +- include/linux/hugetlb.h | 6 ++++-- mm/hugetlb.c | 37 +++++++++++++++++++++---------------- 3 files changed, 26 insertions(+), 19 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 0fc49b6252e4..44e6ee9a856d 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -868,7 +868,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, __folio_mark_uptodate(folio); error = hugetlb_add_to_page_cache(folio, mapping, index); if (unlikely(error)) { - restore_reserve_on_error(h, &pseudo_vma, addr, folio); + restore_reserve_on_error_vma(h, &pseudo_vma, addr, folio); folio_put(folio); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 7d49048c5a2a..02a2766d89a4 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -760,8 +760,10 @@ int hugetlb_filemap_add_folio(struct address_space *mapping, struct hstate *h, struct folio *folio, pgoff_t idx); int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping, pgoff_t idx); -void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, - unsigned long address, struct folio *folio); +void restore_reserve_on_error(struct resv_map *resv, pgoff_t resv_index, + bool may_share, struct folio *folio); +void restore_reserve_on_error_vma(struct hstate *h, struct vm_area_struct *vma, + unsigned long address, struct folio *folio); /* arch callback */ int __init __alloc_bootmem_huge_page(struct hstate *h, int nid); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d943f83d15a9..4675f9efeba4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2805,15 +2805,10 @@ static long resv_map_del_reservation(struct resv_map *resv, pgoff_t resv_index, * * In case 2, simply undo reserve map modifications done by alloc_hugetlb_folio. */ -void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, - unsigned long address, struct folio *folio) +void restore_reserve_on_error(struct resv_map *resv, pgoff_t resv_index, + bool may_share, struct folio *folio) { - long rc; - struct resv_map *resv = vma_resv_map(vma); - pgoff_t resv_index = vma_hugecache_offset(h, vma, address); - bool may_share = vma->vm_flags & VM_MAYSHARE; - - rc = resv_map_needs_reservation(resv, resv_index, may_share); + long rc = resv_map_needs_reservation(resv, resv_index, may_share); if (folio_test_hugetlb_restore_reserve(folio)) { if (unlikely(rc < 0)) @@ -2865,7 +2860,7 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, * For shared mappings, no entry in the map indicates * no reservation. We are done. */ - if (!(vma->vm_flags & VM_MAYSHARE)) + if (!may_share) /* * For private mappings, no entry indicates * a reservation is present. 
Since we can @@ -2883,6 +2878,16 @@ void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, } } +void restore_reserve_on_error_vma(struct hstate *h, struct vm_area_struct *vma, + unsigned long address, struct folio *folio) +{ + struct resv_map *resv = vma_resv_map(vma); + pgoff_t resv_index = vma_hugecache_offset(h, vma, address); + bool may_share = vma->vm_flags & VM_MAYSHARE; + + restore_reserve_on_error(resv, resv_index, may_share, folio); +} + /* * alloc_and_dissolve_hugetlb_folio - Allocate a new folio and dissolve * the old one @@ -5109,8 +5114,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); entry = huge_ptep_get(src_pte); if (!pte_same(src_pte_old, entry)) { - restore_reserve_on_error(h, dst_vma, addr, - new_folio); + restore_reserve_on_error_vma(h, dst_vma, addr, + new_folio); folio_put(new_folio); /* huge_ptep of dst_pte won't change as in child */ goto again; @@ -5642,7 +5647,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, * unshare) */ if (new_folio != page_folio(old_page)) - restore_reserve_on_error(h, vma, haddr, new_folio); + restore_reserve_on_error_vma(h, vma, haddr, new_folio); folio_put(new_folio); out_release_old: put_page(old_page); @@ -5860,7 +5865,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, * to the page cache. So it's safe to call * restore_reserve_on_error() here. */ - restore_reserve_on_error(h, vma, haddr, folio); + restore_reserve_on_error_vma(h, vma, haddr, folio); folio_put(folio); goto out; } @@ -5965,7 +5970,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, spin_unlock(ptl); backout_unlocked: if (new_folio && !new_pagecache_folio) - restore_reserve_on_error(h, vma, haddr, folio); + restore_reserve_on_error_vma(h, vma, haddr, folio); folio_unlock(folio); folio_put(folio); @@ -6232,7 +6237,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, /* Free the allocated folio which may have * consumed a reservation. 
*/ - restore_reserve_on_error(h, dst_vma, dst_addr, folio); + restore_reserve_on_error_vma(h, dst_vma, dst_addr, folio); folio_put(folio); /* Allocate a temporary folio to hold the copied @@ -6361,7 +6366,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, folio_unlock(folio); out_release_nounlock: if (!folio_in_pagecache) - restore_reserve_on_error(h, dst_vma, dst_addr, folio); + restore_reserve_on_error_vma(h, dst_vma, dst_addr, folio); folio_put(folio); goto out; } From patchwork Tue Jun 6 19:03:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269583 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 87F94C7EE33 for ; Tue, 6 Jun 2023 19:05:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239293AbjFFTFK (ORCPT ); Tue, 6 Jun 2023 15:05:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39018 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239306AbjFFTEg (ORCPT ); Tue, 6 Jun 2023 15:04:36 -0400 Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com [IPv6:2607:f8b0:4864:20::54a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A0CCE19B4 for ; Tue, 6 Jun 2023 12:04:30 -0700 (PDT) Received: by mail-pg1-x54a.google.com with SMTP id 41be03b00d2f7-53f06f7cc74so3500005a12.1 for ; Tue, 06 Jun 2023 12:04:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078270; x=1688670270; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=QY81mipd5eg6c/BuU1bdNHCWJQ+u0Nbd1U9TPL2kLvI=; b=KOWrQMhz2uWLqPOLL/0JQqPmlrC3dCiI9oJNy8GCgLIowS+OzS7WKaPGY9vPm3PtAT fUuRJYwgpjWgfKigP2vr98VC/4aqX3GYpvsp4hPnsGOFwNN2fFFSeRFXL9FC8QZZeOsv TXZBuwWg2rM6bvxVXZ2UCQuJ6ff8v2+LFRqOgjtb865dVZ2l7fY2f7Z/OtzPXYHKjbfN ZRjrOmzbbZGa3KF/56lwJLd6R/hiuFa+vyvCC7nno3amrl0TRMEIewtSKtG/z8kueDPR Li+slKiOskei/vBhvsQJtHNmBIeIcarnb3mQ6zRjd56sBUc1JOzEZw6pSH3bcClc+mm/ tubw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078270; x=1688670270; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=QY81mipd5eg6c/BuU1bdNHCWJQ+u0Nbd1U9TPL2kLvI=; b=YjH2UmDD1nk7Zui7JWIt7hxmCYa9t6IYZDuRdo7O72jcObUYu6apaQn7W6H8bujtCz r1QwSrn+cykOIOfmFZH4MW5rPnghST5iWw24twN+z1/vJpXVzGp7DGRlVvlxJXBjL5cr /AwZO/hutI3NCnkHXzZ9mZEOy6i5ymgoY5hjtDmaeDtMI0f8CLEzcNTlfY7WDJbriP3M bfch5gTRK45+o1GjvShFQk3eiMPuQPa5SgdBWUztjaxWyGu3bJ9my1WkUCH56CDMpf+w p5vebFXJjH0Rh7AAdF9AtUDejfmq0rBBKk2GOBF8HRTcH6VaEVf4FL1HDxXjLDfcDTKw YYRQ== X-Gm-Message-State: AC+VfDyIeHeEe6r5H4YIvbOxzKs/mC85EpMHnVx9IdnWgGsJXQU/xh3l eBJ9SfNG9HU/pqNAGAygJX9yJ9A9KRFtHaMlAQ== X-Google-Smtp-Source: ACHHUZ7uGZgeuxGQjEekaNfEPd3cHpOSpYga7qIjxZtHDS9vRTE59C+39iUxIiNEiIcn3iYrIGeQIhYQVNbQFyY1wQ== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a63:4848:0:b0:534:7672:433e with SMTP id x8-20020a634848000000b005347672433emr763304pgk.3.1686078269556; Tue, 06 Jun 2023 12:04:29 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:54 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 
2.41.0.rc0.172.g3f132b7071-goog Message-ID: <7937abfd3f2d071820a1bcb84e05bf48e38e2e5b.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 09/19] mm: hugetlb: Use restore_reserve_on_error directly in filesystems From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Expose inode_resv_map() so that hugetlbfs can access its own resv_map. Hide restore_reserve_on_error_vma(), that function is now only used within mm/hugetlb.c. Signed-off-by: Ackerley Tng --- fs/hugetlbfs/inode.c | 2 +- include/linux/hugetlb.h | 21 +++++++++++++++++++-- mm/hugetlb.c | 13 ------------- 3 files changed, 20 insertions(+), 16 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 44e6ee9a856d..53f6a421499d 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -868,7 +868,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset, __folio_mark_uptodate(folio); error = hugetlb_add_to_page_cache(folio, mapping, index); if (unlikely(error)) { - restore_reserve_on_error_vma(h, &pseudo_vma, addr, folio); + restore_reserve_on_error(inode_resv_map(inode), index, true, folio); folio_put(folio); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 02a2766d89a4..5fe9643826d7 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -568,6 +568,20 @@ static inline struct hugepage_subpool *subpool_inode(struct inode *inode) return HUGETLBFS_SB(inode->i_sb)->spool; } +static inline struct resv_map *inode_resv_map(struct inode *inode) +{ + /* + * At inode evict time, i_mapping may not point to the original + * address space within the inode. This original address space + * contains the pointer to the resv_map. So, always use the + * address space embedded within the inode. + * The VERY common case is inode->mapping == &inode->i_data but, + * this may not be true for device special inodes. 
+ */ + return (struct resv_map *)(&inode->i_data)->private_data; +} + + #else /* !CONFIG_HUGETLBFS */ #define is_file_hugepages(file) false @@ -588,6 +602,11 @@ static inline struct hugepage_subpool *subpool_inode(struct inode *inode) return NULL; } +static inline struct resv_map *inode_resv_map(struct inode *inode) +{ + return NULL; +} + #endif /* !CONFIG_HUGETLBFS */ #ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA @@ -762,8 +781,6 @@ int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping pgoff_t idx); void restore_reserve_on_error(struct resv_map *resv, pgoff_t resv_index, bool may_share, struct folio *folio); -void restore_reserve_on_error_vma(struct hstate *h, struct vm_area_struct *vma, - unsigned long address, struct folio *folio); /* arch callback */ int __init __alloc_bootmem_huge_page(struct hstate *h, int nid); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 4675f9efeba4..540634aec181 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1091,19 +1091,6 @@ void resv_map_release(struct kref *ref) kfree(resv_map); } -static inline struct resv_map *inode_resv_map(struct inode *inode) -{ - /* - * At inode evict time, i_mapping may not point to the original - * address space within the inode. This original address space - * contains the pointer to the resv_map. So, always use the - * address space embedded within the inode. - * The VERY common case is inode->mapping == &inode->i_data but, - * this may not be true for device special inodes. - */ - return (struct resv_map *)(&inode->i_data)->private_data; -} - static struct resv_map *vma_resv_map(struct vm_area_struct *vma) { VM_BUG_ON_VMA(!is_vm_hugetlb_page(vma), vma); From patchwork Tue Jun 6 19:03:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269584 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 609CCC77B7A for ; Tue, 6 Jun 2023 19:05:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239340AbjFFTFW (ORCPT ); Tue, 6 Jun 2023 15:05:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38944 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239343AbjFFTEs (ORCPT ); Tue, 6 Jun 2023 15:04:48 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C68521715 for ; Tue, 6 Jun 2023 12:04:32 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-bb3a35ba742so1536082276.0 for ; Tue, 06 Jun 2023 12:04:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078271; x=1688670271; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=QP9zRN1PPC9/vVggheoZtkj7ew+Av1V49kv617fnYR8=; b=FI4N69NL3RScUYfhkbRLrL4c1EfS/desGI2Shr1/96TXRRDn4geHvyU1N7eGgqKJyG GWBPzDYB1BfYFhv6zCt0EhdFu5tldM3yzA5NvQB4/zCb+hcK/bM/QuqRiIfDvi5+hI04 SbvBvKqQxgCQVjt+cTCPYL6N1+wfniCAiJKkEW+sWbUf69X9qSMHub3UvTd9MQDzXMi4 1B4c0zvORWB8HqA5+0cdcj+spglhu6jaXUuf69LPtiQvLZve3RcWtPEQxixl3Ypwmw86 K8LGygCgyEMNtUVq9AiwfPbvahkLarTMGurJs3foOYOqn4CdQy93rRAZp4edz17ORY67 OYlQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078271; 
x=1688670271; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=QP9zRN1PPC9/vVggheoZtkj7ew+Av1V49kv617fnYR8=; b=kS68L2ZAyq6qEk9vEc2MwHWtY3qIjOAWflAk1YgtfuHRjD+Q9jr0UZsngcKvfTx05f C6aAirYHIxghSbt7HKTkqMsVqxqXYz4wVjnHffWzcG9nrR/Hsz3AbPqvTHmf1YFmfFox qsfvQSLcbZ1BZ4fR3HDfhT+Ik3YgqRjcOm1R14rM6e2YYXPsh8w09qbvPMaCVJjS11uo rOpDOyQrWnK9KRs8QiSI0DbvAn6Zb1SupDXagq4UcgBI9iErMqMvpo/Jh25BStbZeIEq 7Su+FGeXn4+6d5F+jhiJM/r+fEsYJMl5/rE2eU6riH2YIbhawbSQ2C6kLlUTppAC5Dwf bFHA== X-Gm-Message-State: AC+VfDxjG+SWu4bgBRftwqxivG6MKFd6Us+n/qgH50mX4Lzwe/jtUdD0 aZ/gIwThKWXKaYRaLtrWzA/lu/p4SHedsatBgA== X-Google-Smtp-Source: ACHHUZ76bqpjC35XKXFBahxA9u7+2eFkt14S7ZQHJlZjw7W0ReYy7o2ytYkzhnc/yAz55ZaZBmAlcbBYdbpzzQGaSA== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a25:105:0:b0:bab:a276:caac with SMTP id 5-20020a250105000000b00baba276caacmr1716574ybb.3.1686078271536; Tue, 06 Jun 2023 12:04:31 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:55 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <382ee70df7b65c365a1eab1223f84aecc0c5be10.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 10/19] mm: hugetlb: Parametrize alloc_hugetlb_folio_from_subpool() by resv_map From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Parametrize alloc_hugetlb_folio_from_subpool() by resv_map to remove the use of vma_resv_map() and decouple hugetlb with hugetlbfs. 
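As a sketch of where this is heading (illustrative, not in this patch): with the resv_map supplied by the caller, a backing store that tracks reservations per inode rather than per VMA could eventually allocate along these lines. The vma argument is still required at this point because resv_index and VM_MAYSHARE are still derived from it.

	/* Hypothetical caller owning its reservation map: */
	struct resv_map *resv = inode_resv_map(inode);

	folio = alloc_hugetlb_folio_from_subpool(spool, h, resv, vma, addr,
						 avoid_reserve);
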
Signed-off-by: Ackerley Tng --- include/linux/hugetlb.h | 2 +- mm/hugetlb.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 5fe9643826d7..d564802ace4b 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -767,7 +767,7 @@ struct huge_bootmem_page { int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list); struct folio *alloc_hugetlb_folio_from_subpool( - struct hugepage_subpool *spool, struct hstate *h, + struct hugepage_subpool *spool, struct hstate *h, struct resv_map *resv, struct vm_area_struct *vma, unsigned long addr, int avoid_reserve); struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, unsigned long addr, int avoid_reserve); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 540634aec181..aebdd8c63439 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3003,7 +3003,7 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list) } struct folio *alloc_hugetlb_folio_from_subpool( - struct hugepage_subpool *spool, struct hstate *h, + struct hugepage_subpool *spool, struct hstate *h, struct resv_map *resv, struct vm_area_struct *vma, unsigned long addr, int avoid_reserve) { struct folio *folio; @@ -3013,7 +3013,6 @@ struct folio *alloc_hugetlb_folio_from_subpool( struct hugetlb_cgroup *h_cg = NULL; bool deferred_reserve; - struct resv_map *resv = vma_resv_map(vma); pgoff_t resv_index = vma_hugecache_offset(h, vma, addr); bool may_share = vma->vm_flags & VM_MAYSHARE; @@ -3141,8 +3140,9 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma, { struct hugepage_subpool *spool = subpool_vma(vma); struct hstate *h = hstate_vma(vma); + struct resv_map *resv = vma_resv_map(vma); - return alloc_hugetlb_folio_from_subpool(spool, h, vma, addr, avoid_reserve); + return alloc_hugetlb_folio_from_subpool(spool, h, resv, vma, addr, avoid_reserve); } int alloc_bootmem_huge_page(struct hstate *h, int nid) From patchwork Tue Jun 6 19:03:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269585 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 451C0C7EE2C for ; Tue, 6 Jun 2023 19:05:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239199AbjFFTFX (ORCPT ); Tue, 6 Jun 2023 15:05:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38942 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239353AbjFFTEs (ORCPT ); Tue, 6 Jun 2023 15:04:48 -0400 Received: from mail-pg1-x549.google.com (mail-pg1-x549.google.com [IPv6:2607:f8b0:4864:20::549]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EF0DB172D for ; Tue, 6 Jun 2023 12:04:33 -0700 (PDT) Received: by mail-pg1-x549.google.com with SMTP id 41be03b00d2f7-53fa00ed93dso5622551a12.3 for ; Tue, 06 Jun 2023 12:04:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078273; x=1688670273; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=DUcXaEC48I4Bm1WwWPPkFcaYKFbZrdo62sI7Vn9jtKM=; b=EsMzcu7MCofZV1M6N8qLTVGfCobRb2m9UBd1k4rqImHaaV90eE+7yVgqGJTlHLVTGg WYLYzf9nuh7wT2wl0ong7Pm+JdEwy0XrWnw7A67dAqqNLmKKthrxFKAZNZCPEmg+Gm9l 
1/jfZjJbtssIps71Hn10bsoNM3+P1vWpJqtgmuCPP7gyVh3snIw/tKY/Poj3vjuiFMIm hWDohY5CuxRIZJ1XCd6YRMkjU3OFa1a+IZrlt5R6w1j+nUqvdL8ryozxcFEC3Inu+RpX MrblT0sw+GiuzKbFJnN8KErIkyQh7K372xpyIXAqppgkeoG3kQADStwWWL0FGczDjczb yaGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078273; x=1688670273; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=DUcXaEC48I4Bm1WwWPPkFcaYKFbZrdo62sI7Vn9jtKM=; b=GyEH3ooRIZLuwUuh4UNGc0orOlg47rjCsyaP+shM6t41wa4OrcZffXZr/At7MhUst2 EcO3gv10qXohw/6rjICGYFMPeiKbuDe9gromIxsAkoKa2qfEbUF9BykVMqJqX3hciZqL H3QhlgPH8aCQ33b7ofZW7vA/sbUWbR5s/rRlX1+IE/OZZoNyTy6Q0P2jfetJQhG6RTmy /f2nwqMEQHjM+Q/QLbMSfxvdPTbw+oG6t8YVe6fVfllFoQQojC19MI4HmPZY5E8bWg7G WZeA+H/M/DurxMv30AGHKPRB/dWWntD/FMgGw9DbfNTRwD+uzK9NekDO1nzHSAFvksUq athQ== X-Gm-Message-State: AC+VfDzukEuWV2EybxE1z6EFy4lIAOHzya7Pj4bWe2kdtGWyAkOo2XKy r21S9LsU7HmQ4+t+7j2CyS7YSaGznezuyU/hpQ== X-Google-Smtp-Source: ACHHUZ4v54k8mWqyJRdVfnADxQ7LxuS0BkTdcogjdbP6mmP0fUtChdx6VWMt3kG2rI14jmGLU8WGRs/GoPhwfnEqcg== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a65:5c8b:0:b0:53f:b396:6f32 with SMTP id a11-20020a655c8b000000b0053fb3966f32mr628734pgt.3.1686078273321; Tue, 06 Jun 2023 12:04:33 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:56 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <1d0337d32f40b781f9b7509cb40448b81bde6b00.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 11/19] mm: hugetlb: Parametrize hugetlb functions by resv_map From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Parametrize remove_mapping_hugepages() and hugetlb_unreserve_pages() by resv_map to remove the use of inode_resv_map() and decouple hugetlb with hugetlbfs. 
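Illustration (not part of the diff): a filesystem that keeps its resv_map somewhere other than inode->i_data can now remove pages and release reservations using its own map; resv_map below is a placeholder for whatever the caller tracks.

	remove_mapping_hugepages(mapping, h, spool, resv_map, inode,
				 lstart, lend);

	hugetlb_unreserve_pages(h, spool, resv_map, inode, start, end, freed);
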
Signed-off-by: Ackerley Tng --- fs/hugetlbfs/inode.c | 16 ++++++++++------ include/linux/hugetlb.h | 6 ++++-- mm/hugetlb.c | 4 ++-- 3 files changed, 16 insertions(+), 10 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 53f6a421499d..a7791b1390a6 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -560,8 +560,8 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end, */ static bool remove_mapping_single_folio( struct address_space *mapping, struct folio *folio, pgoff_t index, - struct hstate *h, struct hugepage_subpool *spool, struct inode *inode, - bool truncate_op) + struct hstate *h, struct hugepage_subpool *spool, struct resv_map *resv_map, + struct inode *inode, bool truncate_op) { bool ret = false; @@ -586,7 +586,8 @@ static bool remove_mapping_single_folio( hugetlb_delete_from_page_cache(folio); ret = true; if (!truncate_op) { - if (unlikely(hugetlb_unreserve_pages(h, spool, inode, index, index + 1, 1))) + if (unlikely(hugetlb_unreserve_pages(h, spool, resv_map, + inode, index, index + 1, 1))) hugetlb_fix_reserve_counts(h, spool); } @@ -623,6 +624,7 @@ static bool remove_mapping_single_folio( */ void remove_mapping_hugepages(struct address_space *mapping, struct hstate *h, struct hugepage_subpool *spool, + struct resv_map *resv_map, struct inode *inode, loff_t lstart, loff_t lend) { const pgoff_t start = lstart >> huge_page_shift(h); @@ -647,7 +649,7 @@ void remove_mapping_hugepages(struct address_space *mapping, * Remove folio that was part of folio_batch. */ if (remove_mapping_single_folio(mapping, folio, index, - h, spool, inode, truncate_op)) + h, spool, resv_map, inode, truncate_op)) freed++; mutex_unlock(&hugetlb_fault_mutex_table[hash]); @@ -657,7 +659,8 @@ void remove_mapping_hugepages(struct address_space *mapping, } if (truncate_op) - (void)hugetlb_unreserve_pages(h, spool, inode, start, LONG_MAX, freed); + (void)hugetlb_unreserve_pages(h, spool, resv_map, inode, + start, LONG_MAX, freed); } void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) @@ -665,8 +668,9 @@ void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) struct address_space *mapping = &inode->i_data; struct hstate *h = hstate_inode(inode); struct hugepage_subpool *spool = subpool_inode(inode); + struct resv_map *resv_map = inode_resv_map(inode); - return remove_mapping_hugepages(mapping, h, spool, inode, lstart, lend); + return remove_mapping_hugepages(mapping, h, spool, resv_map, inode, lstart, lend); } static void hugetlbfs_evict_inode(struct inode *inode) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index d564802ace4b..af04588a5afe 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -172,7 +172,8 @@ bool hugetlb_reserve_pages(struct hstate *h, struct hugepage_subpool *spool, struct vm_area_struct *vma, vm_flags_t vm_flags); long hugetlb_unreserve_pages(struct hstate *h, struct hugepage_subpool *spool, - struct inode *inode, long start, long end, long freed); + struct resv_map *resv_map, struct inode *inode, + long start, long end, long freed); bool isolate_hugetlb(struct folio *folio, struct list_head *list); int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison); int get_huge_page_for_hwpoison(unsigned long pfn, int flags, @@ -263,6 +264,7 @@ void hugetlb_zero_partial_page(struct hstate *h, struct address_space *mapping, void remove_mapping_hugepages(struct address_space *mapping, struct hstate *h, struct hugepage_subpool *spool, + 
struct resv_map *resv_map, struct inode *inode, loff_t lstart, loff_t lend); void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend); @@ -479,7 +481,7 @@ static inline void hugetlb_zero_partial_page( static inline void remove_mapping_hugepages( struct address_space *mapping, struct hstate *h, struct hugepage_subpool *spool, - struct inode *inode, loff_t lstart, loff_t lend) {} + struct resv_map *resv_map, struct inode *inode, loff_t lstart, loff_t lend) {} static inline void remove_inode_hugepages(struct inode *inode, loff_t lstart, loff_t lend) {} #endif /* !CONFIG_HUGETLB_PAGE */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index aebdd8c63439..a1cbda457aa7 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6954,9 +6954,9 @@ bool hugetlb_reserve_pages(struct hstate *h, struct hugepage_subpool *spool, * Returns 0 on success. */ long hugetlb_unreserve_pages(struct hstate *h, struct hugepage_subpool *spool, - struct inode *inode, long start, long end, long freed) + struct resv_map *resv_map, struct inode *inode, + long start, long end, long freed) { - struct resv_map *resv_map = inode_resv_map(inode); long chg = 0; long gbl_reserve; From patchwork Tue Jun 6 19:03:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269586 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8FCD6C7EE2C for ; Tue, 6 Jun 2023 19:05:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239416AbjFFTFt (ORCPT ); Tue, 6 Jun 2023 15:05:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39102 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239376AbjFFTEy (ORCPT ); Tue, 6 Jun 2023 15:04:54 -0400 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0D3331730 for ; Tue, 6 Jun 2023 12:04:35 -0700 (PDT) Received: by mail-pf1-x449.google.com with SMTP id d2e1a72fcca58-65267350de3so2022447b3a.3 for ; Tue, 06 Jun 2023 12:04:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078275; x=1688670275; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=gCnAmjEfrZv4hWergm9cCKqGSfxeJstUIVXgvQlsrbM=; b=0RfjN0vYiusk5ioCQS1nAlCQ9SwNbpNkrAGoCc3DzbyiTfogeskfW/k35AUHJjbjvh qcmNficgsPHY7IzjLRZ7Fx7yHknnSVBialc9Lm2HDy+G56wJLuQdJfrCE1L/whTYztSX 0nMD1VG7QI0IeJ7NlJ0RqSFpSkllQWKP+8SH5M4Tworj/X9kHQ6JcXzt+NcdmrYuQcSD ftcLroNPDkUfGiCVt5bzuOWxk/1vZL8CVOd3YhjEG4x4yYR8uQvEHMIVWqF9xF36YlZ+ oP2GU9HlO/y1x/wbpPSXD25KiOaXofjmCzoeXIdkiuAzfcG5mWunKS1qCnW27bnXJIzu pIkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078275; x=1688670275; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=gCnAmjEfrZv4hWergm9cCKqGSfxeJstUIVXgvQlsrbM=; b=QWxFuinGte1cGDTzfPYAVy78VrEjD/YSXYj6RgsHWW08vfDDFaIEzULGA5w9kbFYHy werR78CYSeEsRhYigkmjL46I0aZw/EONUfxDErzWrcD5HCtPNv2uk0x6mWpvWeAS3i36 AT1Er/W6uIcLw4INeqBcxyOXtTAfbluLM8n6OroV6Ys3yMG2a9GgZmSS+4ztuCJOTNjt 3CQ7tCp4bKUCiv1c3gkRjVAQKVLK21ETXbz4PrOD3ojNZCyoNl1DqDFScyIlEHxKFtII 
zWZnwyZJnzFdPq6mg/MD9o5yRCc7EIEuygt/m3GIfL1UQjv4zsFBrP+7sK0qxXdfZLhR w2XQ== X-Gm-Message-State: AC+VfDxkGamUThUNlRrhZli843Jpt+RLSlQAK1n8MAIbovYEbZKjLcYv 2j46VNOBYcT8E9fK0EPZrMseifT5MI005ZOHeA== X-Google-Smtp-Source: ACHHUZ7oX0XJdw6EGaS/GrZQZU2JbZA6pR7i2na5qTVNAmby/QbfNP6VXG9VMvls4TqYrSTNJjMknJRBHdMlJ61Amg== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a05:6a00:148d:b0:63b:234e:d641 with SMTP id v13-20020a056a00148d00b0063b234ed641mr1369654pfu.4.1686078275306; Tue, 06 Jun 2023 12:04:35 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:57 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <0c1144b9c5cd620cd0acf7ee033fef8d311b97ba.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 12/19] mm: truncate: Expose preparation steps for truncate_inode_pages_final From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This will allow preparation steps to be shared Signed-off-by: Ackerley Tng --- include/linux/mm.h | 1 + mm/truncate.c | 24 ++++++++++++++---------- 2 files changed, 15 insertions(+), 10 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 1f79667824eb..7a8f6b810de0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3053,6 +3053,7 @@ extern unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info); extern void truncate_inode_pages(struct address_space *, loff_t); extern void truncate_inode_pages_range(struct address_space *, loff_t lstart, loff_t lend); +extern void truncate_inode_pages_final_prepare(struct address_space *mapping); extern void truncate_inode_pages_final(struct address_space *); /* generic vm_area_ops exported for stackable file systems */ diff --git a/mm/truncate.c b/mm/truncate.c index 7b4ea4c4a46b..4a7ae87e03b5 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -449,16 +449,7 @@ void truncate_inode_pages(struct address_space *mapping, loff_t lstart) } EXPORT_SYMBOL(truncate_inode_pages); -/** - * truncate_inode_pages_final - truncate *all* pages before inode dies - * @mapping: mapping to truncate - * - * Called under (and serialized by) inode->i_rwsem. - * - * Filesystems have to use this in the .evict_inode path to inform the - * VM that this is the final truncate and the inode is going away. 
- */ -void truncate_inode_pages_final(struct address_space *mapping) +void truncate_inode_pages_final_prepare(struct address_space *mapping) { /* * Page reclaim can not participate in regular inode lifetime @@ -479,7 +470,20 @@ void truncate_inode_pages_final(struct address_space *mapping) xa_lock_irq(&mapping->i_pages); xa_unlock_irq(&mapping->i_pages); } +} +/** + * truncate_inode_pages_final - truncate *all* pages before inode dies + * @mapping: mapping to truncate + * + * Called under (and serialized by) inode->i_rwsem. + * + * Filesystems have to use this in the .evict_inode path to inform the + * VM that this is the final truncate and the inode is going away. + */ +void truncate_inode_pages_final(struct address_space *mapping) +{ + truncate_inode_pages_final_prepare(mapping); truncate_inode_pages(mapping, 0); } EXPORT_SYMBOL(truncate_inode_pages_final); From patchwork Tue Jun 6 19:03:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269587 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D06E5C77B7A for ; Tue, 6 Jun 2023 19:06:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239399AbjFFTGE (ORCPT ); Tue, 6 Jun 2023 15:06:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39662 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239398AbjFFTFA (ORCPT ); Tue, 6 Jun 2023 15:05:00 -0400 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 485C01BD2 for ; Tue, 6 Jun 2023 12:04:38 -0700 (PDT) Received: by mail-pj1-x1049.google.com with SMTP id 98e67ed59e1d1-25669acf204so5848050a91.2 for ; Tue, 06 Jun 2023 12:04:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078277; x=1688670277; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=l7m/+H3bbKiFYeplsxMS6SzLQH+bOrcqP3X6Nh5dkpY=; b=14V+ikDqPeP4Yv1OdmN8nd7QZ5MMjeGKWKaRSYFOZag9l1JWEXWB4ucnpLMnABlXZA EKyjA/Hep4b5XPLE3jo+/oAtPajUrqdw+KeK/h9os1d0usuctIgWW9M+esmzYCfBXl4x ASFKNceH4Dvr96zHhOZ0uM2fLtsPSrNxdz17wPPKIRFX8xmJyoeEp2adnxvZEKqN8X4Y OM4Rksx4vK290PDLeLyuR9aPGnO/Zd4sMVHFMKGC9zk+GBNtZUBayI8BLZ7Yas0LD8pw GLTAE9YptousowvGxGw0V5tbYjLuLpB0CEXyROZYUG+NBDy7fuPXlo1tE0nXFLN+WEBW wc/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078277; x=1688670277; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=l7m/+H3bbKiFYeplsxMS6SzLQH+bOrcqP3X6Nh5dkpY=; b=mAI82/0EHYxQlSY1318L7Y/etLfeyJi7ZMLHGm5JAWn6zf6d/iVfjccHVAuUjgS5bR caeDlAS9aAqmbEU9eJTgpVdcMUocPDUlMa9/yZpPzCz/k+CzGjluoQJGxDgddbg+9K/8 08z+sAtFo5BXlZn15G/3pKDYhLpNEU1BtLsHaSCniZRhKZLAtcQ3vM5bph5FbjQRWDp6 8Q4lcAKQaD1JeUD7sjXED7lOFKkN8oDjSgiYQuOAryj/GVMkJIHYHHq94sfWGvu4ECJP Hqm14Y1vzk9vtggIfICNyf4bBSn23/W+GB8Q/XUAHtsZUaTU2Vh1Y9w+hv3E3fCKZQ3y AyzA== X-Gm-Message-State: AC+VfDynN0eQlTrRFSIkDJoRCRvYk8RLpdTspz9ymqtZGQRbaz2+9X8O 6QtsyDMzTmGLzS5E9WMaLs40Yrt/pafNp5urRw== X-Google-Smtp-Source: ACHHUZ400PPpzSqOQyBrlXwnbK9H57gIHhF9scqTSjoAqyx8OcXSiJ7wdR2icZ3OG98VTDSuHd/HQ93iA5Sxt1cqNg== 
X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a17:90b:3003:b0:253:4808:7587 with SMTP id hg3-20020a17090b300300b0025348087587mr772798pjb.7.1686078277257; Tue, 06 Jun 2023 12:04:37 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:58 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <67168b07e8d4a0c714fce5f030671b376d8ca001.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 13/19] KVM: guest_mem: Refactor kvm_gmem fd creation to be in layers From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org First create a gmem inode, then create a gmem file using the inode, then install the file into an fd. Creating the file in layers separates inode concepts (struct kvm_gmem) from file concepts and makes cleaning up in stages neater. 
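For readability, a condensed outline of the resulting creation path (the full version is in the diff below); each layer only has to undo its own work on failure:

	fd = get_unused_fd_flags(0);
	if (fd < 0)
		return fd;

	/*
	 * Builds the inode first (kvm_gmem_create_inode()), then wraps it in
	 * a file with alloc_file_pseudo(); on failure it drops the inode
	 * itself, so this caller only has to release the fd.
	 */
	file = kvm_gmem_create_file(kvm, size, flags, kvm_gmem_mnt);
	if (IS_ERR(file)) {
		put_unused_fd(fd);
		return PTR_ERR(file);
	}

	fd_install(fd, file);
	return fd;
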
Signed-off-by: Ackerley Tng --- virt/kvm/guest_mem.c | 86 +++++++++++++++++++++++++------------------- 1 file changed, 50 insertions(+), 36 deletions(-) diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c index 8708139822d3..2f69ef666871 100644 --- a/virt/kvm/guest_mem.c +++ b/virt/kvm/guest_mem.c @@ -375,41 +375,27 @@ static const struct inode_operations kvm_gmem_iops = { .setattr = kvm_gmem_setattr, }; -static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags, - struct vfsmount *mnt) +static struct inode *kvm_gmem_create_inode(struct kvm *kvm, loff_t size, u64 flags, + struct vfsmount *mnt) { + int err; + struct inode *inode; + struct kvm_gmem *gmem; const char *anon_name = "[kvm-gmem]"; const struct qstr qname = QSTR_INIT(anon_name, strlen(anon_name)); - struct kvm_gmem *gmem; - struct inode *inode; - struct file *file; - int fd, err; - - fd = get_unused_fd_flags(0); - if (fd < 0) - return fd; inode = alloc_anon_inode(mnt->mnt_sb); - if (IS_ERR(inode)) { - err = PTR_ERR(inode); - goto err_fd; - } + if (IS_ERR(inode)) + return inode; err = security_inode_init_security_anon(inode, &qname, NULL); if (err) goto err_inode; - file = alloc_file_pseudo(inode, mnt, "kvm-gmem", O_RDWR, &kvm_gmem_fops); - if (IS_ERR(file)) { - err = PTR_ERR(file); - goto err_inode; - } - + err = -ENOMEM; gmem = kzalloc(sizeof(*gmem), GFP_KERNEL); - if (!gmem) { - err = -ENOMEM; - goto err_file; - } + if (!gmem) + goto err_inode; xa_init(&gmem->bindings); @@ -426,24 +412,41 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags, mapping_set_large_folios(inode->i_mapping); mapping_set_unevictable(inode->i_mapping); - file->f_flags |= O_LARGEFILE; - file->f_mapping = inode->i_mapping; - file->private_data = gmem; - - fd_install(fd, file); - return fd; + return inode; -err_file: - fput(file); err_inode: iput(inode); -err_fd: - put_unused_fd(fd); - return err; + return ERR_PTR(err); +} + + +static struct file *kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags, + struct vfsmount *mnt) +{ + struct file *file; + struct inode *inode; + + inode = kvm_gmem_create_inode(kvm, size, flags, mnt); + if (IS_ERR(inode)) + return ERR_CAST(inode); + + file = alloc_file_pseudo(inode, mnt, "kvm-gmem", O_RDWR, &kvm_gmem_fops); + if (IS_ERR(file)) { + iput(inode); + return file; + } + + file->f_flags |= O_LARGEFILE; + file->f_mapping = inode->i_mapping; + file->private_data = inode->i_mapping->private_data; + + return file; } int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *gmem) { + int fd; + struct file *file; loff_t size = gmem->size; u64 flags = gmem->flags; @@ -462,7 +465,18 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *gmem) #endif } - return __kvm_gmem_create(kvm, size, flags, kvm_gmem_mnt); + fd = get_unused_fd_flags(0); + if (fd < 0) + return fd; + + file = kvm_gmem_create_file(kvm, size, flags, kvm_gmem_mnt); + if (IS_ERR(file)) { + put_unused_fd(fd); + return PTR_ERR(file); + } + + fd_install(fd, file); + return fd; } int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, From patchwork Tue Jun 6 19:03:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269623 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 069D7C7EE31 for ; Tue, 6 Jun 2023 19:07:06 
+0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239429AbjFFTHE (ORCPT ); Tue, 6 Jun 2023 15:07:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39500 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239389AbjFFTFo (ORCPT ); Tue, 6 Jun 2023 15:05:44 -0400 Received: from mail-pl1-x649.google.com (mail-pl1-x649.google.com [IPv6:2607:f8b0:4864:20::649]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2DC4C1FC1 for ; Tue, 6 Jun 2023 12:04:53 -0700 (PDT) Received: by mail-pl1-x649.google.com with SMTP id d9443c01a7336-1b011901b03so24400065ad.3 for ; Tue, 06 Jun 2023 12:04:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078279; x=1688670279; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=wXM9JnRppggpXADZsSU7oZSsG9y4s1gBipK1U/D+HWI=; b=VWeSODpWaBqW4mQpMFylMZsGjqz9x2cJWtAJkF8pA6jBSLPJ/ks6MKmLjF0le7g30t JnmedBVXzjwth3hzxEtFKbLpgTPpii8AIYI3pjk/s5jbK19NEFa6oqTQ13vB92hVQiL8 ViOM5skTjTy0akJXm7D62mzzCwjuX5DC11svxqniq47zqTKpG8FGJ6kwksqVcEgjbsdr Q2K/qR6ANqCdLydEwlvpleG5wsmCc5RNcdR6s83XxomLHgCDxLDz6X0Nkjd7UZ2zKFsM UtKOTh+fe/f2a9gAwgpw1N4/+Q8OfnwiS90nL6ITis3W2ib2nbVxkWVf59u885V77XDE HbzA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078279; x=1688670279; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=wXM9JnRppggpXADZsSU7oZSsG9y4s1gBipK1U/D+HWI=; b=d9i2aif4CFd/5uAQYXGd210IpZ1+COLetkc2bM1w43jvU0WlNKNvlLUWufXu4LxlT5 v/NWEWaQHDhy6edYqkfr1P7+fz413Tay0RriND74CRoZmUT2Aoz/2CvIKUgj5zr97ZM+ Ms6kLciXzas57uWspThrryxtzpyS89iuRVNeCW+eahXb7gGyjd4QP/+nLn3RAef+dVz/ gDANB/NY4HZFcUicAuHLJbBJHADpWWD7twIeOtzIsM4mM49UvNCmaRoNUKFpr5MJoaKh hXF/t3LCOdCDgj5vIxFebXSiYslWPXpSSGc1YPDOwLRS0tKmT/ToK1KfhsJjsvn9emXq TXmA== X-Gm-Message-State: AC+VfDwcJzmLw6sWiOWJXHLx32WgdWlRKRKtgc6DOT0++QKRf1K3pN9t 7pknVm6YJZhIS6ZjdEXwzeNKcDiqXqkE7IFSGA== X-Google-Smtp-Source: ACHHUZ7ZdtfDieT/Oervi6LGRC4PY5HZ9dIjkRC61ZAb5O1fokMpY3zuDqGBxtoVMfmrEF+ZlDZD/kzUef5/lnoKEw== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a17:902:7c03:b0:1b2:690:23ff with SMTP id x3-20020a1709027c0300b001b2069023ffmr991531pll.11.1686078279112; Tue, 06 Jun 2023 12:04:39 -0700 (PDT) Date: Tue, 6 Jun 2023 19:03:59 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <44ac051fab6315161905949caefed78381b6c5fe.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 14/19] KVM: guest_mem: Refactor cleanup to separate inode and file cleanup From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, 
vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Cleanup in kvm_gmem_release() should be the reverse of kvm_gmem_create_file(). Cleanup in kvm_gmem_evict_inode() should be the reverse of kvm_gmem_create_inode(). Signed-off-by: Ackerley Tng --- virt/kvm/guest_mem.c | 105 +++++++++++++++++++++++++++++-------------- 1 file changed, 71 insertions(+), 34 deletions(-) diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c index 2f69ef666871..13253af40be6 100644 --- a/virt/kvm/guest_mem.c +++ b/virt/kvm/guest_mem.c @@ -247,42 +247,13 @@ static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset, static int kvm_gmem_release(struct inode *inode, struct file *file) { - struct kvm_gmem *gmem = inode->i_mapping->private_data; - struct kvm_memory_slot *slot; - struct kvm *kvm = gmem->kvm; - unsigned long index; - /* - * Prevent concurrent attempts to *unbind* a memslot. This is the last - * reference to the file and thus no new bindings can be created, but - * deferencing the slot for existing bindings needs to be protected - * against memslot updates, specifically so that unbind doesn't race - * and free the memslot (kvm_gmem_get_file() will return NULL). + * This is called when the last reference to the file is released. Only + * clean up file-related stuff. struct kvm_gmem is also referred to in + * the inode, so clean that up in kvm_gmem_evict_inode(). */ - mutex_lock(&kvm->slots_lock); - - xa_for_each(&gmem->bindings, index, slot) - rcu_assign_pointer(slot->gmem.file, NULL); - - synchronize_rcu(); - - /* - * All in-flight operations are gone and new bindings can be created. - * Free the backing memory, and more importantly, zap all SPTEs that - * pointed at this file. - */ - kvm_gmem_invalidate_begin(kvm, gmem, 0, -1ul); - truncate_inode_pages_final(file->f_mapping); - kvm_gmem_invalidate_end(kvm, gmem, 0, -1ul); - - mutex_unlock(&kvm->slots_lock); - - WARN_ON_ONCE(!(mapping_empty(file->f_mapping))); - - xa_destroy(&gmem->bindings); - kfree(gmem); - - kvm_put_kvm(kvm); + file->f_mapping = NULL; + file->private_data = NULL; return 0; } @@ -603,11 +574,77 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot, } EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn); +static void kvm_gmem_evict_inode(struct inode *inode) +{ + struct kvm_gmem *gmem = inode->i_mapping->private_data; + struct kvm_memory_slot *slot; + struct kvm *kvm; + unsigned long index; + + /* + * If iput() was called before inode is completely set up due to some + * error in kvm_gmem_create_inode(), gmem will be NULL. + */ + if (!gmem) + goto basic_cleanup; + + kvm = gmem->kvm; + + /* + * Prevent concurrent attempts to *unbind* a memslot. This is the last + * reference to the file and thus no new bindings can be created, but + * deferencing the slot for existing bindings needs to be protected + * against memslot updates, specifically so that unbind doesn't race + * and free the memslot (kvm_gmem_get_file() will return NULL). + */ + mutex_lock(&kvm->slots_lock); + + xa_for_each(&gmem->bindings, index, slot) + rcu_assign_pointer(slot->gmem.file, NULL); + + synchronize_rcu(); + + /* + * All in-flight operations are gone and new bindings can be created. 
+ * Free the backing memory, and more importantly, zap all SPTEs that + * pointed at this file. + */ + kvm_gmem_invalidate_begin(kvm, gmem, 0, -1ul); + truncate_inode_pages_final(inode->i_mapping); + kvm_gmem_invalidate_end(kvm, gmem, 0, -1ul); + + mutex_unlock(&kvm->slots_lock); + + WARN_ON_ONCE(!(mapping_empty(inode->i_mapping))); + + xa_destroy(&gmem->bindings); + kfree(gmem); + + kvm_put_kvm(kvm); + +basic_cleanup: + clear_inode(inode); +} + +static const struct super_operations kvm_gmem_super_operations = { + /* + * TODO update statfs handler for kvm_gmem. What should the statfs + * handler return? + */ + .statfs = simple_statfs, + .evict_inode = kvm_gmem_evict_inode, +}; + static int kvm_gmem_init_fs_context(struct fs_context *fc) { + struct pseudo_fs_context *ctx; + if (!init_pseudo(fc, GUEST_MEMORY_MAGIC)) return -ENOMEM; + ctx = fc->fs_private; + ctx->ops = &kvm_gmem_super_operations; + return 0; } From patchwork Tue Jun 6 19:04:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269624 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA794C77B73 for ; Tue, 6 Jun 2023 19:07:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239422AbjFFTHZ (ORCPT ); Tue, 6 Jun 2023 15:07:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39662 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239397AbjFFTGD (ORCPT ); Tue, 6 Jun 2023 15:06:03 -0400 Received: from mail-pl1-x64a.google.com (mail-pl1-x64a.google.com [IPv6:2607:f8b0:4864:20::64a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F1C2C1FD4 for ; Tue, 6 Jun 2023 12:04:56 -0700 (PDT) Received: by mail-pl1-x64a.google.com with SMTP id d9443c01a7336-1b03f9dfd52so25771955ad.3 for ; Tue, 06 Jun 2023 12:04:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078280; x=1688670280; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=qPbJS0NeJm0ebjWuWAGTg4zHB+jUkimTjx0FXzxIc84=; b=08oQhpNB5bPsCTiYZb6zzITcl3NrY7Nrj0gHP7CL4wEm89iCZHqEqZvXMnagv03rkn GPexWIBMA+li5aiwDX4ihlFm6cVZOeax1+AHpQmOU2YD/QwBli/TC1j7FXVthdfotrbj ohAd8tgwk7CZ9llZkCvOzoa+H0uBPFBXEcWSK2TcUhtWGacdRDhynTXSQrFRtxSj03ll +4JVjB3yWPlWCPldFPbqjqq8D3jeR9j4nRL0TBHcuQBOxSh7kt2ZiDSE77C+EmED3XTt Cx6sMOv8frrdr7pYnwu4yVAynavbLFmrX0Dakhf26NBqNaaItWBILzjwUpL7K2OiHx+L mttg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078280; x=1688670280; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=qPbJS0NeJm0ebjWuWAGTg4zHB+jUkimTjx0FXzxIc84=; b=MI80zQlM3T8vzgRv2k2YaU+t0DHmN3vzJDJkEsxXihp4k2aRenZNzJ01Ia0r7ah5Rp DuG/HTBrFxObupWNbNxvVHoguZXban19kljFSzGeUSmgLfQ6nSplZhI0amJ+QLPDbuu8 Ai5FMxG+G2sIkbURXOMoD3kK31fduhRYME6/BvGj1SLU6mlX96inZVECwu5CAlmKIIKf L7Z399cHwyj5N4umebmx0TcLtCTtTZpGASyuouQNycP3SQzdcVu2qmrdPVAxAlMjynDj BDgyesirp6p5+94qIsYwubVzKo9wPWVYYbHCSUknRxH1XvmigazqE5NeZOsBZJnnlRVj 7Ttg== X-Gm-Message-State: AC+VfDyRzjHPW02BecrZG1wDQ/ziRy1/xceWeSR2VyqXQkhbmdz6/7ae yUGCK44DEZkNTeimmAm4XxqqomC8n33VAKr6/g== X-Google-Smtp-Source: 
ACHHUZ7+W5XWqHy7WGLlye5fNuolaabE941Ydb/Msxa478YJYlgHnPlrM8+Gzs9sRjg5WLNJlGVtgTPDm/CmJ9Gc2A== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a17:902:da8c:b0:1b0:4b1d:26e1 with SMTP id j12-20020a170902da8c00b001b04b1d26e1mr956600plx.8.1686078280564; Tue, 06 Jun 2023 12:04:40 -0700 (PDT) Date: Tue, 6 Jun 2023 19:04:00 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: Subject: [RFC PATCH 15/19] KVM: guest_mem: hugetlb: initialization and cleanup From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org First stage of hugetlb support: add initialization and cleanup routines Signed-off-by: Ackerley Tng --- include/uapi/linux/kvm.h | 25 ++++++++++++ virt/kvm/guest_mem.c | 88 +++++++++++++++++++++++++++++++++++++--- 2 files changed, 108 insertions(+), 5 deletions(-) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 0fa665e8862a..1df0c802c29f 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -13,6 +13,7 @@ #include #include #include +#include #define KVM_API_VERSION 12 @@ -2280,6 +2281,30 @@ struct kvm_memory_attributes { #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd) #define KVM_GUEST_MEMFD_HUGE_PMD (1ULL << 0) +#define KVM_GUEST_MEMFD_HUGETLB (1ULL << 1) + +/* + * Huge page size encoding when KVM_GUEST_MEMFD_HUGETLB is specified, and a huge + * page size other than the default is desired. See hugetlb_encode.h. All + * known huge page size encodings are provided here. It is the responsibility + * of the application to know which sizes are supported on the running system. + * See mmap(2) man page for details. 
+ */ +#define KVM_GUEST_MEMFD_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT +#define KVM_GUEST_MEMFD_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK + +#define KVM_GUEST_MEMFD_HUGE_64KB HUGETLB_FLAG_ENCODE_64KB +#define KVM_GUEST_MEMFD_HUGE_512KB HUGETLB_FLAG_ENCODE_512KB +#define KVM_GUEST_MEMFD_HUGE_1MB HUGETLB_FLAG_ENCODE_1MB +#define KVM_GUEST_MEMFD_HUGE_2MB HUGETLB_FLAG_ENCODE_2MB +#define KVM_GUEST_MEMFD_HUGE_8MB HUGETLB_FLAG_ENCODE_8MB +#define KVM_GUEST_MEMFD_HUGE_16MB HUGETLB_FLAG_ENCODE_16MB +#define KVM_GUEST_MEMFD_HUGE_32MB HUGETLB_FLAG_ENCODE_32MB +#define KVM_GUEST_MEMFD_HUGE_256MB HUGETLB_FLAG_ENCODE_256MB +#define KVM_GUEST_MEMFD_HUGE_512MB HUGETLB_FLAG_ENCODE_512MB +#define KVM_GUEST_MEMFD_HUGE_1GB HUGETLB_FLAG_ENCODE_1GB +#define KVM_GUEST_MEMFD_HUGE_2GB HUGETLB_FLAG_ENCODE_2GB +#define KVM_GUEST_MEMFD_HUGE_16GB HUGETLB_FLAG_ENCODE_16GB struct kvm_create_guest_memfd { __u64 size; diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c index 13253af40be6..b533143e2878 100644 --- a/virt/kvm/guest_mem.c +++ b/virt/kvm/guest_mem.c @@ -19,6 +19,7 @@ #include #include #include +#include #include @@ -30,6 +31,11 @@ struct kvm_gmem { struct kvm *kvm; u64 flags; struct xarray bindings; + struct { + struct hstate *h; + struct hugepage_subpool *spool; + struct resv_map *resv_map; + } hugetlb; }; static loff_t kvm_gmem_get_size(struct file *file) @@ -346,6 +352,46 @@ static const struct inode_operations kvm_gmem_iops = { .setattr = kvm_gmem_setattr, }; +static int kvm_gmem_hugetlb_setup(struct inode *inode, struct kvm_gmem *gmem, + loff_t size, u64 flags) +{ + int page_size_log; + int hstate_idx; + long hpages; + struct resv_map *resv_map; + struct hugepage_subpool *spool; + struct hstate *h; + + page_size_log = (flags >> KVM_GUEST_MEMFD_HUGE_SHIFT) & KVM_GUEST_MEMFD_HUGE_MASK; + hstate_idx = get_hstate_idx(page_size_log); + if (hstate_idx < 0) + return -ENOENT; + + h = &hstates[hstate_idx]; + /* Round up to accommodate size requests that don't align with huge pages */ + hpages = round_up(size, huge_page_size(h)) >> huge_page_shift(h); + spool = hugepage_new_subpool(h, hpages, hpages); + if (!spool) + goto out; + + resv_map = resv_map_alloc(); + if (!resv_map) + goto out_subpool; + + inode->i_blkbits = huge_page_shift(h); + + gmem->hugetlb.h = h; + gmem->hugetlb.spool = spool; + gmem->hugetlb.resv_map = resv_map; + + return 0; + +out_subpool: + kfree(spool); +out: + return -ENOMEM; +} + static struct inode *kvm_gmem_create_inode(struct kvm *kvm, loff_t size, u64 flags, struct vfsmount *mnt) { @@ -368,6 +414,12 @@ static struct inode *kvm_gmem_create_inode(struct kvm *kvm, loff_t size, u64 fla if (!gmem) goto err_inode; + if (flags & KVM_GUEST_MEMFD_HUGETLB) { + err = kvm_gmem_hugetlb_setup(inode, gmem, size, flags); + if (err) + goto err_gmem; + } + xa_init(&gmem->bindings); kvm_get_kvm(kvm); @@ -385,6 +437,8 @@ static struct inode *kvm_gmem_create_inode(struct kvm *kvm, loff_t size, u64 fla return inode; +err_gmem: + kfree(gmem); err_inode: iput(inode); return ERR_PTR(err); @@ -414,6 +468,8 @@ static struct file *kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags return file; } +#define KVM_GUEST_MEMFD_ALL_FLAGS (KVM_GUEST_MEMFD_HUGE_PMD | KVM_GUEST_MEMFD_HUGETLB) + int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *gmem) { int fd; @@ -424,8 +480,15 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *gmem) if (size < 0 || !PAGE_ALIGNED(size)) return -EINVAL; - if (flags & ~KVM_GUEST_MEMFD_HUGE_PMD) - return -EINVAL; + if (!(flags & KVM_GUEST_MEMFD_HUGETLB)) 
{ + if (flags & ~(unsigned int)KVM_GUEST_MEMFD_ALL_FLAGS) + return -EINVAL; + } else { + /* Allow huge page size encoding in flags. */ + if (flags & ~(unsigned int)(KVM_GUEST_MEMFD_ALL_FLAGS | + (KVM_GUEST_MEMFD_HUGE_MASK << KVM_GUEST_MEMFD_HUGE_SHIFT))) + return -EINVAL; + } if (flags & KVM_GUEST_MEMFD_HUGE_PMD) { #ifdef CONFIG_TRANSPARENT_HUGEPAGE @@ -610,7 +673,17 @@ static void kvm_gmem_evict_inode(struct inode *inode) * pointed at this file. */ kvm_gmem_invalidate_begin(kvm, gmem, 0, -1ul); - truncate_inode_pages_final(inode->i_mapping); + if (gmem->flags & KVM_GUEST_MEMFD_HUGETLB) { + truncate_inode_pages_final_prepare(inode->i_mapping); + remove_mapping_hugepages( + inode->i_mapping, gmem->hugetlb.h, gmem->hugetlb.spool, + gmem->hugetlb.resv_map, inode, 0, LLONG_MAX); + + resv_map_release(&gmem->hugetlb.resv_map->refs); + hugepage_put_subpool(gmem->hugetlb.spool); + } else { + truncate_inode_pages_final(inode->i_mapping); + } kvm_gmem_invalidate_end(kvm, gmem, 0, -1ul); mutex_unlock(&kvm->slots_lock); @@ -688,10 +761,15 @@ bool kvm_gmem_check_alignment(const struct kvm_userspace_memory_region2 *mem) { size_t page_size; - if (mem->flags & KVM_GUEST_MEMFD_HUGE_PMD) + if (mem->flags & KVM_GUEST_MEMFD_HUGETLB) { + size_t page_size_log = ((mem->flags >> KVM_GUEST_MEMFD_HUGE_SHIFT) + & KVM_GUEST_MEMFD_HUGE_MASK); + page_size = 1UL << page_size_log; + } else if (mem->flags & KVM_GUEST_MEMFD_HUGE_PMD) { page_size = HPAGE_PMD_SIZE; - else + } else { page_size = PAGE_SIZE; + } return (IS_ALIGNED(mem->gmem_offset, page_size) && IS_ALIGNED(mem->memory_size, page_size)); From patchwork Tue Jun 6 19:04:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269625 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49CF4C7EE2C for ; Tue, 6 Jun 2023 19:07:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233485AbjFFTHj (ORCPT ); Tue, 6 Jun 2023 15:07:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38966 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238606AbjFFTGP (ORCPT ); Tue, 6 Jun 2023 15:06:15 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 403951FDC for ; Tue, 6 Jun 2023 12:04:59 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-bb39316a68eso1899788276.0 for ; Tue, 06 Jun 2023 12:04:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078282; x=1688670282; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ZBX6a15Tnsz/uKq26LcenmelaNrPaeGwC3t0ZYHlu0o=; b=SiIGZBUhZs629JysZEAyyBg7ttu9GLTQayTGdSZGve4dLO6zhBZcJhFcd1G73mhuek LHQGWYqLVU6XGceDTMr/+ObLmxrXdmeyPE4htH53qamVc0jx3GxeXvlfVB9JCraAIS1o KEzaHjtn2Qo3RoZRS3XAY/vwbMPt0gdnui/cnheq8vpMkjsxIO1Je1R+Or+Zif0wezPw uUYILrxrg8kwyk8/xo82WMddoKWumnm5vCXiz84rf/1WA32vat2RPpMWIr80UC3bv5gu +n3kSoBT8FJFvccnZi21mVuxGSb5VrWJMuxYUlyl2Etedh0yvhauw6W7NlVmyqj5TDoC ZVSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078282; x=1688670282; 
h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ZBX6a15Tnsz/uKq26LcenmelaNrPaeGwC3t0ZYHlu0o=; b=PMrGVYQeZqYXi/cgJhwODDH+SSf5PWnYXUPTIhu+pbuo2jiNZkLeMgfXNuB8WjUy6g +mY7JbLhWLf1n8XiHAEAk+XRcaPVmYegBqX0d3KNe+26mQy4Z+1psayuBI+EVZCNn/dz SDjn80VMuwntiCHEBHQ+6H/vs0PJrusJIuMyDk2X7vKXhX1fJ0bBI387cW5qn6GwjBLN wwcdfsay+QeA+RR8FVfZaCdgEw6NF+ohpQFkRnw0Y/fTNU0ayt7wrf9kpTHUrBBlkavt 4QTjT7UIA4VtYSIoFecAh82WeDDueQGy7/vSnP28TPet2tPAQ/Ur74lYkNJBgs5tvwzB zQyg== X-Gm-Message-State: AC+VfDz10kRo2lOPX7exQ+cZcWKMt2B9arm47v46teGuH8R1qzd6Afmd inG1sRi55C/8Gi/+vsoC1fXLBGVscZ5yCixjdA== X-Google-Smtp-Source: ACHHUZ5Cpne/EIdT1I7Li9OeDmgR0RpPGqTGa8YIhJkObOQifrmAjwQv0QQeKIDO8N9Vq8SkzWw7xg09f24itua+cw== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a25:2446:0:b0:b9a:703d:e650 with SMTP id k67-20020a252446000000b00b9a703de650mr1068675ybk.7.1686078282407; Tue, 06 Jun 2023 12:04:42 -0700 (PDT) Date: Tue, 6 Jun 2023 19:04:01 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <18e518695854cc7243866d7b1be2fbbb3aa87c71.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 16/19] KVM: guest_mem: hugetlb: allocate and truncate from hugetlb From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Introduce kvm_gmem_hugetlb_get_folio(), then update kvm_gmem_allocate() and kvm_gmem_truncate() to use hugetlb functions. 
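To make the new allocate/truncate paths concrete, here is a minimal userspace sketch (not part of this series) that creates a 2M hugetlb-backed guest_memfd and drives both paths through fallocate(). It assumes a kernel with this RFC applied, uapi headers exporting KVM_CREATE_GUEST_MEMFD and the KVM_GUEST_MEMFD_* flags, that the ioctl is issued on the VM fd as in this series, and that enough 2M hugetlb pages have been reserved (e.g. via /proc/sys/vm/nr_hugepages):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/falloc.h>
#include <linux/kvm.h>

int main(void)
{
	struct kvm_create_guest_memfd args = {
		.size  = 4UL << 21,	/* four 2M huge pages */
		.flags = KVM_GUEST_MEMFD_HUGETLB | KVM_GUEST_MEMFD_HUGE_2MB,
	};
	int kvm_fd, vm_fd, gmem_fd;

	kvm_fd = open("/dev/kvm", O_RDWR);
	if (kvm_fd < 0) {
		perror("open /dev/kvm");
		return 1;
	}

	vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);
	if (vm_fd < 0) {
		perror("KVM_CREATE_VM");
		return 1;
	}

	gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &args);
	if (gmem_fd < 0) {
		perror("KVM_CREATE_GUEST_MEMFD");
		return 1;
	}

	/* Preallocate all backing huge pages: kvm_gmem_allocate(). */
	if (fallocate(gmem_fd, 0, 0, args.size))
		perror("fallocate(alloc)");

	/* Punch out one aligned 2M page: kvm_gmem_punch_hole(). */
	if (fallocate(gmem_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      0, 2UL << 20))
		perror("fallocate(punch)");

	return 0;
}

With KVM_GUEST_MEMFD_HUGETLB set, the allocation above goes through the hugetlb branch of kvm_gmem_get_folio() and the hole punch through kvm_gmem_hugetlb_truncate_range(), as introduced in this patch.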
Signed-off-by: Ackerley Tng --- virt/kvm/guest_mem.c | 215 +++++++++++++++++++++++++++++++++++++------ 1 file changed, 188 insertions(+), 27 deletions(-) diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c index b533143e2878..6271621f6b73 100644 --- a/virt/kvm/guest_mem.c +++ b/virt/kvm/guest_mem.c @@ -43,6 +43,95 @@ static loff_t kvm_gmem_get_size(struct file *file) return i_size_read(file_inode(file)); } +static struct folio *kvm_gmem_hugetlb_alloc_and_cache_folio( + struct file *file, pgoff_t hindex) +{ + int err; + struct folio *folio; + struct kvm_gmem *gmem; + struct hstate *h; + struct resv_map *resv_map; + unsigned long offset; + struct vm_area_struct pseudo_vma; + + gmem = file->private_data; + h = gmem->hugetlb.h; + resv_map = gmem->hugetlb.resv_map; + offset = hindex << huge_page_shift(h); + + vma_init(&pseudo_vma, NULL); + vm_flags_init(&pseudo_vma, VM_HUGETLB | VM_MAYSHARE | VM_SHARED); + /* vma infrastructure is dependent on vm_file being set */ + pseudo_vma.vm_file = file; + + /* TODO setup NUMA policy. Meanwhile, fallback to get_task_policy(). */ + pseudo_vma.vm_policy = NULL; + folio = alloc_hugetlb_folio_from_subpool( + gmem->hugetlb.spool, h, resv_map, &pseudo_vma, offset, 0); + /* Remember to take and drop refcount from vm_policy */ + if (IS_ERR(folio)) + return folio; + + /* + * FIXME: Skip clearing pages when trusted firmware will do it when + * assigning memory to the guest. + */ + clear_huge_page(&folio->page, offset, pages_per_huge_page(h)); + __folio_mark_uptodate(folio); + err = hugetlb_filemap_add_folio(file->f_mapping, h, folio, hindex); + if (unlikely(err)) { + restore_reserve_on_error(resv_map, hindex, true, folio); + folio_put(folio); + folio = ERR_PTR(err); + } + + return folio; +} + +/** + * Gets a hugetlb folio, from @file, at @index (in terms of PAGE_SIZE) within + * the file. + * + * The returned folio will be in @file's page cache, and locked. + */ +static struct folio *kvm_gmem_hugetlb_get_folio(struct file *file, pgoff_t index) +{ + struct folio *folio; + u32 hash; + /* hindex is in terms of huge_page_size(h) and not PAGE_SIZE */ + pgoff_t hindex; + struct kvm_gmem *gmem; + struct hstate *h; + struct address_space *mapping; + + gmem = file->private_data; + h = gmem->hugetlb.h; + hindex = index >> huge_page_order(h); + + mapping = file->f_mapping; + hash = hugetlb_fault_mutex_hash(mapping, hindex); + mutex_lock(&hugetlb_fault_mutex_table[hash]); + + rcu_read_lock(); + folio = filemap_lock_folio(mapping, hindex); + rcu_read_unlock(); + if (folio) + goto folio_valid; + + folio = kvm_gmem_hugetlb_alloc_and_cache_folio(file, hindex); + /* + * TODO Perhaps the interface of kvm_gmem_get_folio should change to better + * report errors + */ + if (IS_ERR(folio)) + folio = NULL; + +folio_valid: + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + + return folio; +} + static struct folio *kvm_gmem_get_huge_folio(struct file *file, pgoff_t index) { #ifdef CONFIG_TRANSPARENT_HUGEPAGE @@ -74,36 +163,56 @@ static struct folio *kvm_gmem_get_huge_folio(struct file *file, pgoff_t index) #endif } +/** + * Gets a folio, from @file, at @index (in terms of PAGE_SIZE) within the file. + * + * The returned folio will be in @file's page cache and locked. 
+ */ static struct folio *kvm_gmem_get_folio(struct file *file, pgoff_t index) { struct folio *folio; + struct kvm_gmem *gmem = file->private_data; - folio = kvm_gmem_get_huge_folio(file, index); - if (!folio) { - folio = filemap_grab_folio(file->f_mapping, index); + if (gmem->flags & KVM_GUEST_MEMFD_HUGETLB) { + folio = kvm_gmem_hugetlb_get_folio(file, index); + + /* hugetlb gmem does not fall back to non-hugetlb pages */ if (!folio) return NULL; - } - /* - * TODO: Confirm this won't zero in-use pages, and skip clearing pages - * when trusted firmware will do it when assigning memory to the guest. - */ - if (!folio_test_uptodate(folio)) { - unsigned long nr_pages = folio_nr_pages(folio); - unsigned long i; + /* + * Don't need to clear pages because + * kvm_gmem_hugetlb_alloc_and_cache_folio() already clears pages + * when allocating + */ + } else { + folio = kvm_gmem_get_huge_folio(file, index); + if (!folio) { + folio = filemap_grab_folio(file->f_mapping, index); + if (!folio) + return NULL; + } - for (i = 0; i < nr_pages; i++) - clear_highpage(folio_page(folio, i)); - } + /* + * TODO: Confirm this won't zero in-use pages, and skip clearing pages + * when trusted firmware will do it when assigning memory to the guest. + */ + if (!folio_test_uptodate(folio)) { + unsigned long nr_pages = folio_nr_pages(folio); + unsigned long i; - /* - * filemap_grab_folio() uses FGP_ACCESSED, which already called - * folio_mark_accessed(), so we clear it. - * TODO: Should we instead be clearing this when truncating? - * TODO: maybe don't use FGP_ACCESSED at all and call __filemap_get_folio directly. - */ - folio_clear_referenced(folio); + for (i = 0; i < nr_pages; i++) + clear_highpage(folio_page(folio, i)); + } + + /* + * filemap_grab_folio() uses FGP_ACCESSED, which already called + * folio_mark_accessed(), so we clear it. + * TODO: Should we instead be clearing this when truncating? + * TODO: maybe don't use FGP_ACCESSED at all and call __filemap_get_folio directly. + */ + folio_clear_referenced(folio); + } /* * Indicate that this folio matches the backing store (in this case, has @@ -156,6 +265,44 @@ static void kvm_gmem_invalidate_end(struct kvm *kvm, struct kvm_gmem *gmem, KVM_MMU_UNLOCK(kvm); } +static void kvm_gmem_hugetlb_truncate_range(struct inode *inode, + loff_t offset, loff_t len) +{ + loff_t hsize; + loff_t full_hpage_start; + loff_t full_hpage_end; + struct kvm_gmem *gmem; + struct hstate *h; + struct address_space *mapping; + + mapping = inode->i_mapping; + gmem = mapping->private_data; + h = gmem->hugetlb.h; + hsize = huge_page_size(h); + full_hpage_start = round_up(offset, hsize); + full_hpage_end = round_down(offset + len, hsize); + + /* If range starts before first full page, zero partial page. */ + if (offset < full_hpage_start) { + hugetlb_zero_partial_page( + h, mapping, offset, min(offset + len, full_hpage_start)); + } + + /* Remove full pages from the file. */ + if (full_hpage_end > full_hpage_start) { + remove_mapping_hugepages(mapping, h, gmem->hugetlb.spool, + gmem->hugetlb.resv_map, inode, + full_hpage_start, full_hpage_end); + } + + + /* If range extends beyond last full page, zero partial page. 
*/ + if ((offset + len) > full_hpage_end && (offset + len) > full_hpage_start) { + hugetlb_zero_partial_page( + h, mapping, full_hpage_end, offset + len); + } +} + static long kvm_gmem_punch_hole(struct file *file, loff_t offset, loff_t len) { struct kvm_gmem *gmem = file->private_data; @@ -171,7 +318,10 @@ static long kvm_gmem_punch_hole(struct file *file, loff_t offset, loff_t len) kvm_gmem_invalidate_begin(kvm, gmem, start, end); - truncate_inode_pages_range(file->f_mapping, offset, offset + len - 1); + if (gmem->flags & KVM_GUEST_MEMFD_HUGETLB) + kvm_gmem_hugetlb_truncate_range(file_inode(file), offset, len); + else + truncate_inode_pages_range(file->f_mapping, offset, offset + len - 1); kvm_gmem_invalidate_end(kvm, gmem, start, end); @@ -183,6 +333,7 @@ static long kvm_gmem_punch_hole(struct file *file, loff_t offset, loff_t len) static long kvm_gmem_allocate(struct file *file, loff_t offset, loff_t len) { struct address_space *mapping = file->f_mapping; + struct kvm_gmem *gmem = file->private_data; pgoff_t start, index, end; int r; @@ -192,9 +343,14 @@ static long kvm_gmem_allocate(struct file *file, loff_t offset, loff_t len) filemap_invalidate_lock_shared(mapping); - start = offset >> PAGE_SHIFT; - /* Align so that at least 1 page is allocated */ - end = ALIGN(offset + len, PAGE_SIZE) >> PAGE_SHIFT; + if (gmem->flags & KVM_GUEST_MEMFD_HUGETLB) { + start = offset >> huge_page_shift(gmem->hugetlb.h); + end = ALIGN(offset + len, huge_page_size(gmem->hugetlb.h)) >> PAGE_SHIFT; + } else { + start = offset >> PAGE_SHIFT; + /* Align so that at least 1 page is allocated */ + end = ALIGN(offset + len, PAGE_SIZE) >> PAGE_SHIFT; + } r = 0; for (index = start; index < end; ) { @@ -211,7 +367,7 @@ static long kvm_gmem_allocate(struct file *file, loff_t offset, loff_t len) break; } - index = folio_next_index(folio); + index += folio_nr_pages(folio); folio_unlock(folio); folio_put(folio); @@ -625,7 +781,12 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot, return -ENOMEM; } - page = folio_file_page(folio, index); + /* + * folio_file_page() always returns the head page for hugetlb + * folios. Reimplement to get the page within this folio, even for + * hugetlb pages. 
+ */ + page = folio_page(folio, index & (folio_nr_pages(folio) - 1)); *pfn = page_to_pfn(page); *order = thp_order(compound_head(page)); From patchwork Tue Jun 6 19:04:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269627 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85949C77B7A for ; Tue, 6 Jun 2023 19:07:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237862AbjFFTHw (ORCPT ); Tue, 6 Jun 2023 15:07:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39028 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238664AbjFFTGs (ORCPT ); Tue, 6 Jun 2023 15:06:48 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9870A10F2 for ; Tue, 6 Jun 2023 12:05:02 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-ba81b24b878so10281308276.3 for ; Tue, 06 Jun 2023 12:05:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078284; x=1688670284; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=xVPr1dIpnxklN0GMh0WV40gxjr+XB/duCXqFm/4Eedk=; b=OpWI8RPhFgIw5MoV+RS2cmQpFpGPXsQqq6sU1K61/owOmBB1Ql0+w7FJ3nZldnf8p9 +qYBb0NtRqwQu8dWDMwOUTG0ATZVJvom27ARS7xgWW5TfboiScGSMfZ8N45qzVKLAY2H 4CU726nNQYrYfpWC/qzLp0r9J8us1SCC2lEqZrYJr9OdHyFq/TUwds9iZNdlTcobAHRr ZZALUwW7mEujbc7yxykj2/g4xPWE/5QaSH6KTAlpAQ02/2PJYX27kEua6tRoLiCjwPDV pyCH4FFtF/i8mbhld01B2RuTsSf6teTynYWziBlTcigs4W2oRa0ZqtYf/T+WFi+IMehY DIyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078284; x=1688670284; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xVPr1dIpnxklN0GMh0WV40gxjr+XB/duCXqFm/4Eedk=; b=b+FujZSPDY/Sh0KRK455cK6ijAqJiVPmWL5Xb9IuOoO9/aKBFwSReWWivKFn3WShgS kPelpBBh7jrrTgVgal45ttLBxFYce/94geAczZDpyRfiW+KSMS/TtFohgoDFb67PdGBW dAdctFEhXchetQ+7/v0CiM3/hYRE+0/l41xXwfFhEmMu4nKYzlF2nvN7X3l5lygtAV+V BnyAC/bcVDGGbSiDufTiav+wjN2Kvxn3NzptAlAqBHGmQPTUv9mDPyaxZZkU18qKezRu SmcyrbPT+iZdRvxxN9pX5Z/VUafZzit+4Fb8PF6RdZEDDpB/vogOBMDzsoFQJTIqMbPV xhKg== X-Gm-Message-State: AC+VfDyLJzcwALWyBfHdOqA9f+HTM3Wk/9zNh0fhHV1tVz2Rbja0z6dl 8EMxW4Hnunz8X8RjEsVpKyxpr7h0pOixbYadMQ== X-Google-Smtp-Source: ACHHUZ6iw5MaJvx6Biay/guXDPe54wqyOEjmtoFNQ8eDgpYM8NsdWofxvbuGdqaNO+FVNrUBL1ysZvJNR4qrxfEOBQ== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a05:6902:72d:b0:ba1:d0:7f7c with SMTP id l13-20020a056902072d00b00ba100d07f7cmr1128681ybt.2.1686078284464; Tue, 06 Jun 2023 12:04:44 -0700 (PDT) Date: Tue, 6 Jun 2023 19:04:02 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <2b26bcc8b10f8a11e6405d4cea5f1235e82e83c9.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 17/19] KVM: selftests: Add basic selftests for hugetlbfs-backed guest_mem From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, 
shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add tests for 2MB and 1GB page sizes. Signed-off-by: Ackerley Tng --- .../testing/selftests/kvm/guest_memfd_test.c | 33 ++++++++++++++----- 1 file changed, 24 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c index 059b33cdecec..6e24631119c6 100644 --- a/tools/testing/selftests/kvm/guest_memfd_test.c +++ b/tools/testing/selftests/kvm/guest_memfd_test.c @@ -90,20 +90,14 @@ static void test_fallocate(int fd, size_t page_size, size_t total_size) TEST_ASSERT(!ret, "fallocate to restore punched hole should succeed"); } - -int main(int argc, char *argv[]) +void test_guest_mem(struct kvm_vm *vm, uint32_t flags, size_t page_size) { - size_t page_size; - size_t total_size; int fd; - struct kvm_vm *vm; + size_t total_size; - page_size = getpagesize(); total_size = page_size * 4; - vm = vm_create_barebones(); - - fd = vm_create_guest_memfd(vm, total_size, 0); + fd = vm_create_guest_memfd(vm, total_size, flags); test_file_read_write(fd); test_mmap(fd, page_size); @@ -112,3 +106,24 @@ int main(int argc, char *argv[]) close(fd); } + +int main(int argc, char *argv[]) +{ + struct kvm_vm *vm = vm_create_barebones(); + + printf("Test guest mem 4K\n"); + test_guest_mem(vm, 0, getpagesize()); + printf(" PASSED\n"); + + printf("Test guest mem hugetlb 2M\n"); + test_guest_mem( + vm, KVM_GUEST_MEMFD_HUGETLB | KVM_GUEST_MEMFD_HUGE_2MB, 2UL << 20); + printf(" PASSED\n"); + + printf("Test guest mem hugetlb 1G\n"); + test_guest_mem( + vm, KVM_GUEST_MEMFD_HUGETLB | KVM_GUEST_MEMFD_HUGE_1GB, 1UL << 30); + printf(" PASSED\n"); + + return 0; +} From patchwork Tue Jun 6 19:04:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269626 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DAF3AC7EE2C for ; Tue, 6 Jun 2023 19:07:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239085AbjFFTHq (ORCPT ); Tue, 6 Jun 2023 15:07:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39440 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239332AbjFFTHA (ORCPT ); Tue, 6 Jun 2023 15:07:00 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id DFF431FEB for ; Tue, 6 Jun 2023 12:05:05 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-bad475920a8so16488086276.1 for ; Tue, 06 Jun 2023 12:05:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078286; x=1688670286; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=o3QY+GjjNY/B/LKbvEsBmmion/NMqJeuEYcwlsx1w+c=; b=ga7yY2wFSjsTCtNYT2NMM1yjmokiEXDaCTuXNhL4QFIjeFOGWXP9m4xRNkNarZKOBt me9uyDk1llgo74NsZn/+85xOGgk1JymFHYuOx2vpupSEs8KBznt7C4GK7pW1At3io3MV omuLB1jzJaLc4Zg0yimBb4CzacnWix8fdVSxMBPt8YeYd8UktVq4LjjSoaTbCwhO+eXy EZa5FIg//RusZW7f2NgtTOFoqGNu6khof2qQQ8vk0r6gz2il2XRm/VN/dp/gxWzndj0K LOfFsFQnLTJrS+CT3R61MXxoPNTsaAUuUv0u7TZgtSvuJTPRds1QGGcjrMoDqzJSrIHy 9Thg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078286; x=1688670286; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=o3QY+GjjNY/B/LKbvEsBmmion/NMqJeuEYcwlsx1w+c=; b=PSPFWWupTM2FaM9vCCGgx1rt5Un+jLkHHzulwBFLMuTOZZ8zeFY5Aex29vvXvzS/bM TbBjMVcUqNXR4ytzzjbqxjSEEuV37ir0FV30W6t2iUDPK4LeK9lZBMYbtMHpbTseSzOO IKNZGHz9ONauVFgv5C8UCtlK9KNB73EpNLkSlsCesgYECe7JkCZb+MKZy4bvisvJ/o22 GJB7CQ70v9f2XOYVGqz5Wn3PeWDbKgnEZpQOcIEEhvbgU9COiBiwBP5iAQEZJPd16cmU vKMOWUg3mS4qfDYHypbgZ7Eeorqneugn1MHsJuXpQODF3em3btHLvOlYlcYC2KzWCtvZ BMbg== X-Gm-Message-State: AC+VfDw3SQd+kDxPXq/Z9NbWO/6AkPraQfYOgL0AVL+PMn0BtFwkfJ/o 1RE9mpHKA3VcIpAS0pyxoskRPCb4EsrZ13iETA== X-Google-Smtp-Source: ACHHUZ7zyWkKPX7nW+eqyXqGuqnjfZMxWtQ7sqNr3g3ix//f4WlCF2yUeIoJ+N8uNLRa6EBYQO+4YPcXyNwMiHRgqA== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a25:aea3:0:b0:bb3:9b99:f3f5 with SMTP id b35-20020a25aea3000000b00bb39b99f3f5mr1433391ybj.4.1686078286504; Tue, 06 Jun 2023 12:04:46 -0700 (PDT) Date: Tue, 6 Jun 2023 19:04:03 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <5f0d27ce06c03761974264bd8a890614ea7ecb32.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 18/19] KVM: selftests: Support various types of backing sources for private memory From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Adds support for various type of backing sources for private memory 
(in the sense of confidential computing), similar to the backing sources available for shared memory. Signed-off-by: Ackerley Tng --- .../testing/selftests/kvm/include/test_util.h | 14 ++++ tools/testing/selftests/kvm/lib/test_util.c | 74 +++++++++++++++++++ 2 files changed, 88 insertions(+) diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h index a6e9f215ce70..899ea15ca8a9 100644 --- a/tools/testing/selftests/kvm/include/test_util.h +++ b/tools/testing/selftests/kvm/include/test_util.h @@ -122,6 +122,16 @@ struct vm_mem_backing_src_alias { uint32_t flag; }; +enum vm_pmem_backing_src_type { + VM_PMEM_SRC_GMEM, + VM_PMEM_SRC_HUGETLB, /* Use kernel default page size for hugetlb pages */ + VM_PMEM_SRC_HUGETLB_2MB, + VM_PMEM_SRC_HUGETLB_1GB, + NUM_PMEM_SRC_TYPES, +}; + +#define DEFAULT_VM_PMEM_SRC VM_PMEM_SRC_GMEM + #define MIN_RUN_DELAY_NS 200000UL bool thp_configured(void); @@ -132,6 +142,10 @@ size_t get_backing_src_pagesz(uint32_t i); bool is_backing_src_hugetlb(uint32_t i); void backing_src_help(const char *flag); enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name); +void pmem_backing_src_help(const char *flag); +enum vm_pmem_backing_src_type parse_pmem_backing_src_type(const char *type_name); +const struct vm_mem_backing_src_alias *vm_pmem_backing_src_alias(uint32_t i); +size_t get_pmem_backing_src_pagesz(uint32_t i); long get_run_delay(void); /* diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index b772193f6c18..62efb7b8ba51 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -287,6 +288,34 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i) return &aliases[i]; } +const struct vm_mem_backing_src_alias *vm_pmem_backing_src_alias(uint32_t i) +{ + static const struct vm_mem_backing_src_alias aliases[] = { + [VM_PMEM_SRC_GMEM] = { + .name = "pmem_gmem", + .flag = 0, + }, + [VM_PMEM_SRC_HUGETLB] = { + .name = "pmem_hugetlb", + .flag = KVM_GUEST_MEMFD_HUGETLB, + }, + [VM_PMEM_SRC_HUGETLB_2MB] = { + .name = "pmem_hugetlb_2mb", + .flag = KVM_GUEST_MEMFD_HUGETLB | KVM_GUEST_MEMFD_HUGE_2MB, + }, + [VM_PMEM_SRC_HUGETLB_1GB] = { + .name = "pmem_hugetlb_1gb", + .flag = KVM_GUEST_MEMFD_HUGETLB | KVM_GUEST_MEMFD_HUGE_1GB, + }, + }; + _Static_assert(ARRAY_SIZE(aliases) == NUM_PMEM_SRC_TYPES, + "Missing new backing private mem src types?"); + + TEST_ASSERT(i < NUM_PMEM_SRC_TYPES, "Private mem backing src type ID %d too big", i); + + return &aliases[i]; +} + #define MAP_HUGE_PAGE_SIZE(x) (1ULL << ((x >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK)) size_t get_backing_src_pagesz(uint32_t i) @@ -307,6 +336,20 @@ size_t get_backing_src_pagesz(uint32_t i) } } +size_t get_pmem_backing_src_pagesz(uint32_t i) +{ + uint32_t flag = vm_pmem_backing_src_alias(i)->flag; + + switch (i) { + case VM_PMEM_SRC_GMEM: + return getpagesize(); + case VM_PMEM_SRC_HUGETLB: + return get_def_hugetlb_pagesz(); + default: + return MAP_HUGE_PAGE_SIZE(flag); + } +} + bool is_backing_src_hugetlb(uint32_t i) { return !!(vm_mem_backing_src_alias(i)->flag & MAP_HUGETLB); @@ -343,6 +386,37 @@ enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name) return -1; } +static void print_available_pmem_backing_src_types(const char *prefix) +{ + int i; + + printf("%sAvailable private mem backing src types:\n", prefix); + + for (i = 0; i < 
NUM_PMEM_SRC_TYPES; i++) + printf("%s %s\n", prefix, vm_pmem_backing_src_alias(i)->name); +} + +void pmem_backing_src_help(const char *flag) +{ + printf(" %s: specify the type of memory that should be used to\n" + " back guest private memory. (default: %s)\n", + flag, vm_pmem_backing_src_alias(DEFAULT_VM_MEM_SRC)->name); + print_available_pmem_backing_src_types(" "); +} + +enum vm_pmem_backing_src_type parse_pmem_backing_src_type(const char *type_name) +{ + int i; + + for (i = 0; i < NUM_SRC_TYPES; i++) + if (!strcmp(type_name, vm_pmem_backing_src_alias(i)->name)) + return i; + + print_available_pmem_backing_src_types(""); + TEST_FAIL("Unknown private mem backing src type: %s", type_name); + return -1; +} + long get_run_delay(void) { char path[64]; From patchwork Tue Jun 6 19:04:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ackerley Tng X-Patchwork-Id: 13269628 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D148AC7EE2A for ; Tue, 6 Jun 2023 19:08:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239417AbjFFTIO (ORCPT ); Tue, 6 Jun 2023 15:08:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38962 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239411AbjFFTHE (ORCPT ); Tue, 6 Jun 2023 15:07:04 -0400 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 25D521FFB for ; Tue, 6 Jun 2023 12:05:10 -0700 (PDT) Received: by mail-pf1-x449.google.com with SMTP id d2e1a72fcca58-659bb123ccfso2697818b3a.0 for ; Tue, 06 Jun 2023 12:05:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686078288; x=1688670288; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=CCEiTTKrsebxDoXfSRZP/CN+WTuYdz3sOeWxyKUUyPY=; b=KFM+d/JKzgYvN5OR6kU44o+ZYxe7Q/H7MNTOnP/z0cxW3//uRTJeZxZ8dFp8Jit4Gf YquLlWVy9xCdFnBGp+gPFhLtGo2w3F50cPqXWf8JFYlZ/OGxawmSMJJ/jIE184rG6FHp ixTL8P/2ovgE/QdCx/NLmyiW9xL5hbY3itkmDOuPi4r2D/Bp6z5cypKETkTA5IN3CsDs 1vhuCHFDhZuON8z8NNnBNoq7He9ZNsDe7Xmjzq9wK+9M2GgIoXw9C4JhNu7KegBFAMS3 MULic01j2ol2uU6vWbv87Xaf10SomC0FL9lq9TfMGxJn5DnDmpEo+KEOdrVPsgYMpseZ BMRw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686078288; x=1688670288; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=CCEiTTKrsebxDoXfSRZP/CN+WTuYdz3sOeWxyKUUyPY=; b=jKhDDHKCpKLcnMHlGSAD2ghosHzPGyfExYv4NFsd7ajuOQ812kJpghxziRZZo+XnNK E9Par/prgqzij3BrKLDo2Hbn+MA6I5j0yvoeYrEqbnbWNy36Pg9jtlg2Z1nMclxDd2kU xwRYxR/VhnRgeKzO/M5nZLIF05lm2PiAwQmx+Q9KQIex0ZQcziFBOqLd8Z6XthcDjRMg V6cHVMM/FJfxDBTEFsFWnfCL2H7KsUD1lB2wSfLqfPUIFGZBZUWer4Up9DqxbEVFNnrg KolcCbF6sXol+eENVjoPHdCdwaZmUKOB52YAOovBnKEkreIG62/xq5+fNhTgK/7IrRsr NSow== X-Gm-Message-State: AC+VfDwOqMsusCqmJSUw9qYoCnzOj/2EgwFObs3je8Cgo0g6FYpOCvgU YohznOw/PSWPXAATb3GQkoNfhJ6axeXzyDnxMg== X-Google-Smtp-Source: ACHHUZ5UZgVYlUWSYfljNriuMM/SFMypamtYGG6ajWbl9pU1XKFK+BBOu36l6aAPASb6U7Gvlg5sOoRl6fqEbUEr5Q== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) 
by 2002:a05:6a00:13a6:b0:657:f26e:b025 with SMTP id t38-20020a056a0013a600b00657f26eb025mr1318192pfg.6.1686078288430; Tue, 06 Jun 2023 12:04:48 -0700 (PDT) Date: Tue, 6 Jun 2023 19:04:04 +0000 In-Reply-To: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog Message-ID: <3ae2d02c45c5fb91b490b1674165c733efb871d6.1686077275.git.ackerleytng@google.com> Subject: [RFC PATCH 19/19] KVM: selftests: Update test for various private memory backing source types From: Ackerley Tng To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev, pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org, willy@infradead.org Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com, david@redhat.com, dhildenb@redhat.com, dmatlack@google.com, erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, jthoughton@google.com, jun.nakajima@intel.com, kirill.shutemov@linux.intel.com, liam.merwick@oracle.com, mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com, qperret@google.com, rientjes@google.com, rppt@kernel.org, steven.price@arm.com, tabba@google.com, vannapurve@google.com, vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com, wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org, Ackerley Tng Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Update private_mem_conversions_test for various private memory backing source types Signed-off-by: Ackerley Tng --- .../kvm/x86_64/private_mem_conversions_test.c | 38 ++++++++++++++----- 1 file changed, 28 insertions(+), 10 deletions(-) diff --git a/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c index 6a353cf64f52..27a7e5099b7b 100644 --- a/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c +++ b/tools/testing/selftests/kvm/x86_64/private_mem_conversions_test.c @@ -240,14 +240,15 @@ static void *__test_mem_conversions(void *__vcpu) } } -static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t nr_vcpus, - uint32_t nr_memslots) +static void test_mem_conversions(enum vm_mem_backing_src_type src_type, + enum vm_pmem_backing_src_type pmem_src_type, + uint32_t nr_vcpus, uint32_t nr_memslots) { - const size_t memfd_size = PER_CPU_DATA_SIZE * nr_vcpus; struct kvm_vcpu *vcpus[KVM_MAX_VCPUS]; pthread_t threads[KVM_MAX_VCPUS]; struct kvm_vm *vm; int memfd, i, r; + size_t pmem_aligned_size, memfd_size; size_t test_unit_size; const struct vm_shape shape = { @@ -270,21 +271,32 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t * Allocate enough memory so that each vCPU's chunk of memory can be * naturally aligned with respect to the size of the backing store. 
*/ - test_unit_size = align_up(PER_CPU_DATA_SIZE, get_backing_src_pagesz(src_type)); + test_unit_size = align_up(PER_CPU_DATA_SIZE, + max(get_backing_src_pagesz(src_type), + get_pmem_backing_src_pagesz(pmem_src_type))); } - memfd = vm_create_guest_memfd(vm, memfd_size, 0); + pmem_aligned_size = PER_CPU_DATA_SIZE; + if (nr_memslots > 1) { + pmem_aligned_size = align_up(PER_CPU_DATA_SIZE, + get_pmem_backing_src_pagesz(pmem_src_type)); + } + + memfd_size = pmem_aligned_size * nr_vcpus; + memfd = vm_create_guest_memfd(vm, memfd_size, + vm_pmem_backing_src_alias(pmem_src_type)->flag); for (i = 0; i < nr_memslots; i++) { uint64_t gpa = BASE_DATA_GPA + i * test_unit_size; - uint64_t npages = PER_CPU_DATA_SIZE / vm->page_size; + uint64_t npages = pmem_aligned_size / vm->page_size; /* Make sure the memslot is large enough for all the test units */ if (nr_memslots == 1) npages *= nr_vcpus; + /* Offsets must be aligned to private mem's page size */ vm_mem_add(vm, src_type, gpa, BASE_DATA_SLOT + i, npages, - KVM_MEM_PRIVATE, memfd, PER_CPU_DATA_SIZE * i); + KVM_MEM_PRIVATE, memfd, pmem_aligned_size * i); } for (i = 0; i < nr_vcpus; i++) { @@ -324,10 +336,12 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t static void usage(const char *cmd) { puts(""); - printf("usage: %s [-h] [-m] [-s mem_type] [-n nr_vcpus]\n", cmd); + printf("usage: %s [-h] [-m] [-s mem_type] [-p pmem_type] [-n nr_vcpus]\n", cmd); puts(""); backing_src_help("-s"); puts(""); + pmem_backing_src_help("-p"); + puts(""); puts(" -n: specify the number of vcpus (default: 1)"); puts(""); puts(" -m: use multiple memslots (default: 1)"); @@ -337,6 +351,7 @@ static void usage(const char *cmd) int main(int argc, char *argv[]) { enum vm_mem_backing_src_type src_type = DEFAULT_VM_MEM_SRC; + enum vm_pmem_backing_src_type pmem_src_type = DEFAULT_VM_PMEM_SRC; bool use_multiple_memslots = false; uint32_t nr_vcpus = 1; uint32_t nr_memslots; @@ -345,11 +360,14 @@ int main(int argc, char *argv[]) TEST_REQUIRE(kvm_has_cap(KVM_CAP_EXIT_HYPERCALL)); TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_PROTECTED_VM)); - while ((opt = getopt(argc, argv, "hms:n:")) != -1) { + while ((opt = getopt(argc, argv, "hms:p:n:")) != -1) { switch (opt) { case 's': src_type = parse_backing_src_type(optarg); break; + case 'p': + pmem_src_type = parse_pmem_backing_src_type(optarg); + break; case 'n': nr_vcpus = atoi_positive("nr_vcpus", optarg); break; @@ -365,7 +383,7 @@ int main(int argc, char *argv[]) nr_memslots = use_multiple_memslots ? nr_vcpus : 1; - test_mem_conversions(src_type, nr_vcpus, nr_memslots); + test_mem_conversions(src_type, pmem_src_type, nr_vcpus, nr_memslots); return 0; }
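With these changes the private (guest_mem) backing store can be selected independently of the shared-memory backing store. As a hedged usage illustration (binary names and paths as typically built from tools/testing/selftests/kvm with this series applied, and hugetlb pages reserved beforehand):

  ./guest_memfd_test
  ./x86_64/private_mem_conversions_test -s anonymous -p pmem_hugetlb_2mb -n 4 -m
  ./x86_64/private_mem_conversions_test -p pmem_hugetlb_1gb

The -s flag still chooses the shared memslot backing (parse_backing_src_type()), while the new -p flag chooses the guest_memfd backing (parse_pmem_backing_src_type()); it defaults to pmem_gmem, i.e. a non-hugetlb guest_memfd.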