From patchwork Wed Jan 29 22:41:56 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Frank van der Linden
X-Patchwork-Id: 13954224
Date: Wed, 29 Jan 2025 22:41:56 +0000
In-Reply-To: <20250129224157.2046079-1-fvdl@google.com>
References: <20250129224157.2046079-1-fvdl@google.com>
X-Mailer: git-send-email 2.48.1.262.g85cc9f2d1e-goog
Message-ID: <20250129224157.2046079-28-fvdl@google.com>
Subject: [PATCH v2 27/28] mm/hugetlb: enable bootmem allocation from CMA areas
From: Frank van der Linden
To: akpm@linux-foundation.org, muchun.song@linux.dev, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Cc: yuzhao@google.com, usamaarif642@gmail.com, joao.m.martins@oracle.com,
 roman.gushchin@linux.dev, Frank van der Linden, Madhavan Srinivasan,
 Michael Ellerman, linuxppc-dev@lists.ozlabs.org

If hugetlb_cma_only is enabled, we know that hugetlb pages can only be
allocated from CMA.
Now that there is an interface to do early reservations from a CMA
area (returning memblock memory), it can be used to allocate hugetlb
pages from CMA.

This also allows for doing pre-HVO on these pages (if enabled).

Make sure to initialize the page structures and associated data
correctly. Create a flag to signal that a hugetlb page has been
allocated from CMA to make things a little easier.

Some configurations of powerpc have a special hugetlb bootmem
allocator, so introduce a boolean function
arch_has_huge_bootmem_alloc() that returns true if such an allocator
is present. In that case, CMA bootmem allocations can't be used, so
check that function before trying.

Cc: Madhavan Srinivasan
Cc: Michael Ellerman
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Frank van der Linden
---
 arch/powerpc/include/asm/book3s/64/hugetlb.h |   6 +
 include/linux/hugetlb.h                      |  17 +++
 mm/hugetlb.c                                 | 121 ++++++++++++++-----
 3 files changed, 113 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h
index f0bba9c5f9c3..bb786694dd26 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -94,4 +94,10 @@ static inline int check_and_get_huge_psize(int shift)
 	return mmu_psize;
 }
 
+#define arch_has_huge_bootmem_alloc arch_has_huge_bootmem_alloc
+
+static inline bool arch_has_huge_bootmem_alloc(void)
+{
+	return (firmware_has_feature(FW_FEATURE_LPAR) && !radix_enabled());
+}
 #endif
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2512463bca49..6c6546b54934 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -591,6 +591,7 @@ enum hugetlb_page_flags {
 	HPG_freed,
 	HPG_vmemmap_optimized,
 	HPG_raw_hwp_unreliable,
+	HPG_cma,
 	__NR_HPAGEFLAGS,
 };
 
@@ -650,6 +651,7 @@ HPAGEFLAG(Temporary, temporary)
 HPAGEFLAG(Freed, freed)
 HPAGEFLAG(VmemmapOptimized, vmemmap_optimized)
 HPAGEFLAG(RawHwpUnreliable, raw_hwp_unreliable)
+HPAGEFLAG(Cma, cma)
 
 #ifdef CONFIG_HUGETLB_PAGE
 
@@ -678,14 +680,18 @@ struct hstate {
 	char name[HSTATE_NAME_LEN];
 };
 
+struct cma;
+
 struct huge_bootmem_page {
 	struct list_head list;
 	struct hstate *hstate;
 	unsigned long flags;
+	struct cma *cma;
 };
 
 #define HUGE_BOOTMEM_HVO		0x0001
 #define HUGE_BOOTMEM_ZONES_VALID	0x0002
+#define HUGE_BOOTMEM_CMA		0x0004
 
 bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
 
@@ -823,6 +829,17 @@ static inline pte_t arch_make_huge_pte(pte_t entry, unsigned int shift,
 }
 #endif
 
+#ifndef arch_has_huge_bootmem_alloc
+/*
+ * Some architectures do their own bootmem allocation, so they can't use
+ * early CMA allocation.
+ */
+static inline bool arch_has_huge_bootmem_alloc(void)
+{
+	return false;
+}
+#endif
+
 static inline struct hstate *folio_hstate(struct folio *folio)
 {
 	VM_BUG_ON_FOLIO(!folio_test_hugetlb(folio), folio);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c227d0b9cf1e..5a3e9f7deaba 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -132,8 +132,10 @@ static void hugetlb_free_folio(struct folio *folio)
 #ifdef CONFIG_CMA
 	int nid = folio_nid(folio);
 
-	if (cma_free_folio(hugetlb_cma[nid], folio))
+	if (folio_test_hugetlb_cma(folio)) {
+		WARN_ON_ONCE(!cma_free_folio(hugetlb_cma[nid], folio));
 		return;
+	}
 #endif
 	folio_put(folio);
 }
@@ -1509,6 +1511,9 @@ static struct folio *alloc_gigantic_folio(struct hstate *h, gfp_t gfp_mask,
 					break;
 			}
 		}
+
+		if (folio)
+			folio_set_hugetlb_cma(folio);
 	}
 #endif
 	if (!folio) {
@@ -3175,6 +3180,53 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	return ERR_PTR(-ENOSPC);
 }
 
+static bool __init hugetlb_early_cma(struct hstate *h)
+{
+	if (arch_has_huge_bootmem_alloc())
+		return false;
+
+	return (hstate_is_gigantic(h) && hugetlb_cma_only);
+}
+
+static __init void *alloc_bootmem(struct hstate *h, int nid)
+{
+	struct huge_bootmem_page *m;
+	unsigned long flags;
+	struct cma *cma;
+
+#ifdef CONFIG_CMA
+	if (hugetlb_early_cma(h)) {
+		flags = HUGE_BOOTMEM_CMA;
+		cma = hugetlb_cma[nid];
+		m = cma_reserve_early(cma, huge_page_size(h));
+	} else
+#endif
+	{
+		flags = 0;
+		cma = NULL;
+		m = memblock_alloc_try_nid_raw(huge_page_size(h),
+			huge_page_size(h), 0, MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+	}
+
+	if (m) {
+		/*
+		 * Use the beginning of the huge page to store the
+		 * huge_bootmem_page struct (until gather_bootmem
+		 * puts them into the mem_map).
+		 *
+		 * Put them into a private list first because mem_map
+		 * is not up yet.
+		 */
+		INIT_LIST_HEAD(&m->list);
+		list_add(&m->list, &huge_boot_pages[nid]);
+		m->hstate = h;
+		m->flags = flags;
+		m->cma = cma;
+	}
+
+	return m;
+}
+
 int alloc_bootmem_huge_page(struct hstate *h, int nid)
 	__attribute__ ((weak, alias("__alloc_bootmem_huge_page")));
 int __alloc_bootmem_huge_page(struct hstate *h, int nid)
@@ -3184,17 +3236,14 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 
 	/* do node specific alloc */
 	if (nid != NUMA_NO_NODE) {
-		m = memblock_alloc_try_nid_raw(huge_page_size(h), huge_page_size(h),
-				0, MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+		m = alloc_bootmem(h, node);
 		if (!m)
 			return 0;
 		goto found;
 	}
 	/* allocate from next node when distributing huge pages */
 	for_each_node_mask_to_alloc(&h->next_nid_to_alloc, nr_nodes, node, &node_states[N_ONLINE]) {
-		m = memblock_alloc_try_nid_raw(
-				huge_page_size(h), huge_page_size(h),
-				0, MEMBLOCK_ALLOC_ACCESSIBLE, node);
+		m = alloc_bootmem(h, node);
 		if (m)
 			break;
 	}
@@ -3203,7 +3252,6 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 		return 0;
 
 found:
-
 	/*
 	 * Only initialize the head struct page in memmap_init_reserved_pages,
 	 * rest of the struct pages will be initialized by the HugeTLB
@@ -3213,18 +3261,6 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 	 */
 	memblock_reserved_mark_noinit(virt_to_phys((void *)m + PAGE_SIZE),
 		huge_page_size(h) - PAGE_SIZE);
-	/*
-	 * Use the beginning of the huge page to store the
-	 * huge_bootmem_page struct (until gather_bootmem
-	 * puts them into the mem_map).
-	 *
-	 * Put them into a private list first because mem_map
-	 * is not up yet.
-	 */
-	INIT_LIST_HEAD(&m->list);
-	list_add(&m->list, &huge_boot_pages[node]);
-	m->hstate = h;
-	m->flags = 0;
 	return 1;
 }
 
@@ -3265,13 +3301,25 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
 	prep_compound_head((struct page *)folio, huge_page_order(h));
 }
 
+static bool __init hugetlb_bootmem_page_prehvo(struct huge_bootmem_page *m)
+{
+	return m->flags & HUGE_BOOTMEM_HVO;
+}
+
+static bool __init hugetlb_bootmem_page_earlycma(struct huge_bootmem_page *m)
+{
+	return m->flags & HUGE_BOOTMEM_CMA;
+}
+
 /*
  * memblock-allocated pageblocks might not have the migrate type set
  * if marked with the 'noinit' flag. Set it to the default (MIGRATE_MOVABLE)
- * here.
+ * here, or MIGRATE_CMA if this was a page allocated through an early CMA
+ * reservation.
  *
- * Note that this will not write the page struct, it is ok (and necessary)
- * to do this on vmemmap optimized folios.
+ * In case of vmemmap optimized folios, the tail vmemmap pages are mapped
+ * read-only, but that's ok - for sparse vmemmap this does not write to
+ * the page structure.
  */
 static void __init hugetlb_bootmem_init_migratetype(struct folio *folio,
 					struct hstate *h)
@@ -3280,9 +3328,13 @@ static void __init hugetlb_bootmem_init_migratetype(struct folio *folio,
 
 	WARN_ON_ONCE(!pageblock_aligned(folio_pfn(folio)));
 
-	for (i = 0; i < nr_pages; i += pageblock_nr_pages)
-		set_pageblock_migratetype(folio_page(folio, i),
+	for (i = 0; i < nr_pages; i += pageblock_nr_pages) {
+		if (folio_test_hugetlb_cma(folio))
+			init_cma_pageblock(folio_page(folio, i));
+		else
+			set_pageblock_migratetype(folio_page(folio, i),
 					  MIGRATE_MOVABLE);
+	}
 }
 
 static void __init prep_and_add_bootmem_folios(struct hstate *h,
@@ -3328,10 +3380,16 @@ bool __init hugetlb_bootmem_page_zones_valid(int nid,
 		return true;
 	}
 
+	if (hugetlb_bootmem_page_earlycma(m)) {
+		valid = cma_validate_zones(m->cma);
+		goto out;
+	}
+
 	start_pfn = virt_to_phys(m) >> PAGE_SHIFT;
 
 	valid = !pfn_range_intersects_zones(nid, start_pfn,
 			pages_per_huge_page(m->hstate));
+out:
 	if (!valid)
 		hstate_boot_nrinvalid[hstate_index(m->hstate)]++;
 
@@ -3360,11 +3418,6 @@ static void __init hugetlb_bootmem_free_invalid_page(int nid, struct page *page,
 	}
 }
 
-static bool __init hugetlb_bootmem_page_prehvo(struct huge_bootmem_page *m)
-{
-	return (m->flags & HUGE_BOOTMEM_HVO);
-}
-
 /*
  * Put bootmem huge pages into the standard lists after mem_map is up.
  * Note: This only applies to gigantic (order > MAX_PAGE_ORDER) pages.
@@ -3414,6 +3467,9 @@ static void __init gather_bootmem_prealloc_node(unsigned long nid)
 			 */
 			folio_set_hugetlb_vmemmap_optimized(folio);
 
+		if (hugetlb_bootmem_page_earlycma(m))
+			folio_set_hugetlb_cma(folio);
+
 		list_add(&folio->lru, &folio_list);
 
 		/*
@@ -3606,8 +3662,11 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 {
 	unsigned long allocated;
 
-	/* skip gigantic hugepages allocation if hugetlb_cma enabled */
-	if (hstate_is_gigantic(h) && hugetlb_cma_size) {
+	/*
+	 * Skip gigantic hugepages allocation if early CMA
+	 * reservations are not available.
+	 */
+	if (hstate_is_gigantic(h) && hugetlb_cma_size && !hugetlb_early_cma(h)) {
 		pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip boot time allocation\n");
 		return;
 	}
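
Reviewer note: the boot-time decision this patch changes can be modelled
in user space. The sketch below is not kernel code and is not part of the
patch. struct boot_cfg and the model_*() helpers are hypothetical names
introduced only for illustration; their fields stand in for
arch_has_huge_bootmem_alloc(), hstate_is_gigantic(h), hugetlb_cma_only
and hugetlb_cma_size, and the two functions mirror hugetlb_early_cma()
and the updated check in hugetlb_hstate_alloc_pages().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of this is in the patch. */
struct boot_cfg {
	bool arch_bootmem_alloc;	/* models arch_has_huge_bootmem_alloc() */
	bool gigantic;			/* models hstate_is_gigantic(h) */
	bool cma_only;			/* models hugetlb_cma_only */
	unsigned long cma_size;		/* models hugetlb_cma_size */
};

/* Mirrors hugetlb_early_cma(): true if early CMA reservation is usable. */
static bool model_early_cma(const struct boot_cfg *c)
{
	/* An arch-specific bootmem allocator rules out early CMA. */
	if (c->arch_bootmem_alloc)
		return false;

	return c->gigantic && c->cma_only;
}

/*
 * Mirrors the updated condition in hugetlb_hstate_alloc_pages():
 * true if boot-time allocation of this hstate is skipped.
 */
static bool model_skip_boot_alloc(const struct boot_cfg *c)
{
	return c->gigantic && c->cma_size != 0 && !model_early_cma(c);
}
```

In this model, a gigantic hstate with a CMA area configured is no longer
skipped at boot once hugetlb_cma_only is set and the arch hook returns
false. Every other gigantic-page case with hugetlb_cma_size set keeps the
previous skip behaviour.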