From: Baolin Wang
To: akpm@linux-foundation.org, hughd@google.com
Cc: willy@infradead.org, david@redhat.com, wangkefeng.wang@huawei.com, 21cnbao@gmail.com, ryan.roberts@arm.com, ioworker0@gmail.com, da.gomez@samsung.com, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v3 3/4] mm: shmem: add large folio support to the write and fallocate paths for tmpfs
Date: Thu, 10 Oct 2024 17:58:13 +0800
Message-Id: <252c5999f8789d4f511e8e1466414238990f7e18.1728548374.git.baolin.wang@linux.alibaba.com>

Add large folio support to the tmpfs write and fallocate paths. This uses the same high-order preference mechanism that the iomap buffered IO path relies on in __filemap_get_folio().
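The following userspace sketch illustrates the high-order preference mechanism. It derives an order hint from the size and reduces that hint so the folio stays naturally aligned at the page index. It then caps the hint and, if an allocation fails, steps down one order at a time, as the fallback loop in __filemap_get_folio() does.

The sketch rests on these assumptions:
- pages are 4 KiB;
- the maximum order is 9;
- the size is rounded down to a power-of-two number of pages.

Every sketch_* name, including the allocator callback, is hypothetical. The sketch does not model GFP flags, locking or page cache insertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace sketch, not kernel code. SKETCH_PAGE_SHIFT and SKETCH_MAX_ORDER
 * are hypothetical stand-ins for PAGE_SHIFT and MAX_PAGECACHE_ORDER.
 */
#define SKETCH_PAGE_SHIFT	12	/* assume 4 KiB pages */
#define SKETCH_MAX_ORDER	9	/* assume a cap of order 9 (2 MiB) */

/* floor(log2(x)) for x > 0, like the kernel's ilog2(). */
static unsigned int sketch_ilog2(unsigned long x)
{
	unsigned int r = 0;

	assert(x != 0);
	while (x >>= 1)
		r++;
	return r;
}

/* Index of the lowest set bit, like the kernel's __ffs(); x must be non-zero. */
static unsigned int sketch_ffs(unsigned long x)
{
	unsigned int r = 0;

	assert(x != 0);
	while (!(x & 1UL)) {
		x >>= 1;
		r++;
	}
	return r;
}

/* Order hint for @size bytes starting at page @index. */
static unsigned int sketch_size_order(unsigned long index, size_t size)
{
	unsigned long pages = size >> SKETCH_PAGE_SHIFT;
	unsigned int order;

	if (pages < 2)
		return 0;			/* fewer than two pages: order 0 */
	order = sketch_ilog2(pages);		/* round down to a power of two */
	if (index & ((1UL << order) - 1))
		order = sketch_ffs(index);	/* keep the folio naturally aligned */
	return order < SKETCH_MAX_ORDER ? order : SKETCH_MAX_ORDER;
}

/* Hypothetical allocator that cannot provide folios above order 4. */
static bool sketch_alloc_up_to_order4(unsigned int order)
{
	return order <= 4;
}

/*
 * Start at the hinted order and step down one order at a time until
 * @alloc_ok succeeds. Order 0 is treated as always available here.
 */
static unsigned int sketch_alloc_order(unsigned long index, size_t size,
				       bool (*alloc_ok)(unsigned int order))
{
	unsigned int order = sketch_size_order(index, size);

	while (order > 0 && !alloc_ok(order))
		order--;
	return order;
}
```

Under these assumptions, a 64 KiB write at page index 0 yields order 4. The same write at index 4 is limited to order 2 by alignment.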
Add shmem_mapping_size_order() to get a hint for the folio order. The hint is based on the given size (the length of the write or fallocate range) and takes the mapping's requirements into account.

tmpfs already has the 'huge=' mount option to control huge page allocation. The new behaviour must therefore remain compatible with that option. It must also respect the 'deny' and 'force' settings of '/sys/kernel/mm/transparent_hugepage/shmem_enabled'. Add a new huge option, 'write_size', that allows large folio allocation based on the write size in the tmpfs write and fallocate paths.

The resulting huge page allocation strategy for tmpfs is as follows. If the 'huge=' option is enabled (huge=always/within_size/advise) or 'shmem_enabled' is set to 'force', only PMD-sized THP is allowed, which preserves backward compatibility for tmpfs. If the 'huge=' option is disabled (huge=never) or 'shmem_enabled' is set to 'deny', all large folio allocations remain disabled. Only when the 'huge=' option is 'write_size' are large folios allocated based on the write size.

Co-developed-by: Daniel Gomez
Signed-off-by: Daniel Gomez
Signed-off-by: Baolin Wang
---
 mm/shmem.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 55 insertions(+), 7 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index f04935722457..66f1cf5b1645 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -523,12 +523,15 @@ static bool shmem_confirm_swap(struct address_space *mapping,
  *	also respect fadvise()/madvise() hints;
  * SHMEM_HUGE_ADVISE:
  *	only allocate huge pages if requested with fadvise()/madvise();
+ * SHMEM_HUGE_WRITE_SIZE:
+ *	only allocate huge pages based on the write size.
  */
 
 #define SHMEM_HUGE_NEVER	0
 #define SHMEM_HUGE_ALWAYS	1
 #define SHMEM_HUGE_WITHIN_SIZE	2
 #define SHMEM_HUGE_ADVISE	3
+#define SHMEM_HUGE_WRITE_SIZE	4
 
 /*
  * Special values.
@@ -548,12 +551,46 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 
 static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
 
+/**
+ * shmem_mapping_size_order - Get maximum folio order for the given file size.
+ * @mapping: Target address_space.
+ * @index: The page index.
+ * @size: The suggested size of the folio to create.
+ *
+ * This returns a high order for folios (when supported) based on the file size
+ * which the mapping currently allows at the given index. The index is relevant
+ * due to alignment considerations the mapping might have. The returned order
+ * may be less than the size passed.
+ *
+ * Return: The order.
+ */
+static inline unsigned int
+shmem_mapping_size_order(struct address_space *mapping, pgoff_t index, size_t size)
+{
+	unsigned int order;
+
+	if (!mapping_large_folio_support(mapping))
+		return 0;
+
+	order = filemap_get_order(size);
+	if (!order)
+		return 0;
+
+	/* If we're not aligned, allocate a smaller folio */
+	if (index & ((1UL << order) - 1))
+		order = __ffs(index);
+
+	return min_t(size_t, order, MAX_PAGECACHE_ORDER);
+}
+
 static unsigned int __shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
 					loff_t write_end, bool shmem_huge_force,
 					struct vm_area_struct *vma,
 					unsigned long vm_flags)
 {
 	struct mm_struct *mm = vma ? vma->vm_mm : NULL;
+	unsigned int order;
+	size_t len;
 	loff_t i_size;
 
 	if (!S_ISREG(inode->i_mode))
@@ -568,6 +605,17 @@ static unsigned int __shmem_huge_global_enabled(struct inode *inode, pgoff_t ind
 	switch (SHMEM_SB(inode->i_sb)->huge) {
 	case SHMEM_HUGE_ALWAYS:
 		return BIT(HPAGE_PMD_ORDER);
+	/*
+	 * If the huge option is SHMEM_HUGE_WRITE_SIZE, it will allow
+	 * getting a highest order hint based on the size of write and
+	 * fallocate paths, then will try each allowable huge orders.
+	 */
+	case SHMEM_HUGE_WRITE_SIZE:
+		if (!write_end)
+			return 0;
+		len = write_end - (index << PAGE_SHIFT);
+		order = shmem_mapping_size_order(inode->i_mapping, index, len);
+		return order > 0 ? BIT(order + 1) - 1 : 0;
 	case SHMEM_HUGE_WITHIN_SIZE:
 		index = round_up(index + 1, HPAGE_PMD_NR);
 		i_size = max(write_end, i_size_read(inode));
@@ -624,6 +672,8 @@ static const char *shmem_format_huge(int huge)
 		return "always";
 	case SHMEM_HUGE_WITHIN_SIZE:
 		return "within_size";
+	case SHMEM_HUGE_WRITE_SIZE:
+		return "write_size";
 	case SHMEM_HUGE_ADVISE:
 		return "advise";
 	case SHMEM_HUGE_DENY:
@@ -1694,13 +1744,9 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
 
 	global_order = shmem_huge_global_enabled(inode, index, write_end,
 						  shmem_huge_force, vma, vm_flags);
-	if (!vma || !vma_is_anon_shmem(vma)) {
-		/*
-		 * For tmpfs, we now only support PMD sized THP if huge page
-		 * is enabled, otherwise fallback to order 0.
-		 */
+	/* Tmpfs huge pages allocation? */
+	if (!vma || !vma_is_anon_shmem(vma))
 		return global_order;
-	}
 
 	/*
 	 * Following the 'deny' semantics of the top level, force the huge
@@ -2851,7 +2897,8 @@ static struct inode *__shmem_get_inode(struct mnt_idmap *idmap,
 	cache_no_acl(inode);
 	if (sbinfo->noswap)
 		mapping_set_unevictable(inode->i_mapping);
-	mapping_set_large_folios(inode->i_mapping);
+	if (sbinfo->huge)
+		mapping_set_large_folios(inode->i_mapping);
 
 	switch (mode & S_IFMT) {
 	default:
@@ -4224,6 +4271,7 @@ static const struct constant_table shmem_param_enums_huge[] = {
 	{"always",	SHMEM_HUGE_ALWAYS },
 	{"within_size",	SHMEM_HUGE_WITHIN_SIZE },
 	{"advise",	SHMEM_HUGE_ADVISE },
+	{"write_size",	SHMEM_HUGE_WRITE_SIZE },
 	{}
 };
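The userspace sketch below makes the new SHMEM_HUGE_WRITE_SIZE case and the commit message's allocation policy concrete. It shows how the allowed-orders bitmask might be derived and then walked from the highest order down.

The sketch rests on these assumptions:
- 'deny' is checked first, then 'force', then the mount option;
- within_size and advise behave as if their conditions were met;
- the order hint from shmem_mapping_size_order() is passed in as an input.

All sketch_* and SK_* names are hypothetical. sketch_highest_order() and sketch_next_order() only imitate highest_order() and next_order() from include/linux/huge_mm.h.

```c
#include <assert.h>

/* Userspace sketch, not kernel code. */
#define SKETCH_PMD_ORDER	9	/* assume HPAGE_PMD_ORDER == 9 */

enum sketch_huge { SK_HUGE_NEVER, SK_HUGE_ALWAYS, SK_HUGE_WITHIN_SIZE,
		   SK_HUGE_ADVISE, SK_HUGE_WRITE_SIZE };
enum sketch_global { SK_GLOBAL_NONE, SK_GLOBAL_DENY, SK_GLOBAL_FORCE };

/* Bitmask of orders 0..order, as in BIT(order + 1) - 1; empty for order 0. */
static unsigned long sketch_orders_mask(unsigned int order)
{
	return order > 0 ? (1UL << (order + 1)) - 1 : 0;
}

/* Allowed orders for a tmpfs write; @size_order is the size-based hint. */
static unsigned long sketch_tmpfs_orders(enum sketch_global global,
					 enum sketch_huge huge,
					 unsigned long long write_end,
					 unsigned int size_order)
{
	if (global == SK_GLOBAL_DENY)
		return 0;			/* 'deny' disables large folios */
	if (global == SK_GLOBAL_FORCE)
		return 1UL << SKETCH_PMD_ORDER;	/* 'force' keeps PMD-only */
	switch (huge) {
	case SK_HUGE_ALWAYS:
	case SK_HUGE_WITHIN_SIZE:
	case SK_HUGE_ADVISE:
		return 1UL << SKETCH_PMD_ORDER;	/* backward compatible */
	case SK_HUGE_WRITE_SIZE:
		if (!write_end)
			return 0;		/* no write size to go by */
		return sketch_orders_mask(size_order);
	case SK_HUGE_NEVER:
	default:
		return 0;
	}
}

/* Highest set bit, like highest_order(); -1 for an empty mask. */
static int sketch_highest_order(unsigned long orders)
{
	int order = -1;

	while (orders) {
		orders >>= 1;
		order++;
	}
	return order;
}

/* Clear @prev and return the next highest order, like next_order(). */
static int sketch_next_order(unsigned long *orders, int prev)
{
	*orders &= ~(1UL << prev);
	return sketch_highest_order(*orders);
}
```

Take a 64 KiB write at an aligned index with huge=write_size. The order hint is 4, so the mask is 0x1f (orders 0 to 4). Walking the mask highest-first tries orders 4, 3, 2, 1 and 0 in turn.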