From patchwork Thu Jul 4 11:23:11 2024
X-Patchwork-Submitter: "Pankaj Raghav \\(Samsung\\)"
X-Patchwork-Id: 13723621
From: "Pankaj Raghav (Samsung)"
To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com,
 djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org
Cc: yang@os.amperecomputing.com, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org,
 hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com,
 cl@os.amperecomputing.com, linux-xfs@vger.kernel.org,
 kernel@pankajraghav.com, hch@lst.de, Zi Yan
Subject: [PATCH v9 01/10] fs: Allow fine-grained control of folio sizes
Date: Thu, 4 Jul 2024 11:23:11 +0000
Message-ID: <20240704112320.82104-2-kernel@pankajraghav.com>
In-Reply-To: <20240704112320.82104-1-kernel@pankajraghav.com>
References: <20240704112320.82104-1-kernel@pankajraghav.com>

From: "Matthew Wilcox (Oracle)"

We need filesystems to be able to communicate acceptable folio sizes
to the pagecache for a variety of uses (e.g. large block sizes).
Support a range of folio sizes between order-0 and order-31.

Signed-off-by: Matthew Wilcox (Oracle)
Co-developed-by: Pankaj Raghav
Signed-off-by: Pankaj Raghav
Reviewed-by: Hannes Reinecke
Reviewed-by: Darrick J. Wong
---
 include/linux/pagemap.h | 87 +++++++++++++++++++++++++++++++++++------
 mm/filemap.c            |  6 +--
 mm/readahead.c          |  4 +-
 3 files changed, 78 insertions(+), 19 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 8026a8a433d36..71ad63e5fc061 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -204,14 +204,21 @@ enum mapping_flags {
 	AS_EXITING	= 4, 	/* final truncate in progress */
 	/* writeback related tags are not used */
 	AS_NO_WRITEBACK_TAGS = 5,
-	AS_LARGE_FOLIO_SUPPORT = 6,
-	AS_RELEASE_ALWAYS,	/* Call ->release_folio(), even if no private data */
-	AS_STABLE_WRITES,	/* must wait for writeback before modifying
+	AS_RELEASE_ALWAYS = 6,	/* Call ->release_folio(), even if no private data */
+	AS_STABLE_WRITES = 7,	/* must wait for writeback before modifying
 				   folio contents */
-	AS_UNMOVABLE,		/* The mapping cannot be moved, ever */
-	AS_INACCESSIBLE,	/* Do not attempt direct R/W access to the mapping */
+	AS_UNMOVABLE = 8,	/* The mapping cannot be moved, ever */
+	AS_INACCESSIBLE = 9,	/* Do not attempt direct R/W access to the mapping */
+	/* Bits 16-25 are used for FOLIO_ORDER */
+	AS_FOLIO_ORDER_BITS = 5,
+	AS_FOLIO_ORDER_MIN = 16,
+	AS_FOLIO_ORDER_MAX = AS_FOLIO_ORDER_MIN + AS_FOLIO_ORDER_BITS,
 };
 
+#define AS_FOLIO_ORDER_MASK     ((1u << AS_FOLIO_ORDER_BITS) - 1)
+#define AS_FOLIO_ORDER_MIN_MASK (AS_FOLIO_ORDER_MASK << AS_FOLIO_ORDER_MIN)
+#define AS_FOLIO_ORDER_MAX_MASK (AS_FOLIO_ORDER_MASK << AS_FOLIO_ORDER_MAX)
+
 /**
  * mapping_set_error - record a writeback error in the address_space
  * @mapping: the mapping in which an error should be set
@@ -367,9 +374,50 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 #define MAX_XAS_ORDER		(XA_CHUNK_SHIFT * 2 - 1)
 #define MAX_PAGECACHE_ORDER	min(MAX_XAS_ORDER, PREFERRED_MAX_PAGECACHE_ORDER)
 
+/*
+ * mapping_set_folio_order_range() - Set the orders supported by a file.
+ * @mapping: The address space of the file.
+ * @min: Minimum folio order (between 0-MAX_PAGECACHE_ORDER inclusive).
+ * @max: Maximum folio order (between @min-MAX_PAGECACHE_ORDER inclusive).
+ *
+ * The filesystem should call this function in its inode constructor to
+ * indicate which base size (min) and maximum size (max) of folio the VFS
+ * can use to cache the contents of the file.  This should only be used
+ * if the filesystem needs special handling of folio sizes (ie there is
+ * something the core cannot know).
+ * Do not tune it based on, eg, i_size.
+ *
+ * Context: This should not be called while the inode is active as it
+ * is non-atomic.
+ */
+static inline void mapping_set_folio_order_range(struct address_space *mapping,
+						 unsigned int min,
+						 unsigned int max)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return;
+
+	if (min > MAX_PAGECACHE_ORDER)
+		min = MAX_PAGECACHE_ORDER;
+	if (max > MAX_PAGECACHE_ORDER)
+		max = MAX_PAGECACHE_ORDER;
+	if (max < min)
+		max = min;
+
+	mapping->flags = (mapping->flags & ~AS_FOLIO_ORDER_MASK) |
+		(min << AS_FOLIO_ORDER_MIN) | (max << AS_FOLIO_ORDER_MAX);
+}
+
+static inline void mapping_set_folio_min_order(struct address_space *mapping,
+					       unsigned int min)
+{
+	mapping_set_folio_order_range(mapping, min, MAX_PAGECACHE_ORDER);
+}
+
+
 /**
  * mapping_set_large_folios() - Indicate the file supports large folios.
- * @mapping: The file.
+ * @mapping: The address space of the file.
  *
  * The filesystem should call this function in its inode constructor to
  * indicate that the VFS can use large folios to cache the contents of
@@ -380,7 +428,23 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
  */
 static inline void mapping_set_large_folios(struct address_space *mapping)
 {
-	__set_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+	mapping_set_folio_order_range(mapping, 0, MAX_PAGECACHE_ORDER);
+}
+
+static inline
+unsigned int mapping_max_folio_order(const struct address_space *mapping)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return 0;
+	return (mapping->flags & AS_FOLIO_ORDER_MAX_MASK) >> AS_FOLIO_ORDER_MAX;
+}
+
+static inline
+unsigned int mapping_min_folio_order(const struct address_space *mapping)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return 0;
+	return (mapping->flags & AS_FOLIO_ORDER_MIN_MASK) >> AS_FOLIO_ORDER_MIN;
 }
 
 /*
@@ -393,16 +457,13 @@ static inline bool mapping_large_folio_support(struct address_space *mapping)
 	VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
 			"Anonymous mapping always supports large folio");
 
-	return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
-		test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+	return mapping_max_folio_order(mapping) > 0;
 }
 
 /* Return the maximum folio size for this pagecache mapping, in bytes. */
-static inline size_t mapping_max_folio_size(struct address_space *mapping)
+static inline size_t mapping_max_folio_size(const struct address_space *mapping)
 {
-	if (mapping_large_folio_support(mapping))
-		return PAGE_SIZE << MAX_PAGECACHE_ORDER;
-	return PAGE_SIZE;
+	return PAGE_SIZE << mapping_max_folio_order(mapping);
 }
 
 static inline int filemap_nr_thps(struct address_space *mapping)
diff --git a/mm/filemap.c b/mm/filemap.c
index d62150418b910..ad5e4a848070e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1933,10 +1933,8 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP))))
 			fgp_flags |= FGP_LOCK;
 
-		if (!mapping_large_folio_support(mapping))
-			order = 0;
-		if (order > MAX_PAGECACHE_ORDER)
-			order = MAX_PAGECACHE_ORDER;
+		if (order > mapping_max_folio_order(mapping))
+			order = mapping_max_folio_order(mapping);
 		/* If we're not aligned, allocate a smaller folio */
 		if (index & ((1UL << order) - 1))
 			order = __ffs(index);
diff --git a/mm/readahead.c b/mm/readahead.c
index 517c0be7ce665..3e5239e9e1777 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -449,10 +449,10 @@ void page_cache_ra_order(struct readahead_control *ractl,
 
 	limit = min(limit, index + ra->size - 1);
 
-	if (new_order < MAX_PAGECACHE_ORDER)
+	if (new_order < mapping_max_folio_order(mapping))
 		new_order += 2;
 
-	new_order = min_t(unsigned int, MAX_PAGECACHE_ORDER, new_order);
+	new_order = min(mapping_max_folio_order(mapping), new_order);
 	new_order = min_t(unsigned int, new_order, ilog2(ra->size));
 
 	/* See comment in page_cache_ra_unbounded() */

From patchwork Thu Jul 4 11:23:12 2024
X-Patchwork-Submitter: "Pankaj Raghav \\(Samsung\\)"
X-Patchwork-Id: 13723622
From: "Pankaj Raghav (Samsung)"
To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com,
 djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org
Cc: yang@os.amperecomputing.com, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org,
 hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com,
 cl@os.amperecomputing.com, linux-xfs@vger.kernel.org,
 kernel@pankajraghav.com, hch@lst.de, Zi Yan
Subject: [PATCH v9 02/10] filemap: allocate mapping_min_order folios in the page cache
Date: Thu, 4 Jul 2024 11:23:12 +0000
Message-ID: <20240704112320.82104-3-kernel@pankajraghav.com>
In-Reply-To: <20240704112320.82104-1-kernel@pankajraghav.com>
References: <20240704112320.82104-1-kernel@pankajraghav.com>

From: Pankaj Raghav

filemap_create_folio() and do_read_cache_folio() were always allocating
folio of order 0. __filemap_get_folio was trying to allocate higher
order folios when fgp_flags had higher order hint set but it will default
to order 0 folio if higher order memory allocation fails.

Supporting mapping_min_order implies that we guarantee each folio in the
page cache has at least an order of mapping_min_order. When adding new
folios to the page cache we must also ensure the index used is aligned to
the mapping_min_order as the page cache requires the index to be aligned
to the order of the folio.

Co-developed-by: Luis Chamberlain
Signed-off-by: Luis Chamberlain
Signed-off-by: Pankaj Raghav
Reviewed-by: Hannes Reinecke
Reviewed-by: Darrick J. Wong
Reviewed-by: Matthew Wilcox (Oracle)
---
 include/linux/pagemap.h | 20 ++++++++++++++++++++
 mm/filemap.c            | 24 ++++++++++++++++--------
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 71ad63e5fc061..14e1415f7dcf4 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -447,6 +447,26 @@ unsigned int mapping_min_folio_order(const struct address_space *mapping)
 	return (mapping->flags & AS_FOLIO_ORDER_MIN_MASK) >> AS_FOLIO_ORDER_MIN;
 }
 
+static inline
+unsigned long mapping_min_folio_nrpages(struct address_space *mapping)
+{
+	return 1UL << mapping_min_folio_order(mapping);
+}
+
+/**
+ * mapping_align_index() - Align index for this mapping.
+ * @mapping: The address_space.
+ *
+ * The index of a folio must be naturally aligned.  If you are adding a
+ * new folio to the page cache and need to know what index to give it,
+ * call this function.
+ */
+static inline pgoff_t mapping_align_index(struct address_space *mapping,
+					  pgoff_t index)
+{
+	return round_down(index, mapping_min_folio_nrpages(mapping));
+}
+
 /*
  * Large folio support currently depends on THP.  These dependencies are
  * being worked on but are not yet fixed.
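
The alignment rule documented for mapping_align_index() above can be
checked outside the kernel with a rough userspace sketch. Everything in
it is an assumption made for illustration, not kernel API: PAGE_SHIFT is
fixed at 12 (4 KiB pages), pgoff_t is a plain typedef, and the helpers
round_down_pow2(), min_folio_nrpages(), align_index(), index_from_pos()
and check_alignment() are hypothetical stand-ins. The sketch also
suggests that the open-coded shift used in filemap_create_folio() in the
mm/filemap.c part of this patch gives the same index as rounding down.

```c
#include <assert.h>

/* Assumptions for this userspace sketch, not kernel definitions. */
#define PAGE_SHIFT 12			/* assumes 4 KiB pages */
typedef unsigned long pgoff_t;		/* stand-in for the kernel type */

/* Round x down to a power-of-two multiple, like the kernel's round_down(). */
static pgoff_t round_down_pow2(pgoff_t x, unsigned long align)
{
	return x & ~(align - 1);
}

/* Number of pages in a folio of the minimum order. */
static unsigned long min_folio_nrpages(unsigned int min_order)
{
	return 1UL << min_order;
}

/* Same arithmetic as mapping_align_index(), taking the order directly. */
static pgoff_t align_index(unsigned int min_order, pgoff_t index)
{
	return round_down_pow2(index, min_folio_nrpages(min_order));
}

/*
 * The open-coded form from filemap_create_folio(): derive the aligned
 * page index straight from a byte position (assumed non-negative here).
 */
static pgoff_t index_from_pos(unsigned int min_order, unsigned long long pos)
{
	return (pgoff_t)(pos >> (PAGE_SHIFT + min_order)) << min_order;
}

/* Self-check: both forms must agree for the same position. */
static void check_alignment(unsigned int min_order, unsigned long long pos)
{
	pgoff_t index = (pgoff_t)(pos >> PAGE_SHIFT);

	assert(index_from_pos(min_order, pos) == align_index(min_order, index));
}
```

With min_order = 2 (16 KiB minimum folios under the 4 KiB page
assumption), page index 7 aligns down to 4, and byte position 0x7234,
which lies in page 7, maps to index 4 through either form.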
diff --git a/mm/filemap.c b/mm/filemap.c
index ad5e4a848070e..d27e9ac54309d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -859,6 +859,8 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(folio_test_swapbacked(folio), folio);
+	VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping),
+			folio);
 	mapping_set_update(&xas, mapping);
 
 	VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
@@ -1919,8 +1921,10 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		folio_wait_stable(folio);
 no_page:
 	if (!folio && (fgp_flags & FGP_CREAT)) {
-		unsigned order = FGF_GET_ORDER(fgp_flags);
+		unsigned int min_order = mapping_min_folio_order(mapping);
+		unsigned int order = max(min_order, FGF_GET_ORDER(fgp_flags));
 		int err;
+		index = mapping_align_index(mapping, index);
 
 		if ((fgp_flags & FGP_WRITE) && mapping_can_writeback(mapping))
 			gfp |= __GFP_WRITE;
@@ -1943,7 +1947,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 			gfp_t alloc_gfp = gfp;
 
 			err = -ENOMEM;
-			if (order > 0)
+			if (order > min_order)
 				alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
 			folio = filemap_alloc_folio(alloc_gfp, order);
 			if (!folio)
@@ -1958,7 +1962,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 				break;
 			folio_put(folio);
 			folio = NULL;
-		} while (order-- > 0);
+		} while (order-- > min_order);
 
 		if (err == -EEXIST)
 			goto repeat;
@@ -2447,13 +2451,15 @@ static int filemap_update_page(struct kiocb *iocb,
 }
 
 static int filemap_create_folio(struct file *file,
-		struct address_space *mapping, pgoff_t index,
+		struct address_space *mapping, loff_t pos,
 		struct folio_batch *fbatch)
 {
 	struct folio *folio;
 	int error;
+	unsigned int min_order = mapping_min_folio_order(mapping);
+	pgoff_t index;
 
-	folio = filemap_alloc_folio(mapping_gfp_mask(mapping), 0);
+	folio = filemap_alloc_folio(mapping_gfp_mask(mapping), min_order);
 	if (!folio)
 		return -ENOMEM;
 
@@ -2471,6 +2477,7 @@ static int filemap_create_folio(struct file *file,
 	 * well to keep locking rules simple.
 	 */
 	filemap_invalidate_lock_shared(mapping);
+	index = (pos >> (PAGE_SHIFT + min_order)) << min_order;
 	error = filemap_add_folio(mapping, folio, index,
 			mapping_gfp_constraint(mapping, GFP_KERNEL));
 	if (error == -EEXIST)
@@ -2531,8 +2538,7 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
 	if (!folio_batch_count(fbatch)) {
 		if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
 			return -EAGAIN;
-		err = filemap_create_folio(filp, mapping,
-				iocb->ki_pos >> PAGE_SHIFT, fbatch);
+		err = filemap_create_folio(filp, mapping, iocb->ki_pos, fbatch);
 		if (err == AOP_TRUNCATED_PAGE)
 			goto retry;
 		return err;
@@ -3748,9 +3754,11 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
 repeat:
 	folio = filemap_get_folio(mapping, index);
 	if (IS_ERR(folio)) {
-		folio = filemap_alloc_folio(gfp, 0);
+		folio = filemap_alloc_folio(gfp,
+					    mapping_min_folio_order(mapping));
 		if (!folio)
 			return ERR_PTR(-ENOMEM);
+		index = mapping_align_index(mapping, index);
 		err = filemap_add_folio(mapping, folio, index, gfp);
 		if (unlikely(err)) {
 			folio_put(folio);

From patchwork Thu Jul 4 11:23:13 2024
X-Patchwork-Submitter: "Pankaj Raghav \\(Samsung\\)"
X-Patchwork-Id: 13723623
From: "Pankaj Raghav (Samsung)"
To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com,
 djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org
Cc: yang@os.amperecomputing.com, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org,
 hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com,
 cl@os.amperecomputing.com, linux-xfs@vger.kernel.org,
 kernel@pankajraghav.com, hch@lst.de, Zi Yan
Subject: [PATCH v9 03/10] readahead: allocate folios with mapping_min_order in readahead
Date: Thu, 4 Jul 2024 11:23:13 +0000
Message-ID: <20240704112320.82104-4-kernel@pankajraghav.com>
In-Reply-To: <20240704112320.82104-1-kernel@pankajraghav.com>
References: <20240704112320.82104-1-kernel@pankajraghav.com>

From: Pankaj Raghav

page_cache_ra_unbounded() was allocating single pages (0 order folios)
if there was no folio found in an index. Allocate mapping_min_order folios
as we need to guarantee the minimum order if it is set.

page_cache_ra_order() tries to allocate folio to the page cache with a
higher order if the index aligns with that order. Modify it so that the
order does not go below the mapping_min_order requirement of the page
cache. This function will do the right thing even if the new_order passed
is less than the mapping_min_order.

When adding new folios to the page cache we must also ensure the index
used is aligned to the mapping_min_order as the page cache requires the
index to be aligned to the order of the folio.

readahead_expand() is called from readahead aops to extend the range of
the readahead so this function can assume ractl->_index to be aligned with
min_order.

Signed-off-by: Pankaj Raghav
Co-developed-by: Hannes Reinecke
Signed-off-by: Hannes Reinecke
Acked-by: Darrick J. Wong
---
 mm/readahead.c | 79 ++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 61 insertions(+), 18 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 3e5239e9e1777..2078c42777a62 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -206,9 +206,10 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 		unsigned long nr_to_read, unsigned long lookahead_size)
 {
 	struct address_space *mapping = ractl->mapping;
-	unsigned long index = readahead_index(ractl);
+	unsigned long ra_folio_index, index = readahead_index(ractl);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
-	unsigned long i;
+	unsigned long mark, i = 0;
+	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
 
 	/*
 	 * Partway through the readahead operation, we will have added
@@ -223,10 +224,24 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	unsigned int nofs = memalloc_nofs_save();
 
 	filemap_invalidate_lock_shared(mapping);
+	index = mapping_align_index(mapping, index);
+
+	/*
+	 * As iterator `i` is aligned to min_nrpages, round_up the
+	 * difference between nr_to_read and lookahead_size to mark the
+	 * index that only has lookahead or "async_region" to set the
+	 * readahead flag.
+	 */
+	ra_folio_index = round_up(readahead_index(ractl) + nr_to_read - lookahead_size,
+				  min_nrpages);
+	mark = ra_folio_index - index;
+	nr_to_read += readahead_index(ractl) - index;
+	ractl->_index = index;
+
 	/*
 	 * Preallocate as many pages as we will need.
 	 */
-	for (i = 0; i < nr_to_read; i++) {
+	while (i < nr_to_read) {
 		struct folio *folio = xa_load(&mapping->i_pages, index + i);
 		int ret;
 
@@ -240,12 +255,13 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			 * not worth getting one just for that.
 			 */
 			read_pages(ractl);
-			ractl->_index++;
-			i = ractl->_index + ractl->_nr_pages - index - 1;
+			ractl->_index += min_nrpages;
+			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}
 
-		folio = filemap_alloc_folio(gfp_mask, 0);
+		folio = filemap_alloc_folio(gfp_mask,
+					    mapping_min_folio_order(mapping));
 		if (!folio)
 			break;
 
@@ -255,14 +271,15 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			if (ret == -ENOMEM)
 				break;
 			read_pages(ractl);
-			ractl->_index++;
-			i = ractl->_index + ractl->_nr_pages - index - 1;
+			ractl->_index += min_nrpages;
+			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}
-		if (i == nr_to_read - lookahead_size)
+		if (i == mark)
 			folio_set_readahead(folio);
 		ractl->_workingset |= folio_test_workingset(folio);
-		ractl->_nr_pages++;
+		ractl->_nr_pages += min_nrpages;
+		i += min_nrpages;
 	}
 
 	/*
@@ -438,13 +455,19 @@ void page_cache_ra_order(struct readahead_control *ractl,
 	struct address_space *mapping = ractl->mapping;
 	pgoff_t start = readahead_index(ractl);
 	pgoff_t index = start;
+	unsigned int min_order = mapping_min_folio_order(mapping);
 	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
 	pgoff_t mark = index + ra->size - ra->async_size;
 	unsigned int nofs;
 	int err = 0;
 	gfp_t gfp = readahead_gfp_mask(mapping);
+	unsigned int min_ra_size = max(4, mapping_min_folio_nrpages(mapping));
 
-	if (!mapping_large_folio_support(mapping) || ra->size < 4)
+	/*
+	 * Fallback when size < min_nrpages as each folio should be
+	 *
at least min_nrpages anyway. + */ + if (!mapping_large_folio_support(mapping) || ra->size < min_ra_size) goto fallback; limit = min(limit, index + ra->size - 1); @@ -454,10 +477,19 @@ void page_cache_ra_order(struct readahead_control *ractl, new_order = min(mapping_max_folio_order(mapping), new_order); new_order = min_t(unsigned int, new_order, ilog2(ra->size)); + new_order = max(new_order, min_order); /* See comment in page_cache_ra_unbounded() */ nofs = memalloc_nofs_save(); filemap_invalidate_lock_shared(mapping); + /* + * If the new_order is greater than min_order and index is + * already aligned to new_order, then this will be noop as index + * aligned to new_order should also be aligned to min_order. + */ + ractl->_index = mapping_align_index(mapping, index); + index = readahead_index(ractl); + while (index <= limit) { unsigned int order = new_order; @@ -465,7 +497,7 @@ void page_cache_ra_order(struct readahead_control *ractl, if (index & ((1UL << order) - 1)) order = __ffs(index); /* Don't allocate pages past EOF */ - while (index + (1UL << order) - 1 > limit) + while (order > min_order && index + (1UL << order) - 1 > limit) order--; err = ra_alloc_folio(ractl, index, mark, order, gfp); if (err) @@ -703,8 +735,15 @@ void readahead_expand(struct readahead_control *ractl, struct file_ra_state *ra = ractl->ra; pgoff_t new_index, new_nr_pages; gfp_t gfp_mask = readahead_gfp_mask(mapping); + unsigned long min_nrpages = mapping_min_folio_nrpages(mapping); + unsigned int min_order = mapping_min_folio_order(mapping); new_index = new_start / PAGE_SIZE; + /* + * Readahead code should have aligned the ractl->_index to + * min_nrpages before calling readahead aops. 
+ */ + VM_BUG_ON(!IS_ALIGNED(ractl->_index, min_nrpages)); /* Expand the leading edge downwards */ while (ractl->_index > new_index) { @@ -714,9 +753,11 @@ void readahead_expand(struct readahead_control *ractl, if (folio && !xa_is_value(folio)) return; /* Folio apparently present */ - folio = filemap_alloc_folio(gfp_mask, 0); + folio = filemap_alloc_folio(gfp_mask, min_order); if (!folio) return; + + index = mapping_align_index(mapping, index); if (filemap_add_folio(mapping, folio, index, gfp_mask) < 0) { folio_put(folio); return; @@ -726,7 +767,7 @@ void readahead_expand(struct readahead_control *ractl, ractl->_workingset = true; psi_memstall_enter(&ractl->_pflags); } - ractl->_nr_pages++; + ractl->_nr_pages += min_nrpages; ractl->_index = folio->index; } @@ -741,9 +782,11 @@ void readahead_expand(struct readahead_control *ractl, if (folio && !xa_is_value(folio)) return; /* Folio apparently present */ - folio = filemap_alloc_folio(gfp_mask, 0); + folio = filemap_alloc_folio(gfp_mask, min_order); if (!folio) return; + + index = mapping_align_index(mapping, index); if (filemap_add_folio(mapping, folio, index, gfp_mask) < 0) { folio_put(folio); return; @@ -753,10 +796,10 @@ void readahead_expand(struct readahead_control *ractl, ractl->_workingset = true; psi_memstall_enter(&ractl->_pflags); } - ractl->_nr_pages++; + ractl->_nr_pages += min_nrpages; if (ra) { - ra->size++; - ra->async_size++; + ra->size += min_nrpages; + ra->async_size += min_nrpages; } } } From patchwork Thu Jul 4 11:23:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Pankaj Raghav \\(Samsung\\)" X-Patchwork-Id: 13723624 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1E28C30653 for ; Thu, 4 Jul 2024 11:23:49 +0000 (UTC) Received: by 
From: "Pankaj Raghav (Samsung)"
To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com, djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org
Cc: yang@os.amperecomputing.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org, hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com, cl@os.amperecomputing.com, linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de, Zi Yan
Subject: [PATCH v9 04/10] mm: split a folio in minimum folio order chunks
Date: Thu, 4 Jul 2024 11:23:14 +0000
Message-ID: <20240704112320.82104-5-kernel@pankajraghav.com>
In-Reply-To: <20240704112320.82104-1-kernel@pankajraghav.com>
References: <20240704112320.82104-1-kernel@pankajraghav.com>
MIME-Version: 1.0
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop:
owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: From: Luis Chamberlain split_folio() and split_folio_to_list() assume order 0, to support minorder for non-anonymous folios, we must expand these to check the folio mapping order and use that. Set new_order to be at least minimum folio order if it is set in split_huge_page_to_list() so that we can maintain minimum folio order requirement in the page cache. Update the debugfs write files used for testing to ensure the order is respected as well. We simply enforce the min order when a file mapping is used. Signed-off-by: Luis Chamberlain Signed-off-by: Pankaj Raghav Reviewed-by: Hannes Reinecke --- include/linux/huge_mm.h | 14 ++++++++--- mm/huge_memory.c | 55 ++++++++++++++++++++++++++++++++++++++--- 2 files changed, 61 insertions(+), 8 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 4d155c7a47922..b320a246293ec 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -90,6 +90,8 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr; #define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \ (!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order))) +#define split_folio(f) split_folio_to_list(f, NULL) + #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES #define HPAGE_PMD_SHIFT PMD_SHIFT #define HPAGE_PUD_SHIFT PUD_SHIFT @@ -327,9 +329,10 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add bool can_split_folio(struct folio *folio, int *pextra_pins); int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, unsigned int new_order); +int split_folio_to_list(struct folio *folio, struct list_head *list); static inline int split_huge_page(struct page *page) { - return split_huge_page_to_list_to_order(page, NULL, 0); + return split_folio(page_folio(page)); } void deferred_split_folio(struct folio *folio); @@ -501,6 +504,12 @@ static inline int split_huge_page(struct page *page) { return 0; } + 
+static inline int split_folio_to_list(struct folio *folio, struct list_head *list) +{ + return 0; +} + static inline void deferred_split_folio(struct folio *folio) {} #define split_huge_pmd(__vma, __pmd, __address) \ do { } while (0) @@ -615,7 +624,4 @@ static inline int split_folio_to_order(struct folio *folio, int new_order) return split_folio_to_list_to_order(folio, NULL, new_order); } -#define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0) -#define split_folio(f) split_folio_to_order(f, 0) - #endif /* _LINUX_HUGE_MM_H */ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index a633206375af1..2fd45d8730bab 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3062,6 +3062,9 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) * released, or if some unexpected race happened (e.g., anon VMA disappeared, * truncation). * + * Callers should ensure that the order respects the address space mapping + * min-order if one is set for non-anonymous folios. + * * Returns -EINVAL when trying to split to an order that is incompatible * with the folio. Splitting to order 0 is compatible with all folios. 
*/ @@ -3143,6 +3146,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, mapping = NULL; anon_vma_lock_write(anon_vma); } else { + unsigned int min_order; gfp_t gfp; mapping = folio->mapping; @@ -3153,6 +3157,14 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, goto out; } + min_order = mapping_min_folio_order(folio->mapping); + if (new_order < min_order) { + VM_WARN_ONCE(1, "Cannot split mapped folio below min-order: %u", + min_order); + ret = -EINVAL; + goto out; + } + gfp = current_gfp_context(mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK); @@ -3265,6 +3277,21 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, return ret; } +int split_folio_to_list(struct folio *folio, struct list_head *list) +{ + unsigned int min_order = 0; + + if (!folio_test_anon(folio)) { + if (!folio->mapping && folio_test_pmd_mappable(folio)) { + count_vm_event(THP_SPLIT_PAGE_FAILED); + return -EBUSY; + } + min_order = mapping_min_folio_order(folio->mapping); + } + + return split_huge_page_to_list_to_order(&folio->page, list, min_order); +} + void __folio_undo_large_rmappable(struct folio *folio) { struct deferred_split *ds_queue; @@ -3496,6 +3523,8 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start, struct vm_area_struct *vma = vma_lookup(mm, addr); struct page *page; struct folio *folio; + struct address_space *mapping; + unsigned int target_order = new_order; if (!vma) break; @@ -3516,7 +3545,13 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start, if (!is_transparent_hugepage(folio)) goto next; - if (new_order >= folio_order(folio)) + if (!folio_test_anon(folio)) { + mapping = folio->mapping; + target_order = max(new_order, + mapping_min_folio_order(mapping)); + } + + if (target_order >= folio_order(folio)) goto next; total++; @@ -3532,9 +3567,13 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start, if (!folio_trylock(folio)) goto next; - 
if (!split_folio_to_order(folio, new_order)) + if (!folio_test_anon(folio) && folio->mapping != mapping) + goto unlock; + + if (!split_folio_to_order(folio, target_order)) split++; +unlock: folio_unlock(folio); next: folio_put(folio); @@ -3559,6 +3598,7 @@ static int split_huge_pages_in_file(const char *file_path, pgoff_t off_start, pgoff_t index; int nr_pages = 1; unsigned long total = 0, split = 0; + unsigned int min_order; file = getname_kernel(file_path); if (IS_ERR(file)) @@ -3572,9 +3612,11 @@ static int split_huge_pages_in_file(const char *file_path, pgoff_t off_start, file_path, off_start, off_end); mapping = candidate->f_mapping; + min_order = mapping_min_folio_order(mapping); for (index = off_start; index < off_end; index += nr_pages) { struct folio *folio = filemap_get_folio(mapping, index); + unsigned int target_order = new_order; nr_pages = 1; if (IS_ERR(folio)) @@ -3583,18 +3625,23 @@ static int split_huge_pages_in_file(const char *file_path, pgoff_t off_start, if (!folio_test_large(folio)) goto next; + target_order = max(new_order, min_order); total++; nr_pages = folio_nr_pages(folio); - if (new_order >= folio_order(folio)) + if (target_order >= folio_order(folio)) goto next; if (!folio_trylock(folio)) goto next; - if (!split_folio_to_order(folio, new_order)) + if (folio->mapping != mapping) + goto unlock; + + if (!split_folio_to_order(folio, target_order)) split++; +unlock: folio_unlock(folio); next: folio_put(folio); From patchwork Thu Jul 4 11:23:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Pankaj Raghav \\(Samsung\\)" X-Patchwork-Id: 13723625 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9019BC31D97 for ; Thu, 4 Jul 2024 11:23:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 
From: "Pankaj Raghav (Samsung)"
To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com, djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org
Cc: yang@os.amperecomputing.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org, hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com, cl@os.amperecomputing.com, linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de, Zi Yan
Subject: [PATCH v9 05/10] filemap: cap PTE range to be created to allowed zero fill in folio_map_range()
Date: Thu, 4 Jul 2024 11:23:15 +0000
Message-ID: <20240704112320.82104-6-kernel@pankajraghav.com>
In-Reply-To: <20240704112320.82104-1-kernel@pankajraghav.com>
References: <20240704112320.82104-1-kernel@pankajraghav.com>
MIME-Version: 1.0

From: Pankaj Raghav

Usually the page cache does not extend beyond the size of the inode; therefore, no PTEs are created for folios that extend beyond the size.

But with LBS support, we might extend the page cache beyond the size of the inode, as we need to guarantee folios of minimum order. While doing a read, do_fault_around() can create PTEs for pages that lie beyond the EOF, leading to an incorrect error return when accessing a page beyond the mapped file.

Cap the PTE range to be created for the page cache up to the end of file (EOF) in filemap_map_pages() so that returned error codes are consistent with POSIX [1] for LBS configurations.

generic/749 (currently in the xfstests-dev patches-in-queue branch [0]) has been created to trigger this edge case. This also fixes generic/749 for tmpfs with huge=always on systems with 4k base page size.

[0] https://lore.kernel.org/all/20240615002935.1033031-3-mcgrof@kernel.org/
[1] (from mmap(2)) SIGBUS Attempted access to a page of the buffer that lies beyond the end of the mapped file. For an explanation of the treatment of the bytes in the page that corresponds to the end of a mapped file that is not a multiple of the page size, see NOTES.

Signed-off-by: Luis Chamberlain
Signed-off-by: Pankaj Raghav
Reviewed-by: Hannes Reinecke
Reviewed-by: Matthew Wilcox (Oracle)
Reviewed-by: Darrick J.
Wong --- mm/filemap.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/mm/filemap.c b/mm/filemap.c index d27e9ac54309d..d322109274532 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3608,7 +3608,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, struct vm_area_struct *vma = vmf->vma; struct file *file = vma->vm_file; struct address_space *mapping = file->f_mapping; - pgoff_t last_pgoff = start_pgoff; + pgoff_t file_end, last_pgoff = start_pgoff; unsigned long addr; XA_STATE(xas, &mapping->i_pages, start_pgoff); struct folio *folio; @@ -3634,6 +3634,10 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf, goto out; } + file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1; + if (end_pgoff > file_end) + end_pgoff = file_end; + folio_type = mm_counter_file(folio); do { unsigned long end; From patchwork Thu Jul 4 11:23:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Pankaj Raghav \\(Samsung\\)" X-Patchwork-Id: 13723626 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3B81C30653 for ; Thu, 4 Jul 2024 11:23:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 60CEC6B0093; Thu, 4 Jul 2024 07:23:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5983D6B00CF; Thu, 4 Jul 2024 07:23:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 39BB96B00D1; Thu, 4 Jul 2024 07:23:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 1A4BC6B0093 for ; Thu, 4 Jul 2024 07:23:56 -0400 (EDT) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by 
unirelay04.hostedemail.com (Postfix) with ESMTP id 919451A0DB0 for ; Thu, 4 Jul 2024 11:23:55 +0000 (UTC) X-FDA: 82301835630.30.A0DE55C Received: from mout-p-201.mailbox.org (mout-p-201.mailbox.org [80.241.56.171]) by imf02.hostedemail.com (Postfix) with ESMTP id 9D25880012 for ; Thu, 4 Jul 2024 11:23:53 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=pankajraghav.com header.s=MBO0001 header.b="1c/bZo2D"; spf=pass (imf02.hostedemail.com: domain of kernel@pankajraghav.com designates 80.241.56.171 as permitted sender) smtp.mailfrom=kernel@pankajraghav.com; dmarc=pass (policy=quarantine) header.from=pankajraghav.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1720092208; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Gfht+/mcjWMyUHlg9kllIj+qSNuX0vmSWQJZg9AkiWg=; b=PA2gwQ9AM+g8dbCivT+gb12tMm43nv/ZiTkir4THv6k2zm1li6+wTGkSTQ1sBAzeAhuTnq WPWflxdYiUlLkf0Iwy7hRvFbp+qzr93vDJsuHTE7HZL2o5R0ITbq9jkkLna1wDAQA1LUM3 +usyOXLoBdwNVHa6Fc0NHuue087W+2A= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1720092208; a=rsa-sha256; cv=none; b=V7V0IjhEMHC6q+Acr+V3FMPrd6mHC613yC7gUdlDXYvh9OVbadzq8p/spSFkn9qQIy2UsX fJ4TGMApvJS1m/3BIPW7U3Q8L2tc7taR07sZAqXSv0iXx8Lg7CR2ECmb2gS5Q/p6sw2Gw9 wsL0NumCJ3xu/9ZlYni8Wcf7LHfIs/A= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=pass header.d=pankajraghav.com header.s=MBO0001 header.b="1c/bZo2D"; spf=pass (imf02.hostedemail.com: domain of kernel@pankajraghav.com designates 80.241.56.171 as permitted sender) smtp.mailfrom=kernel@pankajraghav.com; dmarc=pass (policy=quarantine) header.from=pankajraghav.com Received: from smtp2.mailbox.org (smtp2.mailbox.org [IPv6:2001:67c:2050:b231:465::2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) 
key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-201.mailbox.org (Postfix) with ESMTPS id 4WFDmk32Xtz9tVk; Thu, 4 Jul 2024 13:23:50 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pankajraghav.com; s=MBO0001; t=1720092230; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Gfht+/mcjWMyUHlg9kllIj+qSNuX0vmSWQJZg9AkiWg=; b=1c/bZo2DFVFSh00aNqJfY3XUAtqix7N6b+uVXOKdORYGdc1xjJy3rTLQ5W8llczuo5DVgR /QYzeBYGO9EZUMHjLGiOg9T/NSGNIRjczm0WXDKG8IY4TiXehlnoyXlL1Xg/cef9ImXWtv 3paeM8k5/0NdSc1jmbFS2KNDToj0ZgKnnmY8x6qe47cVI9S+owu51lBGQM7rHZrg+vXQuz IqGwrPi+6BUvEh5AuiJF/DDwXDM8pFo00jsT9wD5G2xyaL6zb3lYAx6JxmPqluUU2mJGv0 7o/1N6pOtzmJ6yTRQZdO8lezDbwD4sJfBkA83zROIyjkuulMCleUtdZlft397w== From: "Pankaj Raghav (Samsung)" To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com, djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org Cc: yang@os.amperecomputing.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org, hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com, cl@os.amperecomputing.com, linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de, Zi Yan Subject: [PATCH v9 06/10] iomap: fix iomap_dio_zero() for fs bs > system page size Date: Thu, 4 Jul 2024 11:23:16 +0000 Message-ID: <20240704112320.82104-7-kernel@pankajraghav.com> In-Reply-To: <20240704112320.82104-1-kernel@pankajraghav.com> References: <20240704112320.82104-1-kernel@pankajraghav.com> MIME-Version: 1.0 X-Stat-Signature: g3tju3u7qcw8hxr9rtiqknbypxhs5f3y X-Rspamd-Queue-Id: 9D25880012 X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1720092233-759987 X-HE-Meta: 
 U2FsdGVkX18MMOEYUXZOAWIDrG59ucr/OdU/3ijSlY5KeYGQ5t+m5DxLqRO/RBLHR9S/ORukbgS0ob2SlB+NyrkKarOZo6c3xTXkHMwLRABoYhb1oThfaHYORnVs+4c/ZbwoAQnV1ZMnl/SWKryOTklvcbeJkuHYOaZZWca+c7J+GpiYj9LyyhQ49u1Kv+azUbC5wXmUCPt2AA+3eYRZLrQIMpTWD/LMgkV980dzzZadi4t+vL0iIM0JHK856rCYa2VNDcznOGCfpeuPIHW26Xx40+7Kj3xRyQt0jGr6oj5WVjDXcl1lh1G2UKDV09AVjUqGRToSIlJdlFrNdyvRU87fWL5S88RJ89+7wa7cGBpyThdoJt89sOqdFxxK5ouPiIzIoh75o29712kblxh3vdiXGiIcGFFCxpCPIWzkacAswN9Ba+bGCxtk2niCX0k7AIikLBxRq5LhA3l4Q23VJj7xY7vPuypCwlpyGwl+zEPYEl0CP4nOQq9UyJtTZQgzMzpQNZaOZNhlb4K6ZwHwDhAjG9RVqVvxR+7dX4VWE5w7sxWhkAwzpy+jNw+DezTOTDUMcuXhfg4EGcz3zQNG58aGNBQflBfTZF9dEiATN88l5noVVpCcB5/wZpO7JkkezMsjfv6bK0imqwpj38lHq/oeoxr9qcPz7IIZ/HryHODzKps7QpXxH5XoDuHHncIh/+K1CbdgKJT67u3Hyf41J+inbxKWuQxY0aiEzq9DpOFiVkOu9OP0QYe23HDLLPueax+MO3ZAR3h0bGB2gvmiLTkOnnHf+IqdBxW8npq6wwEylXwRyCWvfJWjZechcoDnXLIWYTj4+UEZzqBKC4dvcHY09z5WhlMw1speBJpHWxFM1GbJ9ihpZT3ObipTs99fIY6NHr3L4sNhPwkIvyJReI7irlFHt/rheh93AzrjC0cE3COKYr0/cisFKBkr7y0KDgISv1i5IHLVPa0V6Dv
 b3FK5niM
 x+1uC6PcpNo2dBbaJXZZXoXwcwS1HcfNxBvxuQNTJyjaJeO66mBIvOWB/yRNbU53T0gsYcwTJNeObJ/fVzghQlrO8As3hk+WFMMGkyx8kEf0w/B9JKmHBZZ1+aFstKQntc28FWsOuXXt8ia1idK2HRPEvXDerp2NhRErO5wEknOxCG3t/BFyWdILCqIlOr6uXS4e1W/BTxlr9RgP8nZwwld+f6rHRzwbF8k+B4712OapBUGjCjOXzoe7cLPOx7C/0nY6TjL8LFSx0oDqxWBiAGC2jTPxA7JWLu6NRBrDKqxdMU+0sM77S6zVEbQ==
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
List-Subscribe:
List-Unsubscribe:

From: Pankaj Raghav

iomap_dio_zero() will pad a fs block with zeroes if the direct IO size
< fs block size. iomap_dio_zero() has an implicit assumption that fs
block size < page size. This is true for most filesystems at the moment.

If the block size > page size, this will send the contents of the page
next to the zero page (as len > PAGE_SIZE) to the underlying block
device, causing FS corruption.
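
The arithmetic behind the problem can be sketched in userspace as
follows. This is only an illustration under stated assumptions: the
helper names and the SKETCH_* constants are hypothetical and are not
part of this patch; only the pad computation (pos & (fs_block_size - 1))
and the 64k buffer size mirror what the diff does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel values; not taken from the patch. */
#define SKETCH_PAGE_SIZE	4096u		/* assumed system page size */
#define SKETCH_ZERO_BUF_SIZE	(64u * 1024u)	/* mirrors IOMAP_ZERO_PAGE_SIZE */

/* Bytes to zero from the start of the fs block up to the write offset. */
static inline uint32_t sketch_head_pad(uint64_t pos, uint32_t fs_block_size)
{
	/* The mask trick only works for a power-of-two block size. */
	assert(fs_block_size && (fs_block_size & (fs_block_size - 1)) == 0);
	return (uint32_t)(pos & (fs_block_size - 1));
}

/* Bytes to zero from the end of the write to the end of the fs block. */
static inline uint32_t sketch_tail_pad(uint64_t end, uint32_t fs_block_size)
{
	uint32_t pad = sketch_head_pad(end, fs_block_size);

	return pad ? fs_block_size - pad : 0;
}

/* Nonzero if zeroing len bytes would read past a buffer of buf_size bytes. */
static inline int sketch_overruns(uint32_t len, uint32_t buf_size)
{
	return len > buf_size;
}
```

With a 16k block size and a write starting 12k into a block, the head
pad is 12k: larger than a single 4k page, but within a 64k zeroed
buffer.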

iomap is a generic infrastructure and it should not make any assumptions
about the fs block size and the page size of the system.

Signed-off-by: Pankaj Raghav
Reviewed-by: Hannes Reinecke
Reviewed-by: Darrick J. Wong
Reviewed-by: Dave Chinner
---
 fs/iomap/buffered-io.c |  4 ++--
 fs/iomap/direct-io.c   | 45 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f420c53d86acc..d745f718bcde8 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -2007,10 +2007,10 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
 }
 EXPORT_SYMBOL_GPL(iomap_writepages);
 
-static int __init iomap_init(void)
+static int __init iomap_buffered_init(void)
 {
 	return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
 			   offsetof(struct iomap_ioend, io_bio),
 			   BIOSET_NEED_BVECS);
 }
-fs_initcall(iomap_init);
+fs_initcall(iomap_buffered_init);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index f3b43d223a46e..c02b266bba525 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "trace.h"
 
@@ -27,6 +28,13 @@
 #define IOMAP_DIO_WRITE		(1U << 30)
 #define IOMAP_DIO_DIRTY		(1U << 31)
 
+/*
+ * Used for sub block zeroing in iomap_dio_zero()
+ */
+#define IOMAP_ZERO_PAGE_SIZE (SZ_64K)
+#define IOMAP_ZERO_PAGE_ORDER (get_order(IOMAP_ZERO_PAGE_SIZE))
+static struct page *zero_page;
+
 struct iomap_dio {
 	struct kiocb		*iocb;
 	const struct iomap_dio_ops *dops;
@@ -232,13 +240,20 @@ void iomap_dio_bio_end_io(struct bio *bio)
 }
 EXPORT_SYMBOL_GPL(iomap_dio_bio_end_io);
 
-static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
+static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 		loff_t pos, unsigned len)
 {
 	struct inode *inode = file_inode(dio->iocb->ki_filp);
-	struct page *page = ZERO_PAGE(0);
 	struct bio *bio;
 
+	if (!len)
+		return 0;
+	/*
+	 * Max block size supported is 64k
+	 */
+	if (WARN_ON_ONCE(len > IOMAP_ZERO_PAGE_SIZE))
+		return -EINVAL;
+
 	bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
 	fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
 				  GFP_KERNEL);
@@ -246,8 +261,9 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
-	__bio_add_page(bio, page, len, 0);
+	__bio_add_page(bio, zero_page, len, 0);
 	iomap_dio_submit_bio(iter, dio, bio, pos);
+	return 0;
 }
 
 /*
@@ -356,8 +372,10 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 	if (need_zeroout) {
 		/* zero out from the start of the block to the write offset */
 		pad = pos & (fs_block_size - 1);
-		if (pad)
-			iomap_dio_zero(iter, dio, pos - pad, pad);
+
+		ret = iomap_dio_zero(iter, dio, pos - pad, pad);
+		if (ret)
+			goto out;
 	}
 
 	/*
@@ -431,7 +449,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		/* zero out from the end of the write to the end of the block */
 		pad = pos & (fs_block_size - 1);
 		if (pad)
-			iomap_dio_zero(iter, dio, pos, fs_block_size - pad);
+			ret = iomap_dio_zero(iter, dio, pos,
+					     fs_block_size - pad);
 	}
 out:
 	/* Undo iter limitation to current extent */
@@ -753,3 +772,17 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	return iomap_dio_complete(dio);
 }
 EXPORT_SYMBOL_GPL(iomap_dio_rw);
+
+static int __init iomap_dio_init(void)
+{
+	zero_page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
+				IOMAP_ZERO_PAGE_ORDER);
+
+	if (!zero_page)
+		return -ENOMEM;
+
+	set_memory_ro((unsigned long)page_address(zero_page),
+		      1U << IOMAP_ZERO_PAGE_ORDER);
+	return 0;
+}
+fs_initcall(iomap_dio_init);

From patchwork Thu Jul 4 11:23:17 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Pankaj Raghav \\(Samsung\\)"
X-Patchwork-Id: 13723627
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 34F10C30653 for ; Thu, 4 Jul 2024 11:24:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BB6586B00D3; Thu, 4 Jul 2024 07:23:59 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B652F6B00D4; Thu, 4 Jul 2024 07:23:59 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9B7ED6B00D5; Thu, 4 Jul 2024 07:23:59 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 78AD26B00D3 for ; Thu, 4 Jul 2024 07:23:59 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id EC59B140DD7 for ; Thu, 4 Jul 2024 11:23:58 +0000 (UTC) X-FDA: 82301835756.29.C62E817 Received: from mout-p-202.mailbox.org (mout-p-202.mailbox.org [80.241.56.172]) by imf14.hostedemail.com (Postfix) with ESMTP id 1E97A100012 for ; Thu, 4 Jul 2024 11:23:56 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=pankajraghav.com header.s=MBO0001 header.b=RqsX1L+R; spf=pass (imf14.hostedemail.com: domain of kernel@pankajraghav.com designates 80.241.56.172 as permitted sender) smtp.mailfrom=kernel@pankajraghav.com; dmarc=pass (policy=quarantine) header.from=pankajraghav.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1720092218; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=D613TVQCptAW9VynAGuVSOK0s4oeJXQu1SypShzHmM4=; b=4aDrl/sy5zZHTJ5jkdlthBK9giYuQeWmAwM4M+iIdieeJQObPowq3W1dfvaenMRfYN+bXD 
fkZTjmsp85+MjoRQKs+QEcVPYqjY9VKWJMHwa8/IJNeZasXJ/jvDLO9kkB6UTZQLb3GCMG 0hRsMISy7I9TVoDFbBX2Gcvt3uZN+uE= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=pass header.d=pankajraghav.com header.s=MBO0001 header.b=RqsX1L+R; spf=pass (imf14.hostedemail.com: domain of kernel@pankajraghav.com designates 80.241.56.172 as permitted sender) smtp.mailfrom=kernel@pankajraghav.com; dmarc=pass (policy=quarantine) header.from=pankajraghav.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1720092218; a=rsa-sha256; cv=none; b=qFxK+Faqhmqxby1PUhMzI4i7lFJcmIGd7/Ldd36o0HEkTej4GD9LEPN2STAVV9NkJwaHgG 2NhqJT2Q9zu2lxZQNBna8vLt5IvY240xrzYviwLXMXGVpJaYYUATw5WAiBIZKOs7hpeUZy NNmlNgQzaUN3fyaNxdXWN5eXf07GIGk= Received: from smtp2.mailbox.org (smtp2.mailbox.org [10.196.197.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-202.mailbox.org (Postfix) with ESMTPS id 4WFDmn5pzdz9sqG; Thu, 4 Jul 2024 13:23:53 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pankajraghav.com; s=MBO0001; t=1720092233; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=D613TVQCptAW9VynAGuVSOK0s4oeJXQu1SypShzHmM4=; b=RqsX1L+RvCdjBZycaeWBvEWTaV3SYjEP7XegGBKhcmMCtOiOSmxoMzmYYuCkvFVshaczSy PQnwsecosj2dPaGVL6Y4I1/FJqCLOsarUwSnwjfB2FDVtLuAt6/Inww2x3vpbmwQhpJroF vgtUaC7DfS8locLUgvYuaecnTEHrwYKREpRna5EypLLSaEXJhYFoaZcenAITFqyev+BWpd J6cuoxBI7poy5xGJ21ERYEOUtKn+3O8aowAOjCKwXwJSwAeyRUj99QeDntaATsCMBe7Iwy 5YAOpzBaeAc6DcTyi9dXZruGWF9rCgPwsvKlPMkiAW5SBQOU9tnyCCITPgSOQA== From: "Pankaj Raghav (Samsung)" To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com, djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org Cc: yang@os.amperecomputing.com, linux-kernel@vger.kernel.org, 
linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org, hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com, cl@os.amperecomputing.com, linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de, Zi Yan , Dave Chinner Subject: [PATCH v9 07/10] xfs: use kvmalloc for xattr buffers Date: Thu, 4 Jul 2024 11:23:17 +0000 Message-ID: <20240704112320.82104-8-kernel@pankajraghav.com> In-Reply-To: <20240704112320.82104-1-kernel@pankajraghav.com> References: <20240704112320.82104-1-kernel@pankajraghav.com> MIME-Version: 1.0 X-Stat-Signature: 8iu4ikh84979kfw4frbybjr3n99urepz X-Rspam-User: X-Rspamd-Queue-Id: 1E97A100012 X-Rspamd-Server: rspam02 X-HE-Tag: 1720092236-337174 X-HE-Meta: U2FsdGVkX1/DT+0cWepgfiQJ0fgJCUOURU3YCp05uPc/2HmaGIUpoQMU1xBWdCzpJQ31Z2WGXr8/2uQZKE5LCzYVE/zBFkvoobsSRmXoREmRgwTsZSx+AsrGyDn4HwqDXAycKtXYaybPwT2X9VEI8cx4UUY7p+P0s8rTHUH7N6qYGuBIkkS4TK4NHgyJoKgDKj5L1yO44jzWxnQR2IepXGtSXluAyePlfx+ROBc6TNN9Vrgr4WOnpHPTKhrFFPfnvU3d+N0T+ksK+kE4QIdtPudbTvwHbGP7jUBZDcWAKAZScpVRQNUOUq5q7EoBtLkHS+N6h7+WJnECf2TQ+KbYP+wbteflMPcoAsWPytvvG2etVHp9Wt5Cwq1Yg8Kmjtg89hFOS/rAMSfN0xlMjPNtVXuONNwCX8gvSoK7xDeXrAswEycW/Sz+8qgbzs24z4+2G444zPFq+9Ix/tvmB/qQY0RoxSSk363QJn6qdgxj+t6KVdU1RAaP+pJZDAQk8jT6PtCOp1TZhyI76lZ9N1/QcYc7EYj3bBGAoYPrRBybUN1YUcM80ffwpoXeyDLKxXzHUAen/9E1itMMUHKHAK70oNsDJevrXWV/yP2Fu8dqseUIGBcN2PYOMxkgWC59SlwKc9V9MSIvI8EW4qNecRXqDAXr72/RFutH96DLXNW9ywa12E+z6B6VeGe8pcjpVtCrgw81Ntjos+LR8v3HthrAlutdghOwqaf80GHhzCeblBDFsJs6nmrvZU10dmlbNV81vq3tEtTj35A+U3f02P13mjKiULp25qxjHuAqy1SwH8zdNEVem5S2KKIrqB6nerDGsO40M2jGYykRmEyLNvz79vLfM94SgkQlQe+H9sOoEnmuuiGtx2m/SftbFereZ04rGnnoOYQ6Y+4TDDXXVZEEJtii/t4NHtHTSSph6n40koobb9dlm243qMQpM9jAbQ6xhbacOh77jn+PhpSBC4C MN7jj6tr 
 xuquJgS5c5z/YK6cgI4GWuXKLwWAF4UYF5Fs3kGpFdZhfy/3MtmpSLcPOHaIvaL9U/8nPJkpdEJWri9hD2yCzSzRF4CVv1ALb0oeCWTN+mSSxKD0ATTCoOJeFdHiENyN5ffVMngTN42DXo/qBa4MjrLahXdxAZksP2tbtkwEiQscfoxwfX1zvcjd60ujoW+xE3QchnugQTMAJw6NbHa9QVHBD6oNgDo7MUTqOlwaP27tYU1HwyqX4R4afn+hAp7UPzWEfYDfpXap/N6VpPJ8YXOQZXHWa7K0ltA6GKcgCZpAmAowYtQt7PIcvfqtT1R3XjV04jyL/G6BMruA01T1GidqcbttWVyGyDHTiAdR/xNFj8924KOBiUooP/LEicEAcBko0
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
List-Subscribe:
List-Unsubscribe:

From: Dave Chinner

Pankaj Raghav reported that when filesystem block size is larger
than page size, the xattr code can use kmalloc() for high order
allocations. This triggers a useless warning in the allocator as it
is a __GFP_NOFAIL allocation here:

static inline
struct page *rmqueue(struct zone *preferred_zone,
			struct zone *zone, unsigned int order,
			gfp_t gfp_flags, unsigned int alloc_flags,
			int migratetype)
{
	struct page *page;

	/*
	 * We most definitely don't want callers attempting to
	 * allocate greater than order-1 page units with __GFP_NOFAIL.
	 */
>>>>	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
...

Fix this by changing all these call sites to use kvmalloc(), which
will strip the NOFAIL from the kmalloc attempt and if that fails
will do a __GFP_NOFAIL vmalloc().

This is not an issue that production systems will see as
filesystems with block size > page size cannot be mounted by the
kernel; Pankaj is developing this functionality right now.

Reported-by: Pankaj Raghav
Fixes: f078d4ea8276 ("xfs: convert kmem_alloc() to kmalloc()")
Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Reviewed-by: Pankaj Raghav
---
 fs/xfs/libxfs/xfs_attr_leaf.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index b9e98950eb3d8..09f4cb061a6e0 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -1138,10 +1138,7 @@ xfs_attr3_leaf_to_shortform(
 
 	trace_xfs_attr_leaf_to_sf(args);
 
-	tmpbuffer = kmalloc(args->geo->blksize, GFP_KERNEL | __GFP_NOFAIL);
-	if (!tmpbuffer)
-		return -ENOMEM;
-
+	tmpbuffer = kvmalloc(args->geo->blksize, GFP_KERNEL | __GFP_NOFAIL);
 	memcpy(tmpbuffer, bp->b_addr, args->geo->blksize);
 
 	leaf = (xfs_attr_leafblock_t *)tmpbuffer;
@@ -1205,7 +1202,7 @@ xfs_attr3_leaf_to_shortform(
 	error = 0;
 
 out:
-	kfree(tmpbuffer);
+	kvfree(tmpbuffer);
 	return error;
 }
 
@@ -1613,7 +1610,7 @@ xfs_attr3_leaf_compact(
 
 	trace_xfs_attr_leaf_compact(args);
 
-	tmpbuffer = kmalloc(args->geo->blksize, GFP_KERNEL | __GFP_NOFAIL);
+	tmpbuffer = kvmalloc(args->geo->blksize, GFP_KERNEL | __GFP_NOFAIL);
 	memcpy(tmpbuffer, bp->b_addr, args->geo->blksize);
 	memset(bp->b_addr, 0, args->geo->blksize);
 	leaf_src = (xfs_attr_leafblock_t *)tmpbuffer;
@@ -1651,7 +1648,7 @@ xfs_attr3_leaf_compact(
 	 */
 	xfs_trans_log_buf(trans, bp, 0, args->geo->blksize - 1);
 
-	kfree(tmpbuffer);
+	kvfree(tmpbuffer);
 }
 
 /*
@@ -2330,7 +2327,7 @@ xfs_attr3_leaf_unbalance(
 		struct xfs_attr_leafblock *tmp_leaf;
 		struct xfs_attr3_icleaf_hdr tmphdr;
 
-		tmp_leaf = kzalloc(state->args->geo->blksize,
+		tmp_leaf = kvzalloc(state->args->geo->blksize,
 				GFP_KERNEL | __GFP_NOFAIL);
 
 		/*
@@ -2371,7 +2368,7 @@ xfs_attr3_leaf_unbalance(
 		}
 		memcpy(save_leaf, tmp_leaf, state->args->geo->blksize);
 		savehdr = tmphdr; /* struct copy */
-		kfree(tmp_leaf);
+		kvfree(tmp_leaf);
 	}
 
 	xfs_attr3_leaf_hdr_to_disk(state->args->geo, save_leaf, &savehdr);

From patchwork Thu Jul 4 11:23:18 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Pankaj Raghav
\\(Samsung\\)" X-Patchwork-Id: 13723628 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B331FC30653 for ; Thu, 4 Jul 2024 11:24:04 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 461F36B00D6; Thu, 4 Jul 2024 07:24:04 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 40FFE6B00D7; Thu, 4 Jul 2024 07:24:04 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2633D6B00D8; Thu, 4 Jul 2024 07:24:04 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 04DBE6B00D6 for ; Thu, 4 Jul 2024 07:24:03 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id B6C5FA0DE9 for ; Thu, 4 Jul 2024 11:24:03 +0000 (UTC) X-FDA: 82301835966.15.288FE2A Received: from mout-p-202.mailbox.org (mout-p-202.mailbox.org [80.241.56.172]) by imf05.hostedemail.com (Postfix) with ESMTP id CE7E8100027 for ; Thu, 4 Jul 2024 11:24:01 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=pankajraghav.com header.s=MBO0001 header.b=00kjlnBk; spf=pass (imf05.hostedemail.com: domain of kernel@pankajraghav.com designates 80.241.56.172 as permitted sender) smtp.mailfrom=kernel@pankajraghav.com; dmarc=pass (policy=quarantine) header.from=pankajraghav.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1720092222; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; 
bh=1hpvVZhxiYoy4pHnc2n7weXG4jb43hSIPB2EVq7bVws=; b=WEC8sXeyx/6+UA6W5w43MsquqAxm5p4BOmS7Yc1/XO3rv4y3RNzGDgS9Ppn8T9qvQfbtkh stge9TIsm6sVCVx7ziYviLt79kX94kN0pqbR3UBZ5Pl8SuTjHBGsHMzlOqWPcHRLvFDiFs KWCpE9bZc3CghZ7MFxpTaYvOvmMGp7Q= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=pankajraghav.com header.s=MBO0001 header.b=00kjlnBk; spf=pass (imf05.hostedemail.com: domain of kernel@pankajraghav.com designates 80.241.56.172 as permitted sender) smtp.mailfrom=kernel@pankajraghav.com; dmarc=pass (policy=quarantine) header.from=pankajraghav.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1720092222; a=rsa-sha256; cv=none; b=yGbtMMcaBs3soJVeX5RHC/shns4FxzMWdqASjJpwdU8W5tKz4cqPEwje7aIK5tTrjL5dMr eenCwzLRYn1mB6w/lYUlr7g3dKiC5RYADEFw9jYQD63gDPoarG5PgJfZKXXKAILky5ycSh Jyj+A4mY0dsnoWa1edj5A2BUHMZNUVY= Received: from smtp102.mailbox.org (smtp102.mailbox.org [IPv6:2001:67c:2050:b231:465::102]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-202.mailbox.org (Postfix) with ESMTPS id 4WFDmt3YWKz9sTk; Thu, 4 Jul 2024 13:23:58 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pankajraghav.com; s=MBO0001; t=1720092238; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1hpvVZhxiYoy4pHnc2n7weXG4jb43hSIPB2EVq7bVws=; b=00kjlnBkN+LFlFpx4Eq52EeamnWssvWwfljypGOm40D8wTb6EFYbFsfO4ZTKtAU+tkUyLD imYXE7KY44rOVP91eM+h1IFMWXnWBKPm71KVJaME2OcNTGJAnYRtVHREDEwyK2/Ozp4AIh HGejFkLqr9Jh3YeQEVFddQ2fM53b/Aakqd7mMAW1yONbIbr3IYBwcBpb7ULEWNivySKijP dBMS+vcw0iEwr6ViD07vvA9/F2AzHkarFSpNnn2fgyWvMFK8N1/T/4AX26a8PJ2RkWRrVG 3q83gwfRzypqiqPaZxRQZga3SmyJr6cl0RP3UEebVQXHww4CBZ/FCZeVp6shLw== From: "Pankaj Raghav (Samsung)" To: david@fromorbit.com, willy@infradead.org, 
chandan.babu@oracle.com, djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org Cc: yang@os.amperecomputing.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org, hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com, cl@os.amperecomputing.com, linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de, Zi Yan , Dave Chinner Subject: [PATCH v9 08/10] xfs: expose block size in stat Date: Thu, 4 Jul 2024 11:23:18 +0000 Message-ID: <20240704112320.82104-9-kernel@pankajraghav.com> In-Reply-To: <20240704112320.82104-1-kernel@pankajraghav.com> References: <20240704112320.82104-1-kernel@pankajraghav.com> MIME-Version: 1.0 X-Stat-Signature: 7kszsxu9odwdh1mrj9ckxnwq7mzcoxm9 X-Rspam-User: X-Rspamd-Queue-Id: CE7E8100027 X-Rspamd-Server: rspam02 X-HE-Tag: 1720092241-305043 X-HE-Meta: U2FsdGVkX19EE+0Rxa8EYWmzR8typqT8EfXSEt8lIxgCnOAGGdoja9PmKPXLtnFwT7ywrMT9qM4wnaW6+Y70u+Th/yU4YJ4d9POPLODZs0oWClKkWFNgRLZziJEUGtWI/bMDzjOpT90+1VEtyA9J82dpcXW6XVtOJbQhffSXobIvJDfbOAeXvz8MY2Ex8Oa54adGd1tiUzLS8EresiHOpv4O0cYdgq2n4tHjW7ARFuMgj3XwLWbk9+9dFFOA9yyRR50Pwe++fWLXdXJMWYAJgCFGUib1LnW2S/v3Zgi08FyiA5GvxuPPlCQOP1SoXFgZsKmyenHhbutEuPJUxHJ97fLgWveGFzpBErP9t41yySHLJbYIeE/X09Yo5deLNNkASApNpZq41ioWXl7BB+yhZP/thpQbUZ7ry8RLk/4SjGw0T3kq+kEnQthky7NRxl6LY6OQ9Wf5esbFUstzbX8t+RBY6alDyWb+i5BN498Eh3uNiLE60tOA+DRdTwf3niYGRroag6+dD5+c5sWmQON1FGRHygf2icOHn8ZUEvtj7nryS6ex5mNA59ptH3qKPoM5z6cWEMPcV8NHJlwKKV96DsoXbBp4xLJhFFep0xp5BMIzmRylIW8vEhUVfWLZuVkYm/0zTF39FORhHC5EWrVGu04Knp6EzBJ6lmdnkME9c8eyDlCqQYnlDwyYhMDYxpXCsjSTetKKTxgcMAL3xYWqWVEw6oG+tMV28nuejtVQyGYn2vZ3uz/7gQaKoflZMu40pzqt+2WMmKnCgndRWiZjO6Zc8RGhopR1fNmaQfn9/5BW5BDH70YTR237VJOgpDkjT4SYjjeS3u3S/jn0r6vxiHYlH7LeCAKvk8FpRXRH6S1stKD2Y/+EIF5iYjOCrvQJ6jvinzOI+7UUevcey7kn3xHoJ7LtnngnDiDW9KTYOMte8ATOMjWU4JN0yY4xkaCiJ8aWUqXP8pTTIZuI0tE RRNW+cjL 
 88Y+JPUpJt3mFnuxl9+nCSxXqqWwQEsNWR9ygMBA+aodqUe9E/d/OoxqHxeYAWYMoi2JEiHLX/QECvTWErPCNNgy2vxlvFixoz+1UgopHakUGICrElMl3lkXH/Y92Oo7HFbgnsHP7jfkcng6WGE1ycs4ekTQfhrOY01p8Ze3UT30dM7l/c4+H7Awz8ibRAVIEh7yFxzj2Cy0q1bb7jtRSVAYwOwSCqhLg/jrRQ5grob5V7yVBrtsQQNmUw4hjaM/9vslpU9KBLM/c11QLT/BrdOMWU+Kkj95hdpMDFo4uvsBI3M/cxYHbS5V+2EA+XI68l26I0CiQxGTzgLqDOvW8BgpHoQB5GtaIleJbznVtor4hgFnpT5zj1T6a6xERCQt7S/T+yyygNgbJJlbl6okNFkpkGwXXdkL3FlQpinEhcDN5ZHO0ndIbSrjtXGwFkhhtROWkwCN71b1ypii2zVgnyK1xog==
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
List-Subscribe:
List-Unsubscribe:

From: Pankaj Raghav

For block size larger than page size, the unit of efficient IO is
the block size, not the page size. Leaving stat() to report
PAGE_SIZE as the block size causes test programs like fsx to issue
illegal ranges for operations that require block size alignment
(e.g. fallocate() insert range). Hence update the preferred IO size
to reflect the block size in this case.

This change is based on a patch originally from Dave Chinner.[1]

[1] https://lwn.net/ml/linux-fsdevel/20181107063127.3902-16-david@fromorbit.com/

Signed-off-by: Pankaj Raghav
Signed-off-by: Luis Chamberlain
Reviewed-by: Darrick J. Wong
Reviewed-by: Dave Chinner
---
 fs/xfs/xfs_iops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index a00dcbc77e12b..da5c13150315e 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -562,7 +562,7 @@ xfs_stat_blksize(
 		return 1U << mp->m_allocsize_log;
 	}
 
-	return PAGE_SIZE;
+	return max_t(uint32_t, PAGE_SIZE, mp->m_sb.sb_blocksize);
 }
 
 STATIC int

From patchwork Thu Jul 4 11:23:19 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Pankaj Raghav \\(Samsung\\)"
X-Patchwork-Id: 13723629
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 049F2C31D97 for ; Thu, 4 Jul 2024 11:24:08 +0000 (UTC)
Received: by kanga.kvack.org (Postfix) id 9212C6B00DA; Thu, 4 Jul 2024 07:24:07 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40) id 8CD756B00DB; Thu, 4 Jul 2024 07:24:07 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id 746A06B00DC; Thu, 4 Jul 2024 07:24:07 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 50B5D6B00DA for ; Thu, 4 Jul 2024 07:24:07 -0400 (EDT)
Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 0F94381258 for ; Thu, 4 Jul 2024 11:24:07 +0000 (UTC)
X-FDA: 82301836134.11.80C3B2A
Received: from mout-p-201.mailbox.org (mout-p-201.mailbox.org [80.241.56.171]) by imf23.hostedemail.com (Postfix) with ESMTP id 656BC140008 for ; Thu, 4 Jul 2024 11:24:05 +0000 (UTC)
Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=pankajraghav.com header.s=MBO0001 header.b=MAKSFZlI;
spf=pass (imf23.hostedemail.com: domain of kernel@pankajraghav.com designates 80.241.56.171 as permitted sender) smtp.mailfrom=kernel@pankajraghav.com; dmarc=pass (policy=quarantine) header.from=pankajraghav.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1720092220; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=8rdOu+47Xth5hXKbmOu06fRFiMFY0Pwf2tkrEBizRPk=; b=0bYRmlZ5qMLukdHRrGxIiUVqhHrjPbnNAlZt18v7lpRBPqqzaeWZGrC0X3NSsSxavpVhZo kfvUwtG+bTbv0JyzvAO9KUPqb6Mj0+F7laqcZKJAJ//i75CYjDB8YbrYXekGYDaGyo8eO0 sfkkEC/q/DRnn8Hm2G/tdU3jmrcLlWM= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1720092220; a=rsa-sha256; cv=none; b=GZ6i8bAOJi9skSzJdSjbZvkYLVgdbs5jvaQkr8Z+70l9bZBQ0s8E6s8IMLoaClJsJnRpsY Guj+YHeRgLE5z02Ue8Fw9jt2cVInu9QIj884xBIgLOmLQ0iSLTENOlFftuwR7zEfUR6+Su 8y8nVqEYQ+5ygJcC0QBvYuf8q2keZXU= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=pankajraghav.com header.s=MBO0001 header.b=MAKSFZlI; spf=pass (imf23.hostedemail.com: domain of kernel@pankajraghav.com designates 80.241.56.171 as permitted sender) smtp.mailfrom=kernel@pankajraghav.com; dmarc=pass (policy=quarantine) header.from=pankajraghav.com Received: from smtp2.mailbox.org (smtp2.mailbox.org [10.196.197.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-201.mailbox.org (Postfix) with ESMTPS id 4WFDmy24b9z9tVW; Thu, 4 Jul 2024 13:24:02 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pankajraghav.com; s=MBO0001; t=1720092242; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8rdOu+47Xth5hXKbmOu06fRFiMFY0Pwf2tkrEBizRPk=; b=MAKSFZlI7ZaAEYyHpa4m+b6rHiU+P6QyeLCjZz23UGlI5WftoQSBRtyro7pc9FWT+MVPTZ kfKxyyXsGG9jZvTH11CKc9vmmNALVet9Y+7E4B5uSi4gC6mPXQUtTJnRM/JHxSECvsY+gU pGJWw57f7jZQbI1thzo6E30jiHbzhQ/iRu5h5APgIs+q8rRhj+zrnctYxJphX872+JkCg2 rycOGh+225y6SHWO9xoN2nzzQ1LpK3UVAqWeLnjHldadfIH2mjbp5D7bcFTv2OpMzmgqWO VYqqRkLd0wF9NvRr5dW0ElnU0AKRa+o0BgzLlkMmwEW5UVXUk0QDizQhRF4Tgg== From: "Pankaj Raghav (Samsung)" To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com, djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org Cc: yang@os.amperecomputing.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org, hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com, cl@os.amperecomputing.com, linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de, Zi Yan , Dave Chinner Subject: [PATCH v9 09/10] xfs: make the calculation generic in xfs_sb_validate_fsb_count() Date: Thu, 4 Jul 2024 11:23:19 +0000 Message-ID: <20240704112320.82104-10-kernel@pankajraghav.com> In-Reply-To: <20240704112320.82104-1-kernel@pankajraghav.com> References: <20240704112320.82104-1-kernel@pankajraghav.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 656BC140008 X-Stat-Signature: ea7kipneeo8xscmq9co9m4znow9pub86 X-HE-Tag: 1720092245-724238 X-HE-Meta: 
 U2FsdGVkX19KIZ+qMNLqmMzxulkzn11pD0hgmGNAl/fYm4h9VGOxiWEQvY4Tn1cry+Mnk++kKpQ8nHYnvhPLFrHmJxmkDILc7CvIAo2PfmSmCHnkEwpOfkMwFF30R+1l+ZyVcXKPi1ij3X0/J1mi1hZvHJfoqBrczErB1g5oxeQa5ZfCnw5iLK5GuBZF+KppX0yXfERox/auTLosZW0M5LRcZYyCDDC04ehllZ5HJhxCPTtCvvcx5hIIr+SdEzDJud+W2YkH+PLO59AjRw5qqYiEuP+G0V5XTi9IypES/9ML93tr965zwrM6+69GoCcLIUn/1kLveMJkm0m75uyR6HAkYA58Y7tiL8LZVGVPbMx4D0XjOvQ3krmkPtyBOBDGtEgUYju6bgZvZlOaLMTiRLIagTMzP0SbIzoCwfT/fRIubZR29i/cXcIXvtk0Vo0/ICMggUIyP8Ikto+uv4AV7gueGF6wZ6v6G4wESSOlkVNEDfrBY39pAePPioB9USRE999knztEuSY3XJt07qwcKiuxc3YgHFx4RKx4NauoDQoO+GYSSt8ASprP154drKaMamW0FXBiqM0u8FExZbXAJZV0Re/rnSdrlSTC/9w+6Dci5fHbaoiX/kibl4nGg8MwoV6YgEzJiXj9mqkpiOckit2VHy1HiSp8iRO5vEZ4KHxJy7EDBc9tR0EUUGMUx2Wjwys4J9Ds18esZ0i/6LpCRzUQ4dBcsw8qASOT0CZeMlUY4N8psJGybGMqVQkTH1pUH4lg5w/XwDhlJjmfJ6GoYUh1FXnaYPm6uSc1w4NjE3go2bzh03ygWhfaunkd4w0WT3iIqX7zhVfxiGn9QwEMYC1iIEKVgBVEhZ9MDFpq/PaauG8C/T7hc9DkYDyOauadR6QmVAOfioEzL9HRIQQ79RIkSyoDIt2Vj4YCfIjwvw8niIt2FNVnjwSXhmCj9pfHOfog5mnBrY+EgeWhcvA
 eZ6OgUN6
 WJAwbqoisHlkSacBbPXqbxTMa9SBSOLgrBIz20lenWdVgd8UZhcEBAkay4WXtCXA9jTiaqP6UCiI9O27czGvfsEFjCaOtRsy4oIHAAXC8WrwP8XM3zReuoQ0mTt+9EhHmSSCaRjlS4hmrLqeLdXS7e3RZTyDhu6/XrzjWlrBtPqpheIu86KaVi0HPHGSDH0ZO21oHR2eeSP2sMLS5SgaJyy/feRbzDFmq+SRCFRoZd2N0qcw/vToD6SVCqK5cSSicSCPNhjwll961MkSRcKPeIg9yATF7ph6CFcwALd3pndDCWvdyIKaoDy/IvPNArEq0TgXlAQCsQTjGc5F2fnsF6EeoZzAIPCHlz5+jXmp9kCgH7zX06AO+9HnFBw==
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
List-Subscribe:
List-Unsubscribe:

From: Pankaj Raghav

Instead of assuming that PAGE_SHIFT is always higher than the blocklog,
make the calculation generic so that page cache count can be calculated
correctly for LBS.

Signed-off-by: Pankaj Raghav
Reviewed-by: Darrick J. Wong
Reviewed-by: Dave Chinner
---
 fs/xfs/xfs_mount.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 09eef1721ef4f..3949f720b5354 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -132,11 +132,16 @@ xfs_sb_validate_fsb_count(
 	xfs_sb_t	*sbp,
 	uint64_t	nblocks)
 {
+	uint64_t		max_bytes;
+
 	ASSERT(PAGE_SHIFT >= sbp->sb_blocklog);
 	ASSERT(sbp->sb_blocklog >= BBSHIFT);
 
+	if (check_shl_overflow(nblocks, sbp->sb_blocklog, &max_bytes))
+		return -EFBIG;
+
 	/* Limited by ULONG_MAX of page cache index */
-	if (nblocks >> (PAGE_SHIFT - sbp->sb_blocklog) > ULONG_MAX)
+	if (max_bytes >> PAGE_SHIFT > ULONG_MAX)
 		return -EFBIG;
 	return 0;
 }

From patchwork Thu Jul 4 11:23:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Pankaj Raghav \\(Samsung\\)"
X-Patchwork-Id: 13723630
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8CEFDC30653 for ; Thu, 4 Jul 2024 11:24:12 +0000 (UTC)
Received: by kanga.kvack.org (Postfix) id 1A3376B00B6; Thu, 4 Jul 2024 07:24:12 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40) id 12D596B00DC; Thu, 4 Jul 2024 07:24:12 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id EE8F76B00DD; Thu, 4 Jul 2024 07:24:11 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id CDCBA6B00B6 for ; Thu, 4 Jul 2024 07:24:11 -0400 (EDT)
Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 4E571C0E52 for ; Thu, 4 Jul 2024 11:24:11 +0000 (UTC)
X-FDA: 82301836302.04.289CF5D
Received:
from mout-p-201.mailbox.org (mout-p-201.mailbox.org [80.241.56.171]) by imf30.hostedemail.com (Postfix) with ESMTP id 836D18001A for ; Thu, 4 Jul 2024 11:24:09 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=pankajraghav.com header.s=MBO0001 header.b=oBjypVax; spf=pass (imf30.hostedemail.com: domain of kernel@pankajraghav.com designates 80.241.56.171 as permitted sender) smtp.mailfrom=kernel@pankajraghav.com; dmarc=pass (policy=quarantine) header.from=pankajraghav.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1720092225; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=nv7t1+SeZeFQOoZa3NhjLEVFIBlYMe3kqcm8FqgaxW0=; b=mBEypkvzBClv7KJxkSkBrlyXIarFzJYRyybWPhJjbZEV+qiiUlFqLT++Wk3lYCN4q/rlO5 g7KGA6KTqsQ0a3RC6xS291MsKjjPpGYuT+qO1MNOTOqaLq/W0YV/e1DVF7Sli5Y3+GNib2 3uEhRelSxXDSuE7Gryh6CTVQ5NQVifY= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1720092225; a=rsa-sha256; cv=none; b=3ulAEKPg8BGNHdVJhCJdpeZ0PKwaLUbQddLc2dNuirEIFjytUVHRHL15V3Hkam5ft5G4sJ rAheybDeUTRpz+aeDch9LrKAwfCwBVKPhXMCPq9y+GWxLLHnI/I+LLLqYsj82TyxzS/936 GMlg51p56unJD3di4jy+xt2dG/PLeQI= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=pankajraghav.com header.s=MBO0001 header.b=oBjypVax; spf=pass (imf30.hostedemail.com: domain of kernel@pankajraghav.com designates 80.241.56.171 as permitted sender) smtp.mailfrom=kernel@pankajraghav.com; dmarc=pass (policy=quarantine) header.from=pankajraghav.com Received: from smtp2.mailbox.org (smtp2.mailbox.org [10.196.197.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-201.mailbox.org (Postfix) with ESMTPS id 
4WFDn23JWSz9tKk; Thu, 4 Jul 2024 13:24:06 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pankajraghav.com; s=MBO0001; t=1720092246; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nv7t1+SeZeFQOoZa3NhjLEVFIBlYMe3kqcm8FqgaxW0=; b=oBjypVaxqJ1LHw3Y9yNKjm7fdKkn3SIqDFYY6OEEqE+hDdibf+1lIRmUFuqrzhEiB5Vs+h 4DPN1I9ZeW4p3vRCe7DycD+Be8/UfCPP1PirdIVgPMrsV1g8SRlYGsHAR4ZgknqqqQuIbq Vn4dm8CMLGtq5aiEFEOkqeq2+DzBNitczvEVvwmfX5kraSdYuvhu/WlwTCp0xTl6l5LUyb brq1I1JnBgyGWOQd5z3M/2FQwpNcq3Wzr/Z/zbr4y9F5JRRroT9IBQW6J63SX4ddhIpY/Y C6LnNjsYXkFuk/HdxqO9So4eOxecg4OmSvD/FO6s8uVE1NF3gPQbvL9VhxSb6w== From: "Pankaj Raghav (Samsung)" To: david@fromorbit.com, willy@infradead.org, chandan.babu@oracle.com, djwong@kernel.org, brauner@kernel.org, akpm@linux-foundation.org Cc: yang@os.amperecomputing.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org, hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org, gost.dev@samsung.com, cl@os.amperecomputing.com, linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de, Zi Yan , Dave Chinner Subject: [PATCH v9 10/10] xfs: enable block size larger than page size support Date: Thu, 4 Jul 2024 11:23:20 +0000 Message-ID: <20240704112320.82104-11-kernel@pankajraghav.com> In-Reply-To: <20240704112320.82104-1-kernel@pankajraghav.com> References: <20240704112320.82104-1-kernel@pankajraghav.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 836D18001A X-Stat-Signature: 3cie9zdskjj4dpixozmekwkwhgo4i745 X-Rspamd-Server: rspam09 X-Rspam-User: X-HE-Tag: 1720092249-179799 X-HE-Meta: 
U2FsdGVkX1/DxMlf83q2/FkoMszKLD/KwAS/FEIkmRu1lxYPl51N1cjaH3GNwOSljdooxxR3ohbNLZbjKQ/PSGks8CpLaty54bsmjIB8PJQquc9Ibxh6oLsPjx4pC7JPQswK7pJBZ5FWU+Ptx5eynp9eIsDzIMFqk+De+8uYGuEEDVv1185UHM3KCCrft2quu3gfYXGOFJQ9fY/FqIMTxRr4AqdmD8tn2JRNw0L1y79YXjSSVHAs7VfMkAe+BHgyP9+pnFN56IhKEoGeZjS6xoZvEi6zS9hdEbWsfbgtjmiNf1pe92SqGylSJvrPxT39FkHDXIsRjXnisCi+oAYl1cr1J4clhq+MNHE/O9WQHYwJjayn0foJrVQglzcXM/J5ZKf8n+XS7rhemlJMr27qqkYS4sgqpCz0akTS6nQKQJDBJXwxAuNS+Oy0zNviCmxQi4jyb21sU8QPCeNbGURn6uaBuraAb/Cy1fKn9QsVjEwouuQi3pAWb30DR4HOVHvNtpDZJaQH4PwiRe+/EMh/HS8DojruincSV4NFxEpraXa1essJTxjWwRSm8t/rW5uynbKJovOVo+Zj0XC39lunbjMZnI4HEkL6EzHrIGRLvareGWg3Zl1ppZ9K2EiU7cRZk8X0yUf+xf7VrITpIV0IlfiZz5/BDEQdf4p49+o2T0YvqN7WDf6tn5YBOmSvBwqdOFgE/Ra7R09KCTurblGYp0OvV++QBqBxys/mdlrFLImlqFBk+kF4GgvlxD4W4ayIs4eWspZ47uKC2s3HoLvooEIdH2VF4oU2UluR3GjH6nGPQ1gEnW3XZ1hcRYT200KpTQkWBKmIkl672Y+kg9d86boS+am3RzEwMt2HRJEVC56txMmSXhj/Hcu5i2Q8MuakhHhv5rO3E//VpB/2o+IKGB+qWRxF79mHf2BHiP8JjYsmo1UgoMzpR9IR5Ag4FPbAP/ZAvtaVzXE+6jrgk1o e1px9kjV hFKmDH95X8hZU1girSvrpCPWk3pky9UU4l1QpAK1Uv2+az+kzPxHPwraj9PQ9PSCHQiuXuFOoVkSGvJoXIICD0MYO+FSzU/4ZL6vvXsU3bfKJc3d1VZLR4QHbZy8pEGpvjGlaoFm5j9J9A4UoZQOYElpEgpDCZVH1gbU2EJ3gG1pwq4fEy1uvkqgZM//n14Y3QPNMXHrgAIZWg0QMeJ6s67/QdRQmr58gEC7POu0+867w1MY0WHEkyig9eqqZVUR1x6l29ehZ81/+JJmEGRlBoCzm9tCZE6YTCJuvaO+6CIZtgcHFdy8r2d+jZZAX5r9wZwF+ X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: From: Pankaj Raghav Page cache now has the ability to have a minimum order when allocating a folio which is a prerequisite to add support for block size > page size. Signed-off-by: Pankaj Raghav Signed-off-by: Luis Chamberlain Reviewed-by: Darrick J. 
Wong
Reviewed-by: Dave Chinner
---
 fs/xfs/libxfs/xfs_ialloc.c |  5 +++++
 fs/xfs/libxfs/xfs_shared.h |  3 +++
 fs/xfs/xfs_icache.c        |  6 ++++--
 fs/xfs/xfs_mount.c         |  1 -
 fs/xfs/xfs_super.c         | 18 ++++++++++--------
 5 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 14c81f227c5bb..1e76431d75a4b 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -3019,6 +3019,11 @@ xfs_ialloc_setup_geometry(
 		igeo->ialloc_align = mp->m_dalign;
 	else
 		igeo->ialloc_align = 0;
+
+	if (mp->m_sb.sb_blocksize > PAGE_SIZE)
+		igeo->min_folio_order = mp->m_sb.sb_blocklog - PAGE_SHIFT;
+	else
+		igeo->min_folio_order = 0;
 }
 
 /* Compute the location of the root directory inode that is laid out by mkfs. */
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 34f104ed372c0..e67a1c7cc0b02 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -231,6 +231,9 @@ struct xfs_ino_geometry {
 	/* precomputed value for di_flags2 */
 	uint64_t	new_diflags2;
 
+	/* minimum folio order of a page cache allocation */
+	unsigned int	min_folio_order;
+
 };
 
 #endif /* __XFS_SHARED_H__ */
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index cf629302d48e7..0fcf235e50235 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -88,7 +88,8 @@ xfs_inode_alloc(
 
 	/* VFS doesn't initialise i_mode! */
 	VFS_I(ip)->i_mode = 0;
-	mapping_set_large_folios(VFS_I(ip)->i_mapping);
+	mapping_set_folio_min_order(VFS_I(ip)->i_mapping,
+				    M_IGEO(mp)->min_folio_order);
 
 	XFS_STATS_INC(mp, vn_active);
 	ASSERT(atomic_read(&ip->i_pincount) == 0);
@@ -325,7 +326,8 @@ xfs_reinit_inode(
 	inode->i_uid = uid;
 	inode->i_gid = gid;
 	inode->i_state = state;
-	mapping_set_large_folios(inode->i_mapping);
+	mapping_set_folio_min_order(inode->i_mapping,
+				    M_IGEO(mp)->min_folio_order);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 3949f720b5354..c6933440f8066 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -134,7 +134,6 @@ xfs_sb_validate_fsb_count(
 {
 	uint64_t		max_bytes;
 
-	ASSERT(PAGE_SHIFT >= sbp->sb_blocklog);
 	ASSERT(sbp->sb_blocklog >= BBSHIFT);
 
 	if (check_shl_overflow(nblocks, sbp->sb_blocklog, &max_bytes))
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 27e9f749c4c7f..b8a93a8f35cac 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1638,16 +1638,18 @@ xfs_fs_fill_super(
 		goto out_free_sb;
 	}
 
-	/*
-	 * Until this is fixed only page-sized or smaller data blocks work.
-	 */
 	if (mp->m_sb.sb_blocksize > PAGE_SIZE) {
-		xfs_warn(mp,
-		"File system with blocksize %d bytes. "
-		"Only pagesize (%ld) or less will currently work.",
+		if (!xfs_has_crc(mp)) {
+			xfs_warn(mp,
+"V4 Filesystem with blocksize %d bytes. Only pagesize (%ld) or less is supported.",
 				mp->m_sb.sb_blocksize, PAGE_SIZE);
-		error = -ENOSYS;
-		goto out_free_sb;
+			error = -ENOSYS;
+			goto out_free_sb;
+		}
+
+		xfs_warn(mp,
+"EXPERIMENTAL: V5 Filesystem with Large Block Size (%d bytes) enabled.",
+						mp->m_sb.sb_blocksize);
 	}
 
 	/* Ensure this filesystem fits in the page cache limits */