From patchwork Thu Aug 22 13:50:09 2024
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>
X-Patchwork-Id: 13773522
From: "Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>
To: brauner@kernel.org, akpm@linux-foundation.org
Cc: chandan.babu@oracle.com, linux-fsdevel@vger.kernel.org, djwong@kernel.org,
    hare@suse.de, gost.dev@samsung.com, linux-xfs@vger.kernel.org,
    kernel@pankajraghav.com, hch@lst.de, david@fromorbit.com, Zi Yan,
    yang@os.amperecomputing.com, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, willy@infradead.org, john.g.garry@oracle.com,
    cl@os.amperecomputing.com, p.raghav@samsung.com, mcgrof@kernel.org,
    ryan.roberts@arm.com, David Howells
Subject: [PATCH v13 01/10] fs: Allow fine-grained control of folio sizes
Date: Thu, 22 Aug 2024 15:50:09 +0200
Message-ID: <20240822135018.1931258-2-kernel@pankajraghav.com>
In-Reply-To: <20240822135018.1931258-1-kernel@pankajraghav.com>
References: <20240822135018.1931258-1-kernel@pankajraghav.com>
From: "Matthew Wilcox (Oracle)"

We need filesystems to be able to communicate acceptable folio sizes
to the pagecache for a variety of uses (e.g. large block sizes).
Support a range of folio sizes between order-0 and order-31.

Signed-off-by: Matthew Wilcox (Oracle)
Co-developed-by: Pankaj Raghav
Signed-off-by: Pankaj Raghav
Reviewed-by: Hannes Reinecke
Reviewed-by: Darrick J. Wong
Tested-by: David Howells
Reviewed-by: Daniel Gomez
---
 include/linux/pagemap.h | 90 ++++++++++++++++++++++++++++++++++-------
 mm/filemap.c            |  6 +--
 mm/readahead.c          |  4 +-
 3 files changed, 80 insertions(+), 20 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d9c7edb6422bd..c60025bb584c5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -204,14 +204,21 @@ enum mapping_flags {
 	AS_EXITING	= 4, 	/* final truncate in progress */
 	/* writeback related tags are not used */
 	AS_NO_WRITEBACK_TAGS = 5,
-	AS_LARGE_FOLIO_SUPPORT = 6,
-	AS_RELEASE_ALWAYS,	/* Call ->release_folio(), even if no private data */
-	AS_STABLE_WRITES,	/* must wait for writeback before modifying
+	AS_RELEASE_ALWAYS = 6,	/* Call ->release_folio(), even if no private data */
+	AS_STABLE_WRITES = 7,	/* must wait for writeback before modifying
 				   folio contents */
-	AS_INACCESSIBLE,	/* Do not attempt direct R/W access to the mapping,
-				   including to move the mapping */
+	AS_INACCESSIBLE = 8,	/* Do not attempt direct R/W access to the mapping */
+	/* Bits 16-25 are used for FOLIO_ORDER */
+	AS_FOLIO_ORDER_BITS = 5,
+	AS_FOLIO_ORDER_MIN = 16,
+	AS_FOLIO_ORDER_MAX = AS_FOLIO_ORDER_MIN + AS_FOLIO_ORDER_BITS,
 };
 
+#define AS_FOLIO_ORDER_BITS_MASK ((1u << AS_FOLIO_ORDER_BITS) - 1)
+#define AS_FOLIO_ORDER_MIN_MASK (AS_FOLIO_ORDER_BITS_MASK << AS_FOLIO_ORDER_MIN)
+#define AS_FOLIO_ORDER_MAX_MASK (AS_FOLIO_ORDER_BITS_MASK << AS_FOLIO_ORDER_MAX)
+#define AS_FOLIO_ORDER_MASK (AS_FOLIO_ORDER_MIN_MASK | AS_FOLIO_ORDER_MAX_MASK)
+
 /**
  * mapping_set_error - record a writeback error in the address_space
  * @mapping: the mapping in which an error should be set
@@ -367,9 +374,51 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 #define MAX_XAS_ORDER		(XA_CHUNK_SHIFT * 2 - 1)
 #define MAX_PAGECACHE_ORDER	min(MAX_XAS_ORDER, PREFERRED_MAX_PAGECACHE_ORDER)
 
+/*
+ * mapping_set_folio_order_range() - Set the orders supported by a file.
+ * @mapping: The address space of the file.
+ * @min: Minimum folio order (between 0-MAX_PAGECACHE_ORDER inclusive).
+ * @max: Maximum folio order (between @min-MAX_PAGECACHE_ORDER inclusive).
+ *
+ * The filesystem should call this function in its inode constructor to
+ * indicate which base size (min) and maximum size (max) of folio the VFS
+ * can use to cache the contents of the file.  This should only be used
+ * if the filesystem needs special handling of folio sizes (ie there is
+ * something the core cannot know).
+ * Do not tune it based on, eg, i_size.
+ *
+ * Context: This should not be called while the inode is active as it
+ * is non-atomic.
+ */
+static inline void mapping_set_folio_order_range(struct address_space *mapping,
+						 unsigned int min,
+						 unsigned int max)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return;
+
+	if (min > MAX_PAGECACHE_ORDER)
+		min = MAX_PAGECACHE_ORDER;
+
+	if (max > MAX_PAGECACHE_ORDER)
+		max = MAX_PAGECACHE_ORDER;
+
+	if (max < min)
+		max = min;
+
+	mapping->flags = (mapping->flags & ~AS_FOLIO_ORDER_MASK) |
+		(min << AS_FOLIO_ORDER_MIN) | (max << AS_FOLIO_ORDER_MAX);
+}
+
+static inline void mapping_set_folio_min_order(struct address_space *mapping,
+					       unsigned int min)
+{
+	mapping_set_folio_order_range(mapping, min, MAX_PAGECACHE_ORDER);
+}
+
 /**
  * mapping_set_large_folios() - Indicate the file supports large folios.
- * @mapping: The file.
+ * @mapping: The address space of the file.
  *
  * The filesystem should call this function in its inode constructor to
  * indicate that the VFS can use large folios to cache the contents of
@@ -380,7 +429,23 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
  */
 static inline void mapping_set_large_folios(struct address_space *mapping)
 {
-	__set_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+	mapping_set_folio_order_range(mapping, 0, MAX_PAGECACHE_ORDER);
+}
+
+static inline unsigned int
+mapping_max_folio_order(const struct address_space *mapping)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return 0;
+	return (mapping->flags & AS_FOLIO_ORDER_MAX_MASK) >> AS_FOLIO_ORDER_MAX;
+}
+
+static inline unsigned int
+mapping_min_folio_order(const struct address_space *mapping)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return 0;
+	return (mapping->flags & AS_FOLIO_ORDER_MIN_MASK) >> AS_FOLIO_ORDER_MIN;
 }
 
 /*
@@ -389,20 +454,17 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
  */
 static inline bool mapping_large_folio_support(struct address_space *mapping)
 {
-	/* AS_LARGE_FOLIO_SUPPORT is only reasonable for pagecache folios */
+	/* AS_FOLIO_ORDER is only reasonable for pagecache folios */
 	VM_WARN_ONCE((unsigned long)mapping & PAGE_MAPPING_ANON,
 			"Anonymous mapping always supports large folio");
 
-	return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
-		test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+	return mapping_max_folio_order(mapping) > 0;
 }
 
 /* Return the maximum folio size for this pagecache mapping, in bytes. */
-static inline size_t mapping_max_folio_size(struct address_space *mapping)
+static inline size_t mapping_max_folio_size(const struct address_space *mapping)
 {
-	if (mapping_large_folio_support(mapping))
-		return PAGE_SIZE << MAX_PAGECACHE_ORDER;
-	return PAGE_SIZE;
+	return PAGE_SIZE << mapping_max_folio_order(mapping);
 }
 
 static inline int filemap_nr_thps(struct address_space *mapping)
diff --git a/mm/filemap.c b/mm/filemap.c
index d87c858465962..5c148144d5548 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1932,10 +1932,8 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP))))
 			fgp_flags |= FGP_LOCK;
 
-		if (!mapping_large_folio_support(mapping))
-			order = 0;
-		if (order > MAX_PAGECACHE_ORDER)
-			order = MAX_PAGECACHE_ORDER;
+		if (order > mapping_max_folio_order(mapping))
+			order = mapping_max_folio_order(mapping);
 		/* If we're not aligned, allocate a smaller folio */
 		if (index & ((1UL << order) - 1))
 			order = __ffs(index);
diff --git a/mm/readahead.c b/mm/readahead.c
index e83fe1c6e5acd..e0cf3bfd2b2b3 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -449,10 +449,10 @@ void page_cache_ra_order(struct readahead_control *ractl,
 
 	limit = min(limit, index + ra->size - 1);
 
-	if (new_order < MAX_PAGECACHE_ORDER)
+	if (new_order < mapping_max_folio_order(mapping))
 		new_order += 2;
 
-	new_order = min_t(unsigned int, MAX_PAGECACHE_ORDER, new_order);
+	new_order = min(mapping_max_folio_order(mapping), new_order);
 	new_order = min_t(unsigned int, new_order, ilog2(ra->size));
 
 	/* See comment in page_cache_ra_unbounded() */
From patchwork Thu Aug 22 13:50:10 2024
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>
X-Patchwork-Id: 13773523
From: "Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>
To: brauner@kernel.org, akpm@linux-foundation.org
Cc: chandan.babu@oracle.com, linux-fsdevel@vger.kernel.org, djwong@kernel.org,
    hare@suse.de, gost.dev@samsung.com, linux-xfs@vger.kernel.org,
    kernel@pankajraghav.com, hch@lst.de, david@fromorbit.com, Zi Yan,
    yang@os.amperecomputing.com, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, willy@infradead.org, john.g.garry@oracle.com,
    cl@os.amperecomputing.com, p.raghav@samsung.com, mcgrof@kernel.org,
    ryan.roberts@arm.com, David Howells
Subject: [PATCH v13 02/10] filemap: allocate mapping_min_order folios in the page cache
Date: Thu, 22 Aug 2024 15:50:10 +0200
Message-ID: <20240822135018.1931258-3-kernel@pankajraghav.com>
In-Reply-To: <20240822135018.1931258-1-kernel@pankajraghav.com>
References: <20240822135018.1931258-1-kernel@pankajraghav.com>
From: Pankaj Raghav

filemap_create_folio() and do_read_cache_folio() were always allocating
folios of order 0. __filemap_get_folio() was trying to allocate higher
order folios when fgp_flags had the higher order hint set, but it would
fall back to an order-0 folio if the higher order memory allocation
failed.

Supporting mapping_min_order implies that we guarantee each folio in the
page cache has at least an order of mapping_min_order. When adding new
folios to the page cache we must also ensure the index used is aligned to
the mapping_min_order, as the page cache requires the index to be aligned
to the order of the folio.

Co-developed-by: Luis Chamberlain
Signed-off-by: Luis Chamberlain
Signed-off-by: Pankaj Raghav
Reviewed-by: Hannes Reinecke
Reviewed-by: Darrick J. Wong
Reviewed-by: Matthew Wilcox (Oracle)
Tested-by: David Howells
---
 include/linux/pagemap.h | 20 ++++++++++++++++++++
 mm/filemap.c            | 24 ++++++++++++++++--------
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c60025bb584c5..4cc170949e9c0 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -448,6 +448,26 @@ mapping_min_folio_order(const struct address_space *mapping)
 	return (mapping->flags & AS_FOLIO_ORDER_MIN_MASK) >> AS_FOLIO_ORDER_MIN;
 }
 
+static inline unsigned long
+mapping_min_folio_nrpages(struct address_space *mapping)
+{
+	return 1UL << mapping_min_folio_order(mapping);
+}
+
+/**
+ * mapping_align_index() - Align index for this mapping.
+ * @mapping: The address_space.
+ *
+ * The index of a folio must be naturally aligned.  If you are adding a
+ * new folio to the page cache and need to know what index to give it,
+ * call this function.
+ */
+static inline pgoff_t mapping_align_index(struct address_space *mapping,
+					  pgoff_t index)
+{
+	return round_down(index, mapping_min_folio_nrpages(mapping));
+}
+
 /*
  * Large folio support currently depends on THP.  These dependencies are
  * being worked on but are not yet fixed.
diff --git a/mm/filemap.c b/mm/filemap.c
index 5c148144d5548..9a047c6d03e4e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -858,6 +858,8 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(folio_test_swapbacked(folio), folio);
+	VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping),
+			folio);
 	mapping_set_update(&xas, mapping);
 
 	VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
@@ -1918,8 +1920,10 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		folio_wait_stable(folio);
 no_page:
 	if (!folio && (fgp_flags & FGP_CREAT)) {
-		unsigned order = FGF_GET_ORDER(fgp_flags);
+		unsigned int min_order = mapping_min_folio_order(mapping);
+		unsigned int order = max(min_order, FGF_GET_ORDER(fgp_flags));
 		int err;
+		index = mapping_align_index(mapping, index);
 
 		if ((fgp_flags & FGP_WRITE) && mapping_can_writeback(mapping))
 			gfp |= __GFP_WRITE;
@@ -1942,7 +1946,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 			gfp_t alloc_gfp = gfp;
 
 			err = -ENOMEM;
-			if (order > 0)
+			if (order > min_order)
 				alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
 			folio = filemap_alloc_folio(alloc_gfp, order);
 			if (!folio)
@@ -1957,7 +1961,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 				break;
 			folio_put(folio);
 			folio = NULL;
-		} while (order-- > 0);
+		} while (order-- > min_order);
 
 		if (err == -EEXIST)
 			goto repeat;
@@ -2443,13 +2447,15 @@ static int filemap_update_page(struct kiocb *iocb,
 }
 
 static int filemap_create_folio(struct file *file,
-		struct address_space *mapping, pgoff_t index,
+		struct address_space *mapping, loff_t pos,
 		struct folio_batch *fbatch)
 {
 	struct folio *folio;
 	int error;
+	unsigned int min_order = mapping_min_folio_order(mapping);
+	pgoff_t index;
 
-	folio = filemap_alloc_folio(mapping_gfp_mask(mapping), 0);
+	folio = filemap_alloc_folio(mapping_gfp_mask(mapping), min_order);
 	if (!folio)
 		return -ENOMEM;
@@ -2467,6 +2473,7 @@ static int filemap_create_folio(struct file *file,
 	 * well to keep locking rules simple.
 	 */
 	filemap_invalidate_lock_shared(mapping);
+	index = (pos >> (PAGE_SHIFT + min_order)) << min_order;
 	error = filemap_add_folio(mapping, folio, index,
 			mapping_gfp_constraint(mapping, GFP_KERNEL));
 	if (error == -EEXIST)
@@ -2527,8 +2534,7 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
 	if (!folio_batch_count(fbatch)) {
 		if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
 			return -EAGAIN;
-		err = filemap_create_folio(filp, mapping,
-				iocb->ki_pos >> PAGE_SHIFT, fbatch);
+		err = filemap_create_folio(filp, mapping, iocb->ki_pos, fbatch);
 		if (err == AOP_TRUNCATED_PAGE)
 			goto retry;
 		return err;
@@ -3748,9 +3754,11 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
 repeat:
 	folio = filemap_get_folio(mapping, index);
 	if (IS_ERR(folio)) {
-		folio = filemap_alloc_folio(gfp, 0);
+		folio = filemap_alloc_folio(gfp,
+					    mapping_min_folio_order(mapping));
 		if (!folio)
 			return ERR_PTR(-ENOMEM);
+		index = mapping_align_index(mapping, index);
 		err = filemap_add_folio(mapping, folio, index, gfp);
 		if (unlikely(err)) {
 			folio_put(folio);
From patchwork Thu Aug 22 13:50:11 2024
X-Patchwork-Submitter: "Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>
X-Patchwork-Id: 13773524
From: "Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>
To: brauner@kernel.org, akpm@linux-foundation.org
Cc: chandan.babu@oracle.com, linux-fsdevel@vger.kernel.org, djwong@kernel.org,
    hare@suse.de, gost.dev@samsung.com, linux-xfs@vger.kernel.org,
    kernel@pankajraghav.com, hch@lst.de, david@fromorbit.com, Zi Yan,
    yang@os.amperecomputing.com, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, willy@infradead.org, john.g.garry@oracle.com,
    cl@os.amperecomputing.com, p.raghav@samsung.com, mcgrof@kernel.org,
    ryan.roberts@arm.com, David Howells
Subject: [PATCH v13 03/10] readahead: allocate folios with mapping_min_order in readahead
Date: Thu, 22 Aug 2024 15:50:11 +0200
Message-ID: <20240822135018.1931258-4-kernel@pankajraghav.com>
In-Reply-To: <20240822135018.1931258-1-kernel@pankajraghav.com>
References: <20240822135018.1931258-1-kernel@pankajraghav.com>

From: Pankaj Raghav

page_cache_ra_unbounded() was allocating single pages (0 order folios)
if there was no folio found in an index.
Allocate mapping_min_order folios, as we need to guarantee the minimum order if it is set.

page_cache_ra_order() tries to allocate folios for the page cache with a higher order if the index aligns with that order. Modify it so that the order does not go below the mapping_min_order requirement of the page cache. This function will do the right thing even if the new_order passed is less than the mapping_min_order. When adding new folios to the page cache we must also ensure the index used is aligned to the mapping_min_order, as the page cache requires the index to be aligned to the order of the folio.

readahead_expand() is called from readahead aops to extend the range of the readahead, so this function can assume ractl->_index to be aligned with min_order.

Signed-off-by: Pankaj Raghav
Co-developed-by: Hannes Reinecke
Signed-off-by: Hannes Reinecke
Acked-by: Darrick J. Wong
Tested-by: David Howells
---
 mm/readahead.c | 79 ++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 61 insertions(+), 18 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index e0cf3bfd2b2b3..3dc6c7a128dd3 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -206,9 +206,10 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 		unsigned long nr_to_read, unsigned long lookahead_size)
 {
 	struct address_space *mapping = ractl->mapping;
-	unsigned long index = readahead_index(ractl);
+	unsigned long ra_folio_index, index = readahead_index(ractl);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
-	unsigned long i;
+	unsigned long mark, i = 0;
+	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
 
 	/*
 	 * Partway through the readahead operation, we will have added
@@ -223,10 +224,24 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	unsigned int nofs = memalloc_nofs_save();
 
 	filemap_invalidate_lock_shared(mapping);
+	index = mapping_align_index(mapping, index);
+
+	/*
+	 * As iterator `i` is aligned to min_nrpages, round_up the
+	 * difference between nr_to_read and lookahead_size to mark the
+	 * index that only has lookahead or "async_region" to set the
+	 * readahead flag.
+	 */
+	ra_folio_index = round_up(readahead_index(ractl) + nr_to_read - lookahead_size,
+				  min_nrpages);
+	mark = ra_folio_index - index;
+	nr_to_read += readahead_index(ractl) - index;
+	ractl->_index = index;
+
 	/*
 	 * Preallocate as many pages as we will need.
 	 */
-	for (i = 0; i < nr_to_read; i++) {
+	while (i < nr_to_read) {
 		struct folio *folio = xa_load(&mapping->i_pages, index + i);
 		int ret;
 
@@ -240,12 +255,13 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			 * not worth getting one just for that.
 			 */
 			read_pages(ractl);
-			ractl->_index++;
-			i = ractl->_index + ractl->_nr_pages - index - 1;
+			ractl->_index += min_nrpages;
+			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}
 
-		folio = filemap_alloc_folio(gfp_mask, 0);
+		folio = filemap_alloc_folio(gfp_mask,
+					    mapping_min_folio_order(mapping));
 		if (!folio)
 			break;
 
@@ -255,14 +271,15 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			if (ret == -ENOMEM)
 				break;
 			read_pages(ractl);
-			ractl->_index++;
-			i = ractl->_index + ractl->_nr_pages - index - 1;
+			ractl->_index += min_nrpages;
+			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}
-		if (i == nr_to_read - lookahead_size)
+		if (i == mark)
 			folio_set_readahead(folio);
 		ractl->_workingset |= folio_test_workingset(folio);
-		ractl->_nr_pages++;
+		ractl->_nr_pages += min_nrpages;
+		i += min_nrpages;
 	}
 
 	/*
@@ -438,13 +455,19 @@ void page_cache_ra_order(struct readahead_control *ractl,
 	struct address_space *mapping = ractl->mapping;
 	pgoff_t start = readahead_index(ractl);
 	pgoff_t index = start;
+	unsigned int min_order = mapping_min_folio_order(mapping);
 	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
 	pgoff_t mark = index + ra->size - ra->async_size;
 	unsigned int nofs;
 	int err = 0;
 	gfp_t gfp = readahead_gfp_mask(mapping);
+	unsigned int min_ra_size = max(4, mapping_min_folio_nrpages(mapping));
 
-	if (!mapping_large_folio_support(mapping) || ra->size < 4)
+	/*
+	 * Fallback when size < min_nrpages as each folio should be
+	 * at least min_nrpages anyway.
+	 */
+	if (!mapping_large_folio_support(mapping) || ra->size < min_ra_size)
 		goto fallback;
 
 	limit = min(limit, index + ra->size - 1);
@@ -454,10 +477,19 @@ void page_cache_ra_order(struct readahead_control *ractl,
 	new_order = min(mapping_max_folio_order(mapping), new_order);
 	new_order = min_t(unsigned int, new_order, ilog2(ra->size));
+	new_order = max(new_order, min_order);
 
 	/* See comment in page_cache_ra_unbounded() */
 	nofs = memalloc_nofs_save();
 	filemap_invalidate_lock_shared(mapping);
+	/*
+	 * If the new_order is greater than min_order and index is
+	 * already aligned to new_order, then this will be noop as index
+	 * aligned to new_order should also be aligned to min_order.
+	 */
+	ractl->_index = mapping_align_index(mapping, index);
+	index = readahead_index(ractl);
+
 	while (index <= limit) {
 		unsigned int order = new_order;
 
@@ -465,7 +497,7 @@
 		if (index & ((1UL << order) - 1))
 			order = __ffs(index);
 		/* Don't allocate pages past EOF */
-		while (index + (1UL << order) - 1 > limit)
+		while (order > min_order && index + (1UL << order) - 1 > limit)
 			order--;
 		err = ra_alloc_folio(ractl, index, mark, order, gfp);
 		if (err)
@@ -703,8 +735,15 @@ void readahead_expand(struct readahead_control *ractl,
 	struct file_ra_state *ra = ractl->ra;
 	pgoff_t new_index, new_nr_pages;
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
+	unsigned long min_nrpages = mapping_min_folio_nrpages(mapping);
+	unsigned int min_order = mapping_min_folio_order(mapping);
 
 	new_index = new_start / PAGE_SIZE;
+	/*
+	 * Readahead code should have aligned the ractl->_index to
+	 * min_nrpages before calling readahead aops.
+	 */
+	VM_BUG_ON(!IS_ALIGNED(ractl->_index, min_nrpages));
 
 	/* Expand the leading edge downwards */
 	while (ractl->_index > new_index) {
@@ -714,9 +753,11 @@ void readahead_expand(struct readahead_control *ractl,
 		if (folio && !xa_is_value(folio))
 			return; /* Folio apparently present */
 
-		folio = filemap_alloc_folio(gfp_mask, 0);
+		folio = filemap_alloc_folio(gfp_mask, min_order);
 		if (!folio)
 			return;
+
+		index = mapping_align_index(mapping, index);
 		if (filemap_add_folio(mapping, folio, index, gfp_mask) < 0) {
 			folio_put(folio);
 			return;
@@ -726,7 +767,7 @@ void readahead_expand(struct readahead_control *ractl,
 			ractl->_workingset = true;
 			psi_memstall_enter(&ractl->_pflags);
 		}
-		ractl->_nr_pages++;
+		ractl->_nr_pages += min_nrpages;
 		ractl->_index = folio->index;
 	}
 
@@ -741,9 +782,11 @@ void readahead_expand(struct readahead_control *ractl,
 		if (folio && !xa_is_value(folio))
 			return; /* Folio apparently present */
 
-		folio = filemap_alloc_folio(gfp_mask, 0);
+		folio = filemap_alloc_folio(gfp_mask, min_order);
 		if (!folio)
 			return;
+
+		index = mapping_align_index(mapping, index);
 		if (filemap_add_folio(mapping, folio, index, gfp_mask) < 0) {
 			folio_put(folio);
 			return;
@@ -753,10 +796,10 @@ void readahead_expand(struct readahead_control *ractl,
 			ractl->_workingset = true;
 			psi_memstall_enter(&ractl->_pflags);
 		}
-		ractl->_nr_pages++;
+		ractl->_nr_pages += min_nrpages;
 		if (ra) {
-			ra->size++;
-			ra->async_size++;
+			ra->size += min_nrpages;
+			ra->async_size += min_nrpages;
 		}
 	}
 }
From patchwork Thu Aug 22 13:50:12 2024
From: "Pankaj Raghav (Samsung)"
Subject: [PATCH v13 04/10] mm: split a folio in minimum folio order chunks
Date: Thu, 22 Aug 2024 15:50:12 +0200
Message-ID: <20240822135018.1931258-5-kernel@pankajraghav.com>
In-Reply-To: <20240822135018.1931258-1-kernel@pankajraghav.com>
References: <20240822135018.1931258-1-kernel@pankajraghav.com>
From: Luis Chamberlain

split_folio() and split_folio_to_list() assume order 0. To support min order for non-anonymous folios, we must expand these to check the folio mapping order and use that.

Set new_order to be at least the minimum folio order if it is set in split_huge_page_to_list() so that we can maintain the minimum folio order requirement in the page cache.

Update the debugfs write files used for testing to ensure the order is respected as well. We simply enforce the min order when a file mapping is used.

Signed-off-by: Luis Chamberlain
Signed-off-by: Pankaj Raghav
Reviewed-by: Hannes Reinecke
Reviewed-by: Zi Yan
Tested-by: David Howells
---
 include/linux/huge_mm.h | 14 +++++++---
 mm/huge_memory.c        | 60 ++++++++++++++++++++++++++++++++++++++---
 2 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4c32058cacfec..70424d55da088 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -96,6 +96,8 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
 #define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
 	(!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
 
+#define split_folio(f) split_folio_to_list(f, NULL)
+
 #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
 #define HPAGE_PMD_SHIFT PMD_SHIFT
 #define HPAGE_PUD_SHIFT PUD_SHIFT
@@ -317,9 +319,10 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
 bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
 int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		unsigned int new_order);
+int split_folio_to_list(struct folio *folio, struct list_head *list);
 static inline int split_huge_page(struct page *page)
 {
-	return split_huge_page_to_list_to_order(page, NULL, 0);
+	return split_folio(page_folio(page));
 }
 void deferred_split_folio(struct folio *folio);
@@ -495,6 +498,12 @@ static inline int split_huge_page(struct page *page)
 {
 	return 0;
 }
+
+static inline int split_folio_to_list(struct folio *folio, struct list_head *list)
+{
+	return 0;
+}
+
 static inline void deferred_split_folio(struct folio *folio) {}
 #define split_huge_pmd(__vma, __pmd, __address)	\
 	do { } while (0)
@@ -622,7 +631,4 @@ static inline int split_folio_to_order(struct folio *folio, int new_order)
 	return split_folio_to_list_to_order(folio, NULL, new_order);
 }
 
-#define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0)
-#define split_folio(f) split_folio_to_order(f, 0)
-
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cf8e34f62976f..06384b85a3a20 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3303,6 +3303,9 @@ bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins)
  * released, or if some unexpected race happened (e.g., anon VMA disappeared,
  * truncation).
  *
+ * Callers should ensure that the order respects the address space mapping
+ * min-order if one is set for non-anonymous folios.
+ *
  * Returns -EINVAL when trying to split to an order that is incompatible
  * with the folio. Splitting to order 0 is compatible with all folios.
  */
@@ -3384,6 +3387,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		mapping = NULL;
 		anon_vma_lock_write(anon_vma);
 	} else {
+		unsigned int min_order;
 		gfp_t gfp;
 
 		mapping = folio->mapping;
@@ -3394,6 +3398,14 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 			goto out;
 		}
 
+		min_order = mapping_min_folio_order(folio->mapping);
+		if (new_order < min_order) {
+			VM_WARN_ONCE(1, "Cannot split mapped folio below min-order: %u",
+				     min_order);
+			ret = -EINVAL;
+			goto out;
+		}
+
 		gfp = current_gfp_context(mapping_gfp_mask(mapping) &
							GFP_RECLAIM_MASK);
@@ -3506,6 +3518,25 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 	return ret;
 }
 
+int split_folio_to_list(struct folio *folio, struct list_head *list)
+{
+	unsigned int min_order = 0;
+
+	if (folio_test_anon(folio))
+		goto out;
+
+	if (!folio->mapping) {
+		if (folio_test_pmd_mappable(folio))
+			count_vm_event(THP_SPLIT_PAGE_FAILED);
+		return -EBUSY;
+	}
+
+	min_order = mapping_min_folio_order(folio->mapping);
+out:
+	return split_huge_page_to_list_to_order(&folio->page, list,
+						min_order);
+}
+
 void __folio_undo_large_rmappable(struct folio *folio)
 {
 	struct deferred_split *ds_queue;
@@ -3736,6 +3767,8 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
 		struct vm_area_struct *vma = vma_lookup(mm, addr);
 		struct folio_walk fw;
 		struct folio *folio;
+		struct address_space *mapping;
+		unsigned int target_order = new_order;
 
 		if (!vma)
 			break;
@@ -3753,7 +3786,13 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
 		if (!is_transparent_hugepage(folio))
 			goto next;
 
-		if (new_order >= folio_order(folio))
+		if (!folio_test_anon(folio)) {
+			mapping = folio->mapping;
+			target_order = max(new_order,
+					   mapping_min_folio_order(mapping));
+		}
+
+		if (target_order >= folio_order(folio))
 			goto next;
 
 		total++;
@@ -3771,9 +3810,14 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
 		folio_get(folio);
 		folio_walk_end(&fw, vma);
 
-		if (!split_folio_to_order(folio, new_order))
+		if (!folio_test_anon(folio) && folio->mapping != mapping)
+			goto unlock;
+
+		if (!split_folio_to_order(folio, target_order))
 			split++;
 
+unlock:
+
 		folio_unlock(folio);
 		folio_put(folio);
@@ -3802,6 +3846,8 @@ static int split_huge_pages_in_file(const char *file_path, pgoff_t off_start,
 	pgoff_t index;
 	int nr_pages = 1;
 	unsigned long total = 0, split = 0;
+	unsigned int min_order;
+	unsigned int target_order;
 
 	file = getname_kernel(file_path);
 	if (IS_ERR(file))
@@ -3815,6 +3861,8 @@ static int split_huge_pages_in_file(const char *file_path, pgoff_t off_start,
 		 file_path, off_start, off_end);
 
 	mapping = candidate->f_mapping;
+	min_order = mapping_min_folio_order(mapping);
+	target_order = max(new_order, min_order);
 
 	for (index = off_start; index < off_end; index += nr_pages) {
 		struct folio *folio = filemap_get_folio(mapping, index);
@@ -3829,15 +3877,19 @@ static int split_huge_pages_in_file(const char *file_path, pgoff_t off_start,
 		total++;
 		nr_pages = folio_nr_pages(folio);
 
-		if (new_order >= folio_order(folio))
+		if (target_order >= folio_order(folio))
 			goto next;
 
 		if (!folio_trylock(folio))
 			goto next;
 
-		if (!split_folio_to_order(folio, new_order))
+		if (folio->mapping != mapping)
+			goto unlock;
+
+		if (!split_folio_to_order(folio, target_order))
 			split++;
 
+unlock:
 		folio_unlock(folio);
 next:
 		folio_put(folio);
From patchwork Thu Aug 22 13:50:13 2024
From: "Pankaj Raghav (Samsung)"
Subject: [PATCH v13 05/10] filemap: cap PTE range to be created to allowed zero fill in folio_map_range()
Date: Thu, 22 Aug 2024 15:50:13 +0200
Message-ID: <20240822135018.1931258-6-kernel@pankajraghav.com>
In-Reply-To: <20240822135018.1931258-1-kernel@pankajraghav.com>
References: <20240822135018.1931258-1-kernel@pankajraghav.com>

From: Pankaj Raghav

Usually the page cache does not extend beyond the size of the inode; therefore, no PTEs are created for folios that extend beyond the size.

But with LBS support we might extend the page cache beyond the size of the inode, as we need to guarantee folios of minimum order. While doing a read, do_fault_around() can create PTEs for pages that lie beyond the EOF, leading to an incorrect error return when accessing a page beyond the mapped file.

Cap the PTE range to be created for the page cache up to the end of file (EOF) in filemap_map_pages() so that the returned error codes are consistent with POSIX[1] for LBS configurations.

generic/749 has been created to trigger this edge case. This also fixes generic/749 for tmpfs with huge=always on systems with a 4k base page size.

[1] (from mmap(2)) SIGBUS
    Attempted access to a page of the buffer that lies beyond the end of
    the mapped file. For an explanation of the treatment of the bytes in
    the page that corresponds to the end of a mapped file that is not a
    multiple of the page size, see NOTES.

Signed-off-by: Luis Chamberlain
Signed-off-by: Pankaj Raghav
Reviewed-by: Hannes Reinecke
Reviewed-by: Matthew Wilcox (Oracle)
Reviewed-by: Darrick J. Wong
Tested-by: David Howells
---
 mm/filemap.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 9a047c6d03e4e..eab1f12e7b840 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3607,7 +3607,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	struct vm_area_struct *vma = vmf->vma;
 	struct file *file = vma->vm_file;
 	struct address_space *mapping = file->f_mapping;
-	pgoff_t last_pgoff = start_pgoff;
+	pgoff_t file_end, last_pgoff = start_pgoff;
 	unsigned long addr;
 	XA_STATE(xas, &mapping->i_pages, start_pgoff);
 	struct folio *folio;
@@ -3633,6 +3633,10 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		goto out;
 	}
 
+	file_end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE) - 1;
+	if (end_pgoff > file_end)
+		end_pgoff = file_end;
+
 	folio_type = mm_counter_file(folio);
 	do {
 		unsigned long end;
From patchwork Thu Aug 22 13:50:14 2024
From: "Pankaj Raghav (Samsung)"
To: brauner@kernel.org, akpm@linux-foundation.org
Cc: chandan.babu@oracle.com, linux-fsdevel@vger.kernel.org,
 djwong@kernel.org, hare@suse.de, gost.dev@samsung.com,
 linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de,
 david@fromorbit.com, Zi Yan, yang@os.amperecomputing.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, willy@infradead.org,
 john.g.garry@oracle.com, cl@os.amperecomputing.com,
 p.raghav@samsung.com, mcgrof@kernel.org, ryan.roberts@arm.com,
 Dave Chinner
Subject: [PATCH v13 06/10] iomap: fix iomap_dio_zero() for fs bs > system page size
Date: Thu, 22 Aug 2024 15:50:14 +0200
Message-ID: <20240822135018.1931258-7-kernel@pankajraghav.com>
In-Reply-To: <20240822135018.1931258-1-kernel@pankajraghav.com>
References: <20240822135018.1931258-1-kernel@pankajraghav.com>

From: Pankaj Raghav

iomap_dio_zero() will pad a fs block with zeroes if the direct IO size <
fs block size. iomap_dio_zero() has an implicit assumption that fs block
size < page_size. This is true for most filesystems at the moment.

If the block size > page size, this will send the contents of the page
next to the zero page (as len > PAGE_SIZE) to the underlying block
device, causing FS corruption.
iomap is a generic infrastructure and it should not make any assumptions
about the fs block size and the page size of the system.

Signed-off-by: Pankaj Raghav
Reviewed-by: Hannes Reinecke
Reviewed-by: Darrick J. Wong
Reviewed-by: Dave Chinner
---
 fs/iomap/buffered-io.c |  4 ++--
 fs/iomap/direct-io.c   | 45 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 9b4ca3811a242..cdab801e9d635 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -2007,10 +2007,10 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
 }
 EXPORT_SYMBOL_GPL(iomap_writepages);
 
-static int __init iomap_init(void)
+static int __init iomap_buffered_init(void)
 {
 	return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
			   offsetof(struct iomap_ioend, io_bio),
			   BIOSET_NEED_BVECS);
 }
-fs_initcall(iomap_init);
+fs_initcall(iomap_buffered_init);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index f3b43d223a46e..c02b266bba525 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include "trace.h"
@@ -27,6 +28,13 @@
 #define IOMAP_DIO_WRITE		(1U << 30)
 #define IOMAP_DIO_DIRTY		(1U << 31)
 
+/*
+ * Used for sub block zeroing in iomap_dio_zero()
+ */
+#define IOMAP_ZERO_PAGE_SIZE (SZ_64K)
+#define IOMAP_ZERO_PAGE_ORDER (get_order(IOMAP_ZERO_PAGE_SIZE))
+static struct page *zero_page;
+
 struct iomap_dio {
 	struct kiocb		*iocb;
 	const struct iomap_dio_ops *dops;
@@ -232,13 +240,20 @@ void iomap_dio_bio_end_io(struct bio *bio)
 }
 EXPORT_SYMBOL_GPL(iomap_dio_bio_end_io);
 
-static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
+static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 		loff_t pos, unsigned len)
 {
 	struct inode *inode = file_inode(dio->iocb->ki_filp);
-	struct page *page = ZERO_PAGE(0);
 	struct bio *bio;
 
+	if (!len)
+		return 0;
+	/*
+	 * Max block size supported is 64k
+	 */
+	if (WARN_ON_ONCE(len > IOMAP_ZERO_PAGE_SIZE))
+		return -EINVAL;
+
 	bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
 	fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
				  GFP_KERNEL);
@@ -246,8 +261,9 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
-	__bio_add_page(bio, page, len, 0);
+	__bio_add_page(bio, zero_page, len, 0);
 	iomap_dio_submit_bio(iter, dio, bio, pos);
+	return 0;
 }
 
 /*
@@ -356,8 +372,10 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 	if (need_zeroout) {
 		/* zero out from the start of the block to the write offset */
 		pad = pos & (fs_block_size - 1);
-		if (pad)
-			iomap_dio_zero(iter, dio, pos - pad, pad);
+
+		ret = iomap_dio_zero(iter, dio, pos - pad, pad);
+		if (ret)
+			goto out;
 	}
 
 	/*
@@ -431,7 +449,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		/* zero out from the end of the write to the end of the block */
 		pad = pos & (fs_block_size - 1);
 		if (pad)
-			iomap_dio_zero(iter, dio, pos, fs_block_size - pad);
+			ret = iomap_dio_zero(iter, dio, pos,
+					fs_block_size - pad);
 	}
 out:
 	/* Undo iter limitation to current extent */
@@ -753,3 +772,17 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	return iomap_dio_complete(dio);
 }
 EXPORT_SYMBOL_GPL(iomap_dio_rw);
+
+static int __init iomap_dio_init(void)
+{
+	zero_page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
+				IOMAP_ZERO_PAGE_ORDER);
+
+	if (!zero_page)
+		return -ENOMEM;
+
+	set_memory_ro((unsigned long)page_address(zero_page),
+		      1U << IOMAP_ZERO_PAGE_ORDER);
+	return 0;
+}
+fs_initcall(iomap_dio_init);
From patchwork Thu Aug 22 13:50:15 2024
From: "Pankaj Raghav (Samsung)"
To: brauner@kernel.org, akpm@linux-foundation.org
Cc: chandan.babu@oracle.com, linux-fsdevel@vger.kernel.org,
 djwong@kernel.org, hare@suse.de, gost.dev@samsung.com,
 linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de,
 david@fromorbit.com, Zi Yan, yang@os.amperecomputing.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, willy@infradead.org,
 john.g.garry@oracle.com, cl@os.amperecomputing.com,
 p.raghav@samsung.com, mcgrof@kernel.org, ryan.roberts@arm.com,
 Dave Chinner
Subject: [PATCH v13 07/10] xfs: use kvmalloc for xattr buffers
Date: Thu, 22 Aug 2024 15:50:15 +0200
Message-ID: <20240822135018.1931258-8-kernel@pankajraghav.com>
In-Reply-To: <20240822135018.1931258-1-kernel@pankajraghav.com>
References: <20240822135018.1931258-1-kernel@pankajraghav.com>

From: Dave Chinner

Pankaj Raghav reported that when the filesystem block size is larger
than the page size, the xattr code can use kmalloc() for high order
allocations. This triggers a useless warning in the allocator as it is
a __GFP_NOFAIL allocation here:

static inline struct page *rmqueue(struct zone *preferred_zone,
			struct zone *zone, unsigned int order,
			gfp_t gfp_flags, unsigned int alloc_flags,
			int migratetype)
{
	struct page *page;

	/*
	 * We most definitely don't want callers attempting to
	 * allocate greater than order-1 page units with __GFP_NOFAIL.
	 */
>>>>	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
...

Fix this by changing all these call sites to use kvmalloc(), which will
strip the NOFAIL from the kmalloc attempt and, if that fails, will do a
__GFP_NOFAIL vmalloc().

This is not an issue that production systems will see, as filesystems
with block size > page size cannot be mounted by the kernel; Pankaj is
developing this functionality right now.

Reported-by: Pankaj Raghav
Fixes: f078d4ea8276 ("xfs: convert kmem_alloc() to kmalloc()")
Signed-off-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Reviewed-by: Pankaj Raghav
---
 fs/xfs/libxfs/xfs_attr_leaf.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index b9e98950eb3d8..09f4cb061a6e0 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -1138,10 +1138,7 @@ xfs_attr3_leaf_to_shortform(
 	trace_xfs_attr_leaf_to_sf(args);
 
-	tmpbuffer = kmalloc(args->geo->blksize, GFP_KERNEL | __GFP_NOFAIL);
-	if (!tmpbuffer)
-		return -ENOMEM;
-
+	tmpbuffer = kvmalloc(args->geo->blksize, GFP_KERNEL | __GFP_NOFAIL);
 	memcpy(tmpbuffer, bp->b_addr, args->geo->blksize);
 
 	leaf = (xfs_attr_leafblock_t *)tmpbuffer;
@@ -1205,7 +1202,7 @@ xfs_attr3_leaf_to_shortform(
 	error = 0;
 out:
-	kfree(tmpbuffer);
+	kvfree(tmpbuffer);
 	return error;
 }
@@ -1613,7 +1610,7 @@ xfs_attr3_leaf_compact(
 	trace_xfs_attr_leaf_compact(args);
 
-	tmpbuffer = kmalloc(args->geo->blksize, GFP_KERNEL | __GFP_NOFAIL);
+	tmpbuffer = kvmalloc(args->geo->blksize, GFP_KERNEL | __GFP_NOFAIL);
 	memcpy(tmpbuffer, bp->b_addr, args->geo->blksize);
 	memset(bp->b_addr, 0, args->geo->blksize);
 	leaf_src = (xfs_attr_leafblock_t *)tmpbuffer;
@@ -1651,7 +1648,7 @@ xfs_attr3_leaf_compact(
 	 */
 	xfs_trans_log_buf(trans, bp, 0, args->geo->blksize - 1);
 
-	kfree(tmpbuffer);
+	kvfree(tmpbuffer);
 }
@@ -2330,7 +2327,7 @@ xfs_attr3_leaf_unbalance(
 		struct xfs_attr_leafblock *tmp_leaf;
 		struct xfs_attr3_icleaf_hdr tmphdr;
 
-		tmp_leaf = kzalloc(state->args->geo->blksize,
+		tmp_leaf = kvzalloc(state->args->geo->blksize,
 				GFP_KERNEL | __GFP_NOFAIL);
@@ -2371,7 +2368,7 @@ xfs_attr3_leaf_unbalance(
 		}
 		memcpy(save_leaf, tmp_leaf, state->args->geo->blksize);
 		savehdr = tmphdr; /* struct copy */
-		kfree(tmp_leaf);
+		kvfree(tmp_leaf);
 	}
 	xfs_attr3_leaf_hdr_to_disk(state->args->geo, save_leaf, &savehdr);
From patchwork Thu Aug 22 13:50:16 2024
From: "Pankaj Raghav (Samsung)"
To: brauner@kernel.org, akpm@linux-foundation.org
Cc: chandan.babu@oracle.com, linux-fsdevel@vger.kernel.org,
 djwong@kernel.org, hare@suse.de, gost.dev@samsung.com,
 linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de,
 david@fromorbit.com, Zi Yan, yang@os.amperecomputing.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, willy@infradead.org,
 john.g.garry@oracle.com, cl@os.amperecomputing.com,
 p.raghav@samsung.com, mcgrof@kernel.org, ryan.roberts@arm.com,
 Dave Chinner
Subject: [PATCH v13 08/10] xfs: expose block size in stat
Date: Thu, 22 Aug 2024 15:50:16 +0200
Message-ID: <20240822135018.1931258-9-kernel@pankajraghav.com>
In-Reply-To: <20240822135018.1931258-1-kernel@pankajraghav.com>
References: <20240822135018.1931258-1-kernel@pankajraghav.com>

From: Pankaj Raghav

For block size larger than page size, the unit of efficient IO is the
block size, not the page size. Leaving stat() to report PAGE_SIZE as
the block size causes test programs like fsx to issue illegal ranges
for operations that require block size alignment (e.g. fallocate()
insert range). Hence update the preferred IO size to reflect the block
size in this case.

This change is based on a patch originally from Dave Chinner. [1]

[1] https://lwn.net/ml/linux-fsdevel/20181107063127.3902-16-david@fromorbit.com/

Signed-off-by: Pankaj Raghav
Signed-off-by: Luis Chamberlain
Reviewed-by: Darrick J. Wong
Reviewed-by: Dave Chinner
---
 fs/xfs/xfs_iops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index a1c4a350a6dbf..2b8dbe8bf1381 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -567,7 +567,7 @@ xfs_stat_blksize(
 		return 1U << mp->m_allocsize_log;
 	}
 
-	return PAGE_SIZE;
+	return max_t(uint32_t, PAGE_SIZE, mp->m_sb.sb_blocksize);
 }
 
 STATIC int
From patchwork Thu Aug 22 13:50:17 2024
From: "Pankaj Raghav (Samsung)"
To: brauner@kernel.org, akpm@linux-foundation.org
Cc: chandan.babu@oracle.com, linux-fsdevel@vger.kernel.org,
 djwong@kernel.org, hare@suse.de, gost.dev@samsung.com,
 linux-xfs@vger.kernel.org, kernel@pankajraghav.com, hch@lst.de,
 david@fromorbit.com, Zi Yan, yang@os.amperecomputing.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, willy@infradead.org,
 john.g.garry@oracle.com, cl@os.amperecomputing.com,
 p.raghav@samsung.com, mcgrof@kernel.org, ryan.roberts@arm.com,
 Dave Chinner
Subject: [PATCH v13 09/10] xfs: make the calculation generic in xfs_sb_validate_fsb_count()
Date: Thu, 22 Aug 2024 15:50:17 +0200
Message-ID: <20240822135018.1931258-10-kernel@pankajraghav.com>
In-Reply-To: <20240822135018.1931258-1-kernel@pankajraghav.com>
References: <20240822135018.1931258-1-kernel@pankajraghav.com>

From: Pankaj Raghav

Instead of assuming that PAGE_SHIFT is always higher than the blocklog,
make the calculation generic so that the page cache count can be
calculated correctly for LBS.

Signed-off-by: Pankaj Raghav
Reviewed-by: Darrick J. Wong
Reviewed-by: Dave Chinner
---
 fs/xfs/xfs_mount.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 09eef1721ef4f..3949f720b5354 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -132,11 +132,16 @@ xfs_sb_validate_fsb_count(
 	xfs_sb_t	*sbp,
 	uint64_t	nblocks)
 {
+	uint64_t	max_bytes;
+
 	ASSERT(PAGE_SHIFT >= sbp->sb_blocklog);
 	ASSERT(sbp->sb_blocklog >= BBSHIFT);
 
+	if (check_shl_overflow(nblocks, sbp->sb_blocklog, &max_bytes))
+		return -EFBIG;
+
 	/* Limited by ULONG_MAX of page cache index */
-	if (nblocks >> (PAGE_SHIFT - sbp->sb_blocklog) > ULONG_MAX)
+	if (max_bytes >> PAGE_SHIFT > ULONG_MAX)
 		return -EFBIG;
 	return 0;
 }
From: "Pankaj Raghav (Samsung)"
To: brauner@kernel.org, akpm@linux-foundation.org
Cc: chandan.babu@oracle.com, linux-fsdevel@vger.kernel.org, djwong@kernel.org,
 hare@suse.de, gost.dev@samsung.com, linux-xfs@vger.kernel.org,
 kernel@pankajraghav.com, hch@lst.de, david@fromorbit.com, Zi Yan,
 yang@os.amperecomputing.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 willy@infradead.org, john.g.garry@oracle.com, cl@os.amperecomputing.com,
 p.raghav@samsung.com, mcgrof@kernel.org, ryan.roberts@arm.com, Dave Chinner
Subject: [PATCH v13 10/10] xfs: enable block size larger than page size support
Date: Thu, 22 Aug 2024 15:50:18 +0200
Message-ID: <20240822135018.1931258-11-kernel@pankajraghav.com>
In-Reply-To: <20240822135018.1931258-1-kernel@pankajraghav.com>
References: <20240822135018.1931258-1-kernel@pankajraghav.com>
MIME-Version: 1.0
From: Pankaj Raghav

Page cache now has the ability to have a minimum order when allocating
a folio, which is a prerequisite to add support for block size > page
size.

Signed-off-by: Pankaj Raghav
Signed-off-by: Luis Chamberlain
Reviewed-by: Darrick J. Wong
Reviewed-by: Dave Chinner
---
 fs/xfs/libxfs/xfs_ialloc.c |  5 +++++
 fs/xfs/libxfs/xfs_shared.h |  3 +++
 fs/xfs/xfs_icache.c        |  6 ++++--
 fs/xfs/xfs_mount.c         |  1 -
 fs/xfs/xfs_super.c         | 28 ++++++++++++++++++++--------
 include/linux/pagemap.h    | 13 +++++++++++++
 6 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 0af5b7a33d055..1921b689888b8 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -3033,6 +3033,11 @@ xfs_ialloc_setup_geometry(
 		igeo->ialloc_align = mp->m_dalign;
 	else
 		igeo->ialloc_align = 0;
+
+	if (mp->m_sb.sb_blocksize > PAGE_SIZE)
+		igeo->min_folio_order = mp->m_sb.sb_blocklog - PAGE_SHIFT;
+	else
+		igeo->min_folio_order = 0;
 }
 
 /* Compute the location of the root directory inode that is laid out by mkfs. */
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 2f7413afbf46c..33b84a3a83ff6 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -224,6 +224,9 @@ struct xfs_ino_geometry {
 	/* precomputed value for di_flags2 */
 	uint64_t	new_diflags2;
 
+	/* minimum folio order of a page cache allocation */
+	unsigned int	min_folio_order;
+
 };
 
 #endif /* __XFS_SHARED_H__ */
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index cf629302d48e7..0fcf235e50235 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -88,7 +88,8 @@ xfs_inode_alloc(
 	/* VFS doesn't initialise i_mode! */
 	VFS_I(ip)->i_mode = 0;
-	mapping_set_large_folios(VFS_I(ip)->i_mapping);
+	mapping_set_folio_min_order(VFS_I(ip)->i_mapping,
+			M_IGEO(mp)->min_folio_order);
 
 	XFS_STATS_INC(mp, vn_active);
 	ASSERT(atomic_read(&ip->i_pincount) == 0);
@@ -325,7 +326,8 @@ xfs_reinit_inode(
 	inode->i_uid = uid;
 	inode->i_gid = gid;
 	inode->i_state = state;
-	mapping_set_large_folios(inode->i_mapping);
+	mapping_set_folio_min_order(inode->i_mapping,
+			M_IGEO(mp)->min_folio_order);
 
 	return error;
 }
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 3949f720b5354..c6933440f8066 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -134,7 +134,6 @@ xfs_sb_validate_fsb_count(
 {
 	uint64_t	max_bytes;
 
-	ASSERT(PAGE_SHIFT >= sbp->sb_blocklog);
 	ASSERT(sbp->sb_blocklog >= BBSHIFT);
 
 	if (check_shl_overflow(nblocks, sbp->sb_blocklog, &max_bytes))
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 210481b03fdb4..8cd76a01b543f 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1638,16 +1638,28 @@ xfs_fs_fill_super(
 		goto out_free_sb;
 	}
 
-	/*
-	 * Until this is fixed only page-sized or smaller data blocks work.
-	 */
 	if (mp->m_sb.sb_blocksize > PAGE_SIZE) {
-		xfs_warn(mp,
-		"File system with blocksize %d bytes. "
-		"Only pagesize (%ld) or less will currently work.",
+		size_t max_folio_size = mapping_max_folio_size_supported();
+
+		if (!xfs_has_crc(mp)) {
+			xfs_warn(mp,
+"V4 Filesystem with blocksize %d bytes. Only pagesize (%ld) or less is supported.",
 				mp->m_sb.sb_blocksize, PAGE_SIZE);
-		error = -ENOSYS;
-		goto out_free_sb;
+			error = -ENOSYS;
+			goto out_free_sb;
+		}
+
+		if (mp->m_sb.sb_blocksize > max_folio_size) {
+			xfs_warn(mp,
+"block size (%u bytes) not supported; Only block size (%ld) or less is supported",
+				mp->m_sb.sb_blocksize, max_folio_size);
+			error = -ENOSYS;
+			goto out_free_sb;
+		}
+
+		xfs_warn(mp,
+"EXPERIMENTAL: V5 Filesystem with Large Block Size (%d bytes) enabled.",
+			mp->m_sb.sb_blocksize);
 	}
 
 	/* Ensure this filesystem fits in the page cache limits */
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 4cc170949e9c0..55b254d951da7 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -374,6 +374,19 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 #define MAX_XAS_ORDER		(XA_CHUNK_SHIFT * 2 - 1)
 #define MAX_PAGECACHE_ORDER	min(MAX_XAS_ORDER, PREFERRED_MAX_PAGECACHE_ORDER)
 
+/*
+ * mapping_max_folio_size_supported() - Check the max folio size supported
+ *
+ * The filesystem should call this function at mount time if there is a
+ * requirement on the folio mapping size in the page cache.
+ */
+static inline size_t mapping_max_folio_size_supported(void)
+{
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return 1U << (PAGE_SHIFT + MAX_PAGECACHE_ORDER);
+	return PAGE_SIZE;
+}
+
 /*
  * mapping_set_folio_order_range() - Set the orders supported by a file.
  * @mapping:	The address space of the file.