From patchwork Sat Dec 14 03:10:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 13908286 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 92CC5175AB; Sat, 14 Dec 2024 03:10:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734145860; cv=none; b=AR13UTyByvqApnxOVpl4Y30yYsF8JWPkSR8MHRWMbo6bGESIJTjRZDPBCgdfROWO8FDZYDJwIfQjIs1vDRwztzMkwks2gJj5kCQUPfs0geNp5/xinNFtCDEZFNfXSzInzIL8ZsiNXJ5II8tB+iPfN+K/m1E8r5AbqKjxyHJzsUc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734145860; c=relaxed/simple; bh=Cyw294pY8t2yQLICdjQY6l3hUilybp3zGt9Q1IBsli0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LU4/0g1J6Jdm3JvcoqbC+9RTCTSriA6q3WX+LVh5WkKL5AjW6OGg+8HI2dcGYqL9ALjPRcWlaBv+d1U26h19k1Tt5bUYdGaDy7AkrfSR8DiKhN+vIlTcrYzJewhRdsDr29ytxbOphwQaWRDsCgGspiHpxXY5ssOAY0yfb8lq0SA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=q6vmcgeK; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="q6vmcgeK" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; 
h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=XGC/Uaj6ejX3COzNjOnUbs+G/8tqYkyreK9oJdjYOIk=; b=q6vmcgeKRQGC9VxZzw/i1K7IM4 UNJ2yA++QL819qBrpBG/20R6ObuMeDwxlObsUy00dcyHx3CICTjepQNTTG+/gL1M5fpux2bmxw6n6 7t4UhgYo+OzeILY+IrXH2oI0Cahn8DBNp7WgjELSFjhfoWfs9IVGP/VhY7NY+jDs5EmPPaa9Y+kIz NARAxIxF7SjifSqhF6aI+7Zoi2gPZvlxQYTqJpAlmHGOzy9CjN/G+WQlyLRdP/1UGnYKCEbmo6Bxg ZKoyOC8tlp2vZxwmgJINhJlX7cYH+oDp2luJRSXAjsInk1TVs1QoqSD2+d58oT8EvC6nAam8H+IEZ urGCzVIw==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.98 #2 (Red Hat Linux)) id 1tMIYN-00000005c3X-3GCO; Sat, 14 Dec 2024 03:10:51 +0000 From: Luis Chamberlain To: willy@infradead.org, hch@lst.de, hare@suse.de, dave@stgolabs.net, david@fromorbit.com, djwong@kernel.org Cc: john.g.garry@oracle.com, ritesh.list@gmail.com, kbusch@kernel.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, gost.dev@samsung.com, p.raghav@samsung.com, da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org Subject: [RFC v2 01/11] fs/buffer: move async batch read code into a helper Date: Fri, 13 Dec 2024 19:10:39 -0800 Message-ID: <20241214031050.1337920-2-mcgrof@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214031050.1337920-1-mcgrof@kernel.org> References: <20241214031050.1337920-1-mcgrof@kernel.org> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Sender: Luis Chamberlain Move the code from block_read_full_folio() which does a batch of async reads into a helper. No functional changes. 
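For readers who want to see the control flow of the helper in isolation, the following is a rough userspace sketch, not kernel code. The names struct mock_bh, struct mock_folio and mock_read_batch_async() are hypothetical stand-ins invented for illustration; the boolean fields only model the effect of the kernel calls named in the comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head; flags model kernel calls. */
struct mock_bh {
	bool locked;		/* models lock_buffer() */
	bool async_read;	/* models mark_buffer_async_read() */
	bool uptodate;		/* models buffer_uptodate() */
	bool submitted;		/* models submit_bh(REQ_OP_READ, bh) */
	bool completed;		/* models end_buffer_async_read(bh, 1) */
};

/* Hypothetical stand-in for struct folio. */
struct mock_folio {
	bool mappedtodisk;	/* models folio_set_mappedtodisk() */
	bool read_ended;	/* models folio_end_read() having been called */
	bool read_ok;		/* models the success flag given to folio_end_read() */
};

/* Mirrors the shape of the helper; returns how many reads were submitted. */
static int mock_read_batch_async(struct mock_folio *folio, int nr,
				 struct mock_bh *arr[], bool fully_mapped,
				 bool no_reads, bool any_get_block_error)
{
	int i, submitted = 0;

	if (fully_mapped)
		folio->mappedtodisk = true;

	if (no_reads) {
		/* Nothing needs IO, so the folio read can finish right away. */
		folio->read_ended = true;
		folio->read_ok = !any_get_block_error;
		return 0;
	}

	/* Stage one: lock every buffer before any IO is started. */
	for (i = 0; i < nr; i++) {
		arr[i]->locked = true;
		arr[i]->async_read = true;
	}

	/* Stage two: recheck uptodate under the lock, then start the IO. */
	for (i = 0; i < nr; i++) {
		if (arr[i]->uptodate) {
			arr[i]->completed = true;
		} else {
			arr[i]->submitted = true;
			submitted++;
		}
	}

	return submitted;
}
```

Locking all buffers in a first pass and only then submitting keeps the uptodate recheck inside the buffer lock, which is the property the original comments in block_read_full_folio() call out.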
Signed-off-by: Luis Chamberlain --- fs/buffer.c | 73 +++++++++++++++++++++++++++++++---------------------- 1 file changed, 43 insertions(+), 30 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index cc8452f60251..580451337efa 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2350,6 +2350,48 @@ bool block_is_partially_uptodate(struct folio *folio, size_t from, size_t count) } EXPORT_SYMBOL(block_is_partially_uptodate); +static void bh_read_batch_async(struct folio *folio, + int nr, struct buffer_head *arr[], + bool fully_mapped, bool no_reads, + bool any_get_block_error) +{ + int i; + struct buffer_head *bh; + + if (fully_mapped) + folio_set_mappedtodisk(folio); + + if (no_reads) { + /* + * All buffers are uptodate or get_block() returned an + * error when trying to map them *all* buffers we can + * finish the read. + */ + folio_end_read(folio, !any_get_block_error); + return; + } + + /* Stage one: lock the buffers */ + for (i = 0; i < nr; i++) { + bh = arr[i]; + lock_buffer(bh); + mark_buffer_async_read(bh); + } + + /* + * Stage 2: start the IO. Check for uptodateness + * inside the buffer lock in case another process reading + * the underlying blockdev brought it uptodate (the sct fix). + */ + for (i = 0; i < nr; i++) { + bh = arr[i]; + if (buffer_uptodate(bh)) + end_buffer_async_read(bh, 1); + else + submit_bh(REQ_OP_READ, bh); + } +} + /* * Generic "read_folio" function for block devices that have the normal * get_block functionality. This is most of the block device filesystems. @@ -2414,37 +2456,8 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block) arr[nr++] = bh; } while (i++, iblock++, (bh = bh->b_this_page) != head); - if (fully_mapped) - folio_set_mappedtodisk(folio); - - if (!nr) { - /* - * All buffers are uptodate or get_block() returned an - * error when trying to map them - we can finish the read. 
- */ - folio_end_read(folio, !page_error); - return 0; - } - - /* Stage two: lock the buffers */ - for (i = 0; i < nr; i++) { - bh = arr[i]; - lock_buffer(bh); - mark_buffer_async_read(bh); - } + bh_read_batch_async(folio, nr, arr, fully_mapped, nr == 0, page_error); - /* - * Stage 3: start the IO. Check for uptodateness - * inside the buffer lock in case another process reading - * the underlying blockdev brought it uptodate (the sct fix). - */ - for (i = 0; i < nr; i++) { - bh = arr[i]; - if (buffer_uptodate(bh)) - end_buffer_async_read(bh, 1); - else - submit_bh(REQ_OP_READ, bh); - } return 0; } EXPORT_SYMBOL(block_read_full_folio);
From patchwork Sat Dec 14 03:10:40 2024
X-Patchwork-Submitter: Luis Chamberlain
X-Patchwork-Id: 13908296
From: Luis Chamberlain
To: willy@infradead.org, hch@lst.de, hare@suse.de, dave@stgolabs.net, david@fromorbit.com, djwong@kernel.org
Cc: john.g.garry@oracle.com, ritesh.list@gmail.com, kbusch@kernel.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, gost.dev@samsung.com, p.raghav@samsung.com, da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org
Subject: [RFC v2 02/11] fs/buffer: add a for_each_bh() for block_read_full_folio()
Date: Fri, 13 Dec 2024 19:10:40 -0800
Message-ID: <20241214031050.1337920-3-mcgrof@kernel.org>
In-Reply-To: <20241214031050.1337920-1-mcgrof@kernel.org>
References: <20241214031050.1337920-1-mcgrof@kernel.org>
X-Mailing-List: linux-block@vger.kernel.org
Sender: Luis Chamberlain

We want to be able to work through all buffer heads on a folio for an async read, but in the future we want to support the option to stop before we've processed all linked buffer heads. To make the code easier to read and to extend in subsequent patches, adopt a for_each_bh(tmp, head) loop instead of a do { ... } while () loop.

This introduces no functional changes.

Signed-off-by: Luis Chamberlain
Reviewed-by: Hannes Reinecke
---
 fs/buffer.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c index 580451337efa..108e1c36fc1a 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2392,6 +2392,17 @@ static void bh_read_batch_async(struct folio *folio, } } +#define bh_is_last(__bh, __head) ((__bh)->b_this_page == (__head)) + +#define bh_next(__bh, __head) \ + (bh_is_last(__bh, __head) ? NULL : (__bh)->b_this_page) + +/* Starts from the provided head */ +#define for_each_bh(__tmp, __head) \ + for ((__tmp) = (__head); \ + (__tmp); \ + (__tmp) = bh_next(__tmp, __head)) + /* * Generic "read_folio" function for block devices that have the normal * get_block functionality. This is most of the block device filesystems. 
@@ -2421,11 +2432,10 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block) iblock = div_u64(folio_pos(folio), blocksize); lblock = div_u64(limit + blocksize - 1, blocksize); - bh = head; nr = 0; i = 0; - do { + for_each_bh(bh, head) { if (buffer_uptodate(bh)) continue; @@ -2454,7 +2464,9 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block) continue; } arr[nr++] = bh; - } while (i++, iblock++, (bh = bh->b_this_page) != head); + i++; + iblock++; + } bh_read_batch_async(folio, nr, arr, fully_mapped, nr == 0, page_error);
From patchwork Sat Dec 14 03:10:41 2024
X-Patchwork-Submitter: Luis Chamberlain
X-Patchwork-Id: 13908289
From: Luis Chamberlain
To: willy@infradead.org, hch@lst.de, hare@suse.de, dave@stgolabs.net, david@fromorbit.com, djwong@kernel.org
Cc: john.g.garry@oracle.com, ritesh.list@gmail.com, kbusch@kernel.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, gost.dev@samsung.com, p.raghav@samsung.com, da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org
Subject: [RFC v2 03/11] fs/buffer: add iteration support for block_read_full_folio()
Date: Fri, 13 Dec 2024 19:10:41 -0800
Message-ID: <20241214031050.1337920-4-mcgrof@kernel.org>
In-Reply-To: <20241214031050.1337920-1-mcgrof@kernel.org>
References: <20241214031050.1337920-1-mcgrof@kernel.org>
X-Mailing-List: linux-block@vger.kernel.org
Sender: Luis Chamberlain

Provide a helper to iterate over the buffer heads on a folio. We do this as a preliminary step to make the subsequent changes easier to read. Right now we use an on-stack array of size MAX_BUF_PER_PAGE to loop over all buffer heads in a folio. However, on CPUs where the system page size is much larger, like Hexagon with 256 KiB page size support, this can mean the kernel ends up spewing stack growth warnings.

To be able to break this down, add support for processing smaller array chunks of buffer heads at a time. The array size used is not changed yet; that will be done in a subsequent patch. This just adds the iterator support and logic.

While at it, clarify the booleans used on bh_read_batch_async() and how they are only valid once we've processed all buffer heads of a folio, that is, when we're on the last buffer head in a folio:

  * bh_folio_reads
  * unmapped

Signed-off-by: Luis Chamberlain
Reviewed-by: Hannes Reinecke
---
 fs/buffer.c | 130 +++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 94 insertions(+), 36 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c index 108e1c36fc1a..f8e6a5454dbb 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2397,57 +2397,64 @@ static void bh_read_batch_async(struct folio *folio, #define bh_next(__bh, __head) \ (bh_is_last(__bh, __head) ? NULL : (__bh)->b_this_page) +/* Starts from a pivot which you initialize */ +#define for_each_bh_pivot(__pivot, __last, __head) \ + for ((__pivot) = __last = (__pivot); \ + (__pivot); \ + (__pivot) = bh_next(__pivot, __head), \ + (__last) = (__pivot) ? 
(__pivot) : (__last)) + /* Starts from the provided head */ #define for_each_bh(__tmp, __head) \ for ((__tmp) = (__head); \ (__tmp); \ (__tmp) = bh_next(__tmp, __head)) +struct bh_iter { + sector_t iblock; + get_block_t *get_block; + bool any_get_block_error; + int unmapped; + int bh_folio_reads; +}; + /* - * Generic "read_folio" function for block devices that have the normal - * get_block functionality. This is most of the block device filesystems. - * Reads the folio asynchronously --- the unlock_buffer() and - * set/clear_buffer_uptodate() functions propagate buffer state into the - * folio once IO has completed. + * Reads up to MAX_BUF_PER_PAGE buffer heads at a time on a folio on the given + * block range iblock to lblock and helps update the number of buffer-heads + * which were not uptodate or unmapped for which we issued an async read for + * on iter->bh_folio_reads for the full folio. Returns the last buffer-head we + * worked on. */ -int block_read_full_folio(struct folio *folio, get_block_t *get_block) -{ - struct inode *inode = folio->mapping->host; - sector_t iblock, lblock; - struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE]; - size_t blocksize; - int nr, i; - int fully_mapped = 1; - bool page_error = false; - loff_t limit = i_size_read(inode); - - /* This is needed for ext4. 
*/ - if (IS_ENABLED(CONFIG_FS_VERITY) && IS_VERITY(inode)) - limit = inode->i_sb->s_maxbytes; - - VM_BUG_ON_FOLIO(folio_test_large(folio), folio); - - head = folio_create_buffers(folio, inode, 0); - blocksize = head->b_size; +static struct buffer_head *bh_read_iter(struct folio *folio, + struct buffer_head *pivot, + struct buffer_head *head, + struct inode *inode, + struct bh_iter *iter, sector_t lblock) +{ + struct buffer_head *arr[MAX_BUF_PER_PAGE]; + struct buffer_head *bh = pivot, *last; + int nr = 0, i = 0; + size_t blocksize = head->b_size; + bool no_reads = false; + bool fully_mapped = false; + + /* collect buffers not uptodate and not mapped yet */ + for_each_bh_pivot(bh, last, head) { + BUG_ON(nr >= MAX_BUF_PER_PAGE); - iblock = div_u64(folio_pos(folio), blocksize); - lblock = div_u64(limit + blocksize - 1, blocksize); - nr = 0; - i = 0; - - for_each_bh(bh, head) { if (buffer_uptodate(bh)) continue; if (!buffer_mapped(bh)) { int err = 0; - fully_mapped = 0; - if (iblock < lblock) { + iter->unmapped++; + if (iter->iblock < lblock) { WARN_ON(bh->b_size != blocksize); - err = get_block(inode, iblock, bh, 0); + err = iter->get_block(inode, iter->iblock, + bh, 0); if (err) - page_error = true; + iter->any_get_block_error = true; } if (!buffer_mapped(bh)) { folio_zero_range(folio, i * blocksize, @@ -2465,10 +2472,61 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block) } arr[nr++] = bh; i++; - iblock++; + iter->iblock++; + } + + iter->bh_folio_reads += nr; + + WARN_ON_ONCE(!bh_is_last(last, head)); + + if (bh_is_last(last, head)) { + if (!iter->bh_folio_reads) + no_reads = true; + if (!iter->unmapped) + fully_mapped = true; } - bh_read_batch_async(folio, nr, arr, fully_mapped, nr == 0, page_error); + bh_read_batch_async(folio, nr, arr, fully_mapped, no_reads, + iter->any_get_block_error); + + return last; +} + +/* + * Generic "read_folio" function for block devices that have the normal + * get_block functionality. 
This is most of the block device filesystems. + * Reads the folio asynchronously --- the unlock_buffer() and + * set/clear_buffer_uptodate() functions propagate buffer state into the + * folio once IO has completed. + */ +int block_read_full_folio(struct folio *folio, get_block_t *get_block) +{ + struct inode *inode = folio->mapping->host; + sector_t lblock; + size_t blocksize; + struct buffer_head *bh, *head; + struct bh_iter iter = { + .get_block = get_block, + .unmapped = 0, + .any_get_block_error = false, + .bh_folio_reads = 0, + }; + loff_t limit = i_size_read(inode); + + /* This is needed for ext4. */ + if (IS_ENABLED(CONFIG_FS_VERITY) && IS_VERITY(inode)) + limit = inode->i_sb->s_maxbytes; + + VM_BUG_ON_FOLIO(folio_test_large(folio), folio); + + head = folio_create_buffers(folio, inode, 0); + blocksize = head->b_size; + + iter.iblock = div_u64(folio_pos(folio), blocksize); + lblock = div_u64(limit + blocksize - 1, blocksize); + + for_each_bh(bh, head) + bh = bh_read_iter(folio, bh, head, inode, &iter, lblock); return 0; }
From patchwork Sat Dec 14 03:10:42 2024
X-Patchwork-Submitter: Luis Chamberlain
X-Patchwork-Id: 13908288
From: Luis Chamberlain
To: willy@infradead.org, hch@lst.de, hare@suse.de, dave@stgolabs.net, david@fromorbit.com, djwong@kernel.org
Cc: john.g.garry@oracle.com, ritesh.list@gmail.com, kbusch@kernel.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, gost.dev@samsung.com, p.raghav@samsung.com, da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org
Subject: [RFC v2 04/11] fs/buffer: reduce stack usage on bh_read_iter()
Date: Fri, 13 Dec 2024 19:10:42 -0800
Message-ID: <20241214031050.1337920-5-mcgrof@kernel.org>
In-Reply-To: <20241214031050.1337920-1-mcgrof@kernel.org>
References: <20241214031050.1337920-1-mcgrof@kernel.org>
X-Mailing-List: linux-block@vger.kernel.org
Sender: Luis Chamberlain

Now that we can read buffer heads from a folio asynchronously in chunks, we can make bh_read_iter() use a smaller array. Use an array of 8 entries to avoid stack growth warnings on systems with huge base page sizes.

Signed-off-by: Luis Chamberlain
Reviewed-by: Hannes Reinecke
---
 fs/buffer.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c index f8e6a5454dbb..b4994c48e6ee 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2410,7 +2410,10 @@ static void bh_read_batch_async(struct folio *folio, (__tmp); \ (__tmp) = bh_next(__tmp, __head)) +#define MAX_BUF_CHUNK 8 + struct bh_iter { + int chunk_number; sector_t iblock; get_block_t *get_block; bool any_get_block_error; @@ -2419,7 +2422,7 @@ struct bh_iter { }; /* - * Reads up to MAX_BUF_PER_PAGE buffer heads at a time on a folio on the given + * Reads up to MAX_BUF_CHUNK buffer heads at a time on a folio on the given * block range iblock to lblock and helps update the number of buffer-heads * which were not uptodate or unmapped for which we issued an async read for * on iter->bh_folio_reads for the full folio. 
Returns the last buffer-head we @@ -2431,16 +2434,18 @@ static struct buffer_head *bh_read_iter(struct folio *folio, struct inode *inode, struct bh_iter *iter, sector_t lblock) { - struct buffer_head *arr[MAX_BUF_PER_PAGE]; + struct buffer_head *arr[MAX_BUF_CHUNK]; struct buffer_head *bh = pivot, *last; int nr = 0, i = 0; size_t blocksize = head->b_size; + int chunk_idx = MAX_BUF_CHUNK * iter->chunk_number; bool no_reads = false; bool fully_mapped = false; /* collect buffers not uptodate and not mapped yet */ for_each_bh_pivot(bh, last, head) { - BUG_ON(nr >= MAX_BUF_PER_PAGE); + if (nr >= MAX_BUF_CHUNK) + break; if (buffer_uptodate(bh)) continue; @@ -2457,7 +2462,8 @@ static struct buffer_head *bh_read_iter(struct folio *folio, iter->any_get_block_error = true; } if (!buffer_mapped(bh)) { - folio_zero_range(folio, i * blocksize, + folio_zero_range(folio, + (i + chunk_idx) * blocksize, blocksize); if (!err) set_buffer_uptodate(bh); @@ -2476,8 +2482,7 @@ static struct buffer_head *bh_read_iter(struct folio *folio, } iter->bh_folio_reads += nr; - - WARN_ON_ONCE(!bh_is_last(last, head)); + iter->chunk_number++; if (bh_is_last(last, head)) { if (!iter->bh_folio_reads) @@ -2507,6 +2512,7 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block) struct buffer_head *bh, *head; struct bh_iter iter = { .get_block = get_block, + .chunk_number = 0, .unmapped = 0, .any_get_block_error = false, .bh_folio_reads = 0,
From patchwork Sat Dec 14 03:10:43 2024
X-Patchwork-Submitter: Luis Chamberlain
X-Patchwork-Id: 13908295
From: Luis Chamberlain
To: willy@infradead.org, hch@lst.de, hare@suse.de, dave@stgolabs.net, david@fromorbit.com, djwong@kernel.org
Cc: john.g.garry@oracle.com, ritesh.list@gmail.com, kbusch@kernel.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, gost.dev@samsung.com, p.raghav@samsung.com, da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org
Subject: [RFC v2 05/11] fs/mpage: use blocks_per_folio instead of blocks_per_page
Date: Fri, 13 Dec 2024 19:10:43 -0800
Message-ID: <20241214031050.1337920-6-mcgrof@kernel.org>
In-Reply-To: <20241214031050.1337920-1-mcgrof@kernel.org>
References: <20241214031050.1337920-1-mcgrof@kernel.org>
X-Mailing-List: linux-block@vger.kernel.org
Sender: Luis Chamberlain

From: Hannes Reinecke

Convert mpage to folios and associate the number of blocks with a folio and not a page.
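To make the arithmetic behind that conversion concrete, here is a small userspace sketch, not kernel code. The constants MOCK_PAGE_SHIFT and MOCK_PAGE_SIZE and the three helper functions are hypothetical names chosen for illustration, and the sketch assumes a 4 KiB base page. It contrasts a block count derived from the fixed page size with one derived from the folio size, and models the "one block covers the whole folio" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption for this sketch only: a 4 KiB base page. */
#define MOCK_PAGE_SHIFT	12
#define MOCK_PAGE_SIZE	((size_t)1 << MOCK_PAGE_SHIFT)

/* Old calculation: the block count is tied to a single base page. */
static unsigned int mock_blocks_per_page(unsigned int blkbits)
{
	return (unsigned int)(MOCK_PAGE_SIZE >> blkbits);
}

/* New calculation: the block count follows the folio size in bytes. */
static unsigned int mock_blocks_per_folio(size_t folio_size,
					  unsigned int blkbits)
{
	return (unsigned int)(folio_size >> blkbits);
}

/* Models comparing i_blkbits against the folio's shift instead of PAGE_SHIFT. */
static bool mock_one_block_covers_folio(unsigned int folio_shift,
					unsigned int blkbits)
{
	return blkbits == folio_shift;
}
```

With 4 KiB blocks (blkbits of 12), a single-page folio holds one block under both calculations, while a 16 KiB folio holds four, which the page-based calculation cannot express.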
Signed-off-by: Hannes Reinecke --- fs/mpage.c | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/fs/mpage.c b/fs/mpage.c index 82aecf372743..eb6fee7de529 100644 --- a/fs/mpage.c +++ b/fs/mpage.c @@ -107,7 +107,7 @@ static void map_buffer_to_folio(struct folio *folio, struct buffer_head *bh, * don't make any buffers if there is only one buffer on * the folio and the folio just needs to be set up to date */ - if (inode->i_blkbits == PAGE_SHIFT && + if (inode->i_blkbits == folio_shift(folio) && buffer_uptodate(bh)) { folio_mark_uptodate(folio); return; @@ -153,7 +153,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) struct folio *folio = args->folio; struct inode *inode = folio->mapping->host; const unsigned blkbits = inode->i_blkbits; - const unsigned blocks_per_page = PAGE_SIZE >> blkbits; + const unsigned blocks_per_folio = folio_size(folio) >> blkbits; const unsigned blocksize = 1 << blkbits; struct buffer_head *map_bh = &args->map_bh; sector_t block_in_file; @@ -161,7 +161,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) sector_t last_block_in_file; sector_t first_block; unsigned page_block; - unsigned first_hole = blocks_per_page; + unsigned first_hole = blocks_per_folio; struct block_device *bdev = NULL; int length; int fully_mapped = 1; @@ -182,7 +182,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) goto confused; block_in_file = (sector_t)folio->index << (PAGE_SHIFT - blkbits); - last_block = block_in_file + args->nr_pages * blocks_per_page; + last_block = block_in_file + args->nr_pages * blocks_per_folio; last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits; if (last_block > last_block_in_file) last_block = last_block_in_file; @@ -204,7 +204,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) clear_buffer_mapped(map_bh); break; } - if (page_block == blocks_per_page) + if (page_block 
== blocks_per_folio) break; page_block++; block_in_file++; @@ -216,7 +216,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) * Then do more get_blocks calls until we are done with this folio. */ map_bh->b_folio = folio; - while (page_block < blocks_per_page) { + while (page_block < blocks_per_folio) { map_bh->b_state = 0; map_bh->b_size = 0; @@ -229,7 +229,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) if (!buffer_mapped(map_bh)) { fully_mapped = 0; - if (first_hole == blocks_per_page) + if (first_hole == blocks_per_folio) first_hole = page_block; page_block++; block_in_file++; @@ -247,7 +247,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) goto confused; } - if (first_hole != blocks_per_page) + if (first_hole != blocks_per_folio) goto confused; /* hole -> non-hole */ /* Contiguous blocks? */ @@ -260,7 +260,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) if (relative_block == nblocks) { clear_buffer_mapped(map_bh); break; - } else if (page_block == blocks_per_page) + } else if (page_block == blocks_per_folio) break; page_block++; block_in_file++; @@ -268,7 +268,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) bdev = map_bh->b_bdev; } - if (first_hole != blocks_per_page) { + if (first_hole != blocks_per_folio) { folio_zero_segment(folio, first_hole << blkbits, PAGE_SIZE); if (first_hole == 0) { folio_mark_uptodate(folio); @@ -303,10 +303,10 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) relative_block = block_in_file - args->first_logical_block; nblocks = map_bh->b_size >> blkbits; if ((buffer_boundary(map_bh) && relative_block == nblocks) || - (first_hole != blocks_per_page)) + (first_hole != blocks_per_folio)) args->bio = mpage_bio_submit_read(args->bio); else - args->last_block_in_bio = first_block + blocks_per_page - 1; + args->last_block_in_bio = first_block + blocks_per_folio - 1; out: return 
args->bio; @@ -385,7 +385,7 @@ int mpage_read_folio(struct folio *folio, get_block_t get_block) { struct mpage_readpage_args args = { .folio = folio, - .nr_pages = 1, + .nr_pages = folio_nr_pages(folio), .get_block = get_block, }; @@ -456,12 +456,12 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc, struct address_space *mapping = folio->mapping; struct inode *inode = mapping->host; const unsigned blkbits = inode->i_blkbits; - const unsigned blocks_per_page = PAGE_SIZE >> blkbits; + const unsigned blocks_per_folio = folio_size(folio) >> blkbits; sector_t last_block; sector_t block_in_file; sector_t first_block; unsigned page_block; - unsigned first_unmapped = blocks_per_page; + unsigned first_unmapped = blocks_per_folio; struct block_device *bdev = NULL; int boundary = 0; sector_t boundary_block = 0; @@ -486,12 +486,12 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc, */ if (buffer_dirty(bh)) goto confused; - if (first_unmapped == blocks_per_page) + if (first_unmapped == blocks_per_folio) first_unmapped = page_block; continue; } - if (first_unmapped != blocks_per_page) + if (first_unmapped != blocks_per_folio) goto confused; /* hole -> non-hole */ if (!buffer_dirty(bh) || !buffer_uptodate(bh)) @@ -536,7 +536,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc, goto page_is_mapped; last_block = (i_size - 1) >> blkbits; map_bh.b_folio = folio; - for (page_block = 0; page_block < blocks_per_page; ) { + for (page_block = 0; page_block < blocks_per_folio; ) { map_bh.b_state = 0; map_bh.b_size = 1 << blkbits; @@ -618,14 +618,14 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc, BUG_ON(folio_test_writeback(folio)); folio_start_writeback(folio); folio_unlock(folio); - if (boundary || (first_unmapped != blocks_per_page)) { + if (boundary || (first_unmapped != blocks_per_folio)) { bio = mpage_bio_submit_write(bio); if (boundary_block) { 
write_boundary_block(boundary_bdev, boundary_block, 1 << blkbits); } } else { - mpd->last_block_in_bio = first_block + blocks_per_page - 1; + mpd->last_block_in_bio = first_block + blocks_per_folio - 1; } goto out; From patchwork Sat Dec 14 03:10:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 13908290 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CE22322EE5; Sat, 14 Dec 2024 03:10:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734145860; cv=none; b=igTSrXjasEZ7u8OIS2jB0K2GdrWbud6IRBRxC0PNzlql6rB1Zvj+jaP0OiT9Ur5CV52vd2T7gyxl8nCNx5ETSEKJTjHo17wNquKo83TlQMksgRvEtR30sZlBnelNZdHqsfQyIRnP7g9h9A1DD5yrlTym9ojK2Ldwcahmhj+LoBk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734145860; c=relaxed/simple; bh=qLOoquAiywgpFbIfyCc3drngjRhAf1ne3Fx8Jz2mZW4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=rAw49spvd60SbUZs1FqGZK86G/o8zLEnKpMKOPeYyqjU4hwgezzZCxKsPA8KpfzclaIgjC5B/FFi2B9YczoIBR7paGJ5Dx1ctS2OhjeySC6bmd/F9CP+zL3TspIKmOzV/Wg6xEl4Q42fMNXxWU9BJ5pECunAJObE+qGt0/DVsh0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=AaqLnJlB; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: 
smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="AaqLnJlB" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=fu2uu1E/hdWYAQ+oUve2n/osalBR9KfPKW7YQSw9qoM=; b=AaqLnJlBZdgAEm5L97UO99Elg9 nSwRtJ5Oh3gR4SidRU6mFt90VVz+uswm/DRZRf273o490Dg0O6HfiZRMrUUTBIgcA6OuPe89DR+Vg vaQkWlNILmOXkYnVsWtisqS9qNdmSsIeFG8chQ5k17HhmQk8rCRVwKhiO9O1nCDo7QkMNiFt7Yh5J uPSFIk/Lz/RQS3axKLBxUkRTvMoMh+lSyCmjjB904f1VhaIBtKABlUHxaYW8D8E1rb5iszDDWVX1Y HPuyapGGhV0myArKMLDbpY+SLDfnwInKpoU2vRMF5hG0ZLB07t0T2nEWCJx2uFiVKZiqy6/VE3B2q 0Mw72q5Q==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.98 #2 (Red Hat Linux)) id 1tMIYN-00000005c3h-3us2; Sat, 14 Dec 2024 03:10:51 +0000 From: Luis Chamberlain To: willy@infradead.org, hch@lst.de, hare@suse.de, dave@stgolabs.net, david@fromorbit.com, djwong@kernel.org Cc: john.g.garry@oracle.com, ritesh.list@gmail.com, kbusch@kernel.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, gost.dev@samsung.com, p.raghav@samsung.com, da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org, Hannes Reinecke Subject: [RFC v2 06/11] fs/mpage: avoid negative shift for large blocksize Date: Fri, 13 Dec 2024 19:10:44 -0800 Message-ID: <20241214031050.1337920-7-mcgrof@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214031050.1337920-1-mcgrof@kernel.org> References: <20241214031050.1337920-1-mcgrof@kernel.org> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Sender: Luis Chamberlain From: Hannes Reinecke For large blocksizes the number of block bits is larger than PAGE_SHIFT, so instead use folio_pos(folio) >> blkbits to calculate the
sector number. With this in place we can now enable large folios with buffer-heads.

Signed-off-by: Hannes Reinecke
---
 fs/mpage.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/mpage.c b/fs/mpage.c
index eb6fee7de529..c6bb2a9706a1 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -181,7 +181,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 	if (folio_buffers(folio))
 		goto confused;
 
-	block_in_file = (sector_t)folio->index << (PAGE_SHIFT - blkbits);
+	block_in_file = folio_pos(folio) >> blkbits;
 	last_block = block_in_file + args->nr_pages * blocks_per_folio;
 	last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;
 	if (last_block > last_block_in_file)
@@ -527,7 +527,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 	 * The page has no buffers: map it to disk
 	 */
 	BUG_ON(!folio_test_uptodate(folio));
-	block_in_file = (sector_t)folio->index << (PAGE_SHIFT - blkbits);
+	block_in_file = folio_pos(folio) >> blkbits;
 	/*
 	 * Whole page beyond EOF? Skip allocating blocks to avoid leaking
 	 * space.
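
The reason for rewriting the shift in the patch above can be shown with a small userspace sketch. This is not kernel code: it assumes 4 KiB pages (SKETCH_PAGE_SHIFT), and the helper names block_in_file_from_index() and block_in_file_from_pos() are made up for illustration. The old expression shifts left by (PAGE_SHIFT - blkbits), which is only defined while blkbits <= PAGE_SHIFT; shifting the byte position right by blkbits has no such restriction.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch only: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12u

/*
 * Old form: index << (PAGE_SHIFT - blkbits).
 * Returns -1 instead of shifting when the shift count would be negative,
 * which is undefined behaviour in C.
 */
static int block_in_file_from_index(uint64_t index, unsigned int blkbits,
				    uint64_t *out)
{
	if (blkbits > SKETCH_PAGE_SHIFT)
		return -1;
	*out = index << (SKETCH_PAGE_SHIFT - blkbits);
	return 0;
}

/*
 * Patched form: take the byte position of the folio (what folio_pos()
 * returns in the kernel) and shift it right by blkbits.
 */
static uint64_t block_in_file_from_pos(uint64_t index, unsigned int blkbits)
{
	assert(blkbits < 64);
	uint64_t pos = index << SKETCH_PAGE_SHIFT;	/* byte offset in the file */
	return pos >> blkbits;
}
```

For 512-byte blocks (blkbits = 9) both forms agree. For 16k blocks (blkbits = 14) only the position-based form yields a result, which is the case the patch is after.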
From patchwork Sat Dec 14 03:10:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 13908291 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 95B5718052; Sat, 14 Dec 2024 03:10:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734145861; cv=none; b=Je6gU6e6BZicQhQ3FaWEy56Pach+QS6A7lWQZSihbmcGUJQPdlAvB7BJt74JPaZAM5Ft8nnhY2Zt8HQOB80USOvhDog6gIZbVqCRMqH7baTGhwl2bG++uFRuok/67GqcbK/8PXAzZUoHHfixanlLBM7xWTz4SPspBikXtXMed/4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734145861; c=relaxed/simple; bh=BgjdvO26G8eQuOo3VdDz0/aZlJjB533efwx99NspO9I=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=lxs+9Qk0XkZXmOg1RQJk6dPQkfoRTtavrpoEjoIex0fMI7mLC6OCkdE78shDThmId92bbuiL1l6rnuV8Vfz1EotBfClNFEpsb7fK1qBV1fmphEuJ5EVTitRf6hRenyGTtZRDIyEpiRX20m2RPZqGDGr3JsGpf7aPTXsLoRsMxK0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=QyQwMKci; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="QyQwMKci" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; 
h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=sbicKkEIU1GP8SA/RwQPAcQQ45CmcRGRebxtvNHMUlw=; b=QyQwMKci1SqqFVKwrphSBzYf7/ Kpt9D/juXjczuPs656dIQuZkABPk6iOh6ydyQfGcB5HaXY2bllFf74fF0ENUyoB2UlRvo3XSz0KEz mxuqbJyUulQ5iwka2He1evk8XYb4OsjtZJXxyiBYzHqFBcLMrPHH3yi2xLBo3cR2aEmq+hISCQM91 dhM6+dSdz+2+MMZXeGS1uRFB+JO0vyRAwGA2jWCWZExWukPikTyRnjV38M32dXiAuPD/YxikEzEfQ ECcMOBlpHDPO9S7qSoYieXOOvRSx5o9fQ+doAZwsSFn7gbseriWxxJgXbzDRPGPhKQ+3fk4VzrNdi htcjBV+g==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.98 #2 (Red Hat Linux)) id 1tMIYN-00000005c3j-42bm; Sat, 14 Dec 2024 03:10:51 +0000 From: Luis Chamberlain To: willy@infradead.org, hch@lst.de, hare@suse.de, dave@stgolabs.net, david@fromorbit.com, djwong@kernel.org Cc: john.g.garry@oracle.com, ritesh.list@gmail.com, kbusch@kernel.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, gost.dev@samsung.com, p.raghav@samsung.com, da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org Subject: [RFC v2 07/11] fs/buffer fs/mpage: remove large folio restriction Date: Fri, 13 Dec 2024 19:10:45 -0800 Message-ID: <20241214031050.1337920-8-mcgrof@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214031050.1337920-1-mcgrof@kernel.org> References: <20241214031050.1337920-1-mcgrof@kernel.org> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Sender: Luis Chamberlain Now that buffer-heads have been converted over to support large folios we can remove the built-in VM_BUG_ON_FOLIO() checks which prevent their use.
Reviewed-by: Hannes Reinecke Signed-off-by: Luis Chamberlain --- fs/buffer.c | 2 -- fs/mpage.c | 3 --- 2 files changed, 5 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index b4994c48e6ee..4296bfb06fb1 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2523,8 +2523,6 @@ int block_read_full_folio(struct folio *folio, get_block_t *get_block) if (IS_ENABLED(CONFIG_FS_VERITY) && IS_VERITY(inode)) limit = inode->i_sb->s_maxbytes; - VM_BUG_ON_FOLIO(folio_test_large(folio), folio); - head = folio_create_buffers(folio, inode, 0); blocksize = head->b_size; diff --git a/fs/mpage.c b/fs/mpage.c index c6bb2a9706a1..c1b85be8df64 100644 --- a/fs/mpage.c +++ b/fs/mpage.c @@ -170,9 +170,6 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args) unsigned relative_block; gfp_t gfp = mapping_gfp_constraint(folio->mapping, GFP_KERNEL); - /* MAX_BUF_PER_PAGE, for example */ - VM_BUG_ON_FOLIO(folio_test_large(folio), folio); - if (args->is_readahead) { opf |= REQ_RAHEAD; gfp |= __GFP_NORETRY | __GFP_NOWARN; From patchwork Sat Dec 14 03:10:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 13908287 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B93F8182BC; Sat, 14 Dec 2024 03:10:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734145860; cv=none; b=el2yTXyltcTWCFKcXNNbPBGpjRbDQP4Tk7iJfBpctM6xFTVAM/6Mwzy9i/UWR9ASCI8LzkvmD8+ncTyNSxTY390uEW4o8WSf5F205VdG5MUCWVNHe0tatJh7BKq6PcufWhoLzO01CKrNTjdcYv+tRJggyZHfgY7YKVAx3ce4xwY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734145860; c=relaxed/simple; 
bh=SsLwRYjDnZhLjFEWH6rmXr0rjN+JdjyyGgFfERa/yfM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BX727nW4lRvlH3dp3v2w3lH8MBl7IN0vGPuqblgIWVfENEbMFDUaTjPGfM0ioZSjv/xVRWa8lW1HocpJtN3igZAxayumlrJu9Zp+tLZ/Su9Z/6WENZLg1WuLTJsPbeynk2lmdftJAS5nY1cBjoTCMhsKZeCXRzMELuvrnAvV4r8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=rBgMS/ly; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="rBgMS/ly" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=rV5AoYJPlny/7wnGoBOtLs3pAmaEJ9e3y3MRINCsK0A=; b=rBgMS/lyjy3Ti+VMrwe1JNxiA8 eR0POwBQ7F9I7u0BreYHGzLgGHYo5au1bZKC4A+0HUDXIRpEPtKNdKPl/hg3NYWSWTTwsbxN1CFmk u4en09SHB4VOE44Qb3533cdPHpRkd9NUC5ozc6o1GJdedziqFC57uH9xNYHFqkHE3dHjwfOGvIIeK 7nV7Nug+EBDp5bnSB4J2mRCG6dhkXiUkNic0zADTIySpUwYdJqcqKKaesr25P1UGq21cts4hwc9ib mtMiz46VjuC3+q9iEz0GBO8iSLgOZcuy4HAHH1DYHuDv32yDA8/NKuGE2fj3/LwlDnmWlwrJ8d5SA fNZXHcCg==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.98 #2 (Red Hat Linux)) id 1tMIYN-00000005c3n-4A6d; Sat, 14 Dec 2024 03:10:51 +0000 From: Luis Chamberlain To: willy@infradead.org, hch@lst.de, hare@suse.de, dave@stgolabs.net, david@fromorbit.com, djwong@kernel.org Cc: john.g.garry@oracle.com, ritesh.list@gmail.com, kbusch@kernel.org, linux-fsdevel@vger.kernel.org, 
linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, gost.dev@samsung.com, p.raghav@samsung.com, da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org Subject: [RFC v2 08/11] block/bdev: enable large folio support for large logical block sizes Date: Fri, 13 Dec 2024 19:10:46 -0800 Message-ID: <20241214031050.1337920-9-mcgrof@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214031050.1337920-1-mcgrof@kernel.org> References: <20241214031050.1337920-1-mcgrof@kernel.org> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Sender: Luis Chamberlain From: Hannes Reinecke Call mapping_set_folio_min_order() when modifying the logical block size to ensure folios are allocated with the correct size. Signed-off-by: Hannes Reinecke --- block/bdev.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/block/bdev.c b/block/bdev.c index 738e3c8457e7..167d82b46781 100644 --- a/block/bdev.c +++ b/block/bdev.c @@ -148,6 +148,8 @@ static void set_init_blocksize(struct block_device *bdev) bsize <<= 1; } BD_INODE(bdev)->i_blkbits = blksize_bits(bsize); + mapping_set_folio_min_order(BD_INODE(bdev)->i_mapping, + get_order(bsize)); } int set_blocksize(struct file *file, int size) @@ -170,6 +172,7 @@ int set_blocksize(struct file *file, int size) if (inode->i_blkbits != blksize_bits(size)) { sync_blockdev(bdev); inode->i_blkbits = blksize_bits(size); + mapping_set_folio_min_order(inode->i_mapping, get_order(size)); kill_bdev(bdev); } return 0; From patchwork Sat Dec 14 03:10:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 13908285 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 
944C417BA0; Sat, 14 Dec 2024 03:10:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734145860; cv=none; b=HNGm8YBifu1YP+dv0xH5j/nuthycCFCvZtDJ6qAXXBWU8UINzQGw+t9vUCDWD5EII2Z1BJzQa5l7b5LjJnDTVdsTOsDfnJVtUIvGDa11stCtrvj/oYgsweLvIrs6iitAJGOvqajYDKg5+94QdSY/QwmarYqkW81qnmsOtyUINls= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734145860; c=relaxed/simple; bh=P3uVfLlVLJfHe4TSb4QwF1TtpfSmrcrLSBF0vRF6L7U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=AuWxW44nlSOTV4BV1Rcc7y4PJ61/xqdOCIhMPkvwIp3123AQeSOQMBLa58M86jVv+VX2TaUcpevNzF2soltlrdKI7w6Q1Jq1/1iVL/cCuutyMy3wiKNF4qmfdHMvJgJQKHrporNUphQbi946dvB5sqzk4h2CQ92mBUWS4dlyqok= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=qL6vJdTy; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="qL6vJdTy" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=nLfCbW88XZtUGGKLjWBTHyDCe42vAPNXKRBtJR7DxnY=; b=qL6vJdTyzDGrY9BTzxUAOzoxWR GmyxjXs/ypIzQ4OBwJwjxzVHxTYl5WowwA8PlpzdvxsYCfAWKCxVKGzklNbFg+j9lyRrusBf2H2bo GLkVMnOTVZV/lpZd85GFZ1MVi81ANkvcUeR+JJE+WC+Wr3Z0gWG3fCONYfibpZ6l3jyduDgskVKBw 
zGXlDz4JQevtASwNKCWNnNAOsApI5OkuvhXNNNIaNtVIY0GOanrmRiewmyM4XTWoW+VYaF9Ch6GO/ jAb+ekAcuFvILM50r3oh8PDu2OVvOax56B8d0fV1YdR3Auj0Fd26/gnSmPf68M1AcX68Sws/msj8r Mx/t0qRA==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.98 #2 (Red Hat Linux)) id 1tMIYO-00000005c3p-05r5; Sat, 14 Dec 2024 03:10:52 +0000 From: Luis Chamberlain To: willy@infradead.org, hch@lst.de, hare@suse.de, dave@stgolabs.net, david@fromorbit.com, djwong@kernel.org Cc: john.g.garry@oracle.com, ritesh.list@gmail.com, kbusch@kernel.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, gost.dev@samsung.com, p.raghav@samsung.com, da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org Subject: [RFC v2 09/11] block/bdev: lift block size restrictions and use common definition Date: Fri, 13 Dec 2024 19:10:47 -0800 Message-ID: <20241214031050.1337920-10-mcgrof@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214031050.1337920-1-mcgrof@kernel.org> References: <20241214031050.1337920-1-mcgrof@kernel.org> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Sender: Luis Chamberlain We can now support blocksizes larger than PAGE_SIZE, so lift the restriction up to the max supported page cache order and just bake this into a common helper used by the block layer. We bound ourselves to 64k as a sensible limit. The hard limit, however, is 1 << (PAGE_SHIFT + MAX_PAGECACHE_ORDER).
Signed-off-by: Luis Chamberlain Reviewed-by: Hannes Reinecke Reviewed-by: John Garry --- block/bdev.c | 5 ++--- include/linux/blkdev.h | 11 ++++++++++- 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/block/bdev.c b/block/bdev.c index 167d82b46781..b57dc4bff81b 100644 --- a/block/bdev.c +++ b/block/bdev.c @@ -157,8 +157,7 @@ int set_blocksize(struct file *file, int size) struct inode *inode = file->f_mapping->host; struct block_device *bdev = I_BDEV(inode); - /* Size must be a power of two, and between 512 and PAGE_SIZE */ - if (size > PAGE_SIZE || size < 512 || !is_power_of_2(size)) + if (blk_validate_block_size(size)) return -EINVAL; /* Size cannot be smaller than the size supported by the device */ @@ -185,7 +184,7 @@ int sb_set_blocksize(struct super_block *sb, int size) if (set_blocksize(sb->s_bdev_file, size)) return 0; /* If we get here, we know size is power of two - * and it's value is between 512 and PAGE_SIZE */ + * and its value is larger than 512 */ sb->s_blocksize = size; sb->s_blocksize_bits = blksize_bits(size); return sb->s_blocksize; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 08a727b40816..a7303a55ed2a 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -269,10 +269,19 @@ static inline dev_t disk_devt(struct gendisk *disk) return MKDEV(disk->major, disk->first_minor); } +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +/* + * The hard limit is (1 << (PAGE_SHIFT + MAX_PAGECACHE_ORDER). 
+ */ +#define BLK_MAX_BLOCK_SIZE (SZ_64K) +#else +#define BLK_MAX_BLOCK_SIZE (PAGE_SIZE) +#endif + /* blk_validate_limits() validates bsize, so drivers don't usually need to */ static inline int blk_validate_block_size(unsigned long bsize) { - if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize)) + if (bsize < 512 || bsize > BLK_MAX_BLOCK_SIZE || !is_power_of_2(bsize)) return -EINVAL; return 0; From patchwork Sat Dec 14 03:10:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 13908292 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B63E926AC3; Sat, 14 Dec 2024 03:10:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734145861; cv=none; b=EWjeYDP+JmO7eKyMU7O4nUtMqQlGyjImFJY0vaG81Cb+RZAEp30sX3aKd7E4iAC0ddQ1BmPOXfNoXaeh0WgY18QlP0SDOqFmEKGNPY80k0VjpGcCiOFDxROsTLlqvjB8R7Mrt4FspkaNtLoxG7oE8yZj0YKtiCsjf0LCB6NuK7c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734145861; c=relaxed/simple; bh=mH4uJDHRT/XfJAI6xVPmuifowd7/VbSotp3ctbAhHE0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=YW+YDrSbIi/5zg+grp7FyA+tKT75c0G0Qz7jll0rJgy1WrBimxiA/faIAyn2Ynxer/wevSSJGOXYSLg233e3P835df8xRgVRWLIOHlgl0CdUV6k/YSLuMvk+ormXzx3owpPniVMJ/WNivjyWIu53Z8BkQ0Z+eTUsdJaTaAuwRwc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=1EBbiPVm; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; 
dmarc=fail (p=quarantine dis=none) header.from=kernel.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="1EBbiPVm" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=SC13mhGE1TKS2Lk87NYZVfEBxxs49zvhyYsSfa79OZk=; b=1EBbiPVmmO/i62EdqjkG8nBcyi k/algK3tiiXr1Br1Uk9Qt9p/uY5q0O43rcQZuE5e6kuBKV2SWOEt/KqsMaIQY8aN1wO/D2NHKMlEJ 2+BNxHzVQUme6o6AFjNh5P1B87JL3yLWk9CzE9EYwujJ7MO+6hQx0giICBsCV8eymOQsYUKIy/d8m NYsfTm4ZQ1Hs/vrfv18SZK9QPKW+R5YpL1pEDfUKo1BRlZUL1vdwtlvgETaVFB4OUYozDByc6viYJ Q+1N9a4ZD2iUqPcfZdrG2s1dHK9WAdLBpCkOVJcGl4BLCn+tts5iT/ZXrC6R+O9AOQrOKZ7DuAaKS rE/RqCUw==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.98 #2 (Red Hat Linux)) id 1tMIYO-00000005c3r-0E3S; Sat, 14 Dec 2024 03:10:52 +0000 From: Luis Chamberlain To: willy@infradead.org, hch@lst.de, hare@suse.de, dave@stgolabs.net, david@fromorbit.com, djwong@kernel.org Cc: john.g.garry@oracle.com, ritesh.list@gmail.com, kbusch@kernel.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, gost.dev@samsung.com, p.raghav@samsung.com, da.gomez@samsung.com, kernel@pankajraghav.com, mcgrof@kernel.org Subject: [RFC v2 10/11] nvme: remove superfluous block size check Date: Fri, 13 Dec 2024 19:10:48 -0800 Message-ID: <20241214031050.1337920-11-mcgrof@kernel.org> X-Mailer: git-send-email 2.47.1 In-Reply-To: <20241214031050.1337920-1-mcgrof@kernel.org> References: <20241214031050.1337920-1-mcgrof@kernel.org> Precedence: bulk X-Mailing-List: linux-block@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Sender: Luis Chamberlain The block layer 
already validates proper block sizes with blk_validate_block_size() for us so we can remove this now superfluous check. Reviewed-by: Hannes Reinecke Signed-off-by: Luis Chamberlain --- drivers/nvme/host/core.c | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index d169a30eb935..bbb5e9d2415c 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2029,16 +2029,6 @@ static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id, u32 atomic_bs, phys_bs, io_opt = 0; bool valid = true; - /* - * The block layer can't support LBA sizes larger than the page size - * or smaller than a sector size yet, so catch this early and don't - * allow block I/O. - */ - if (head->lba_shift > PAGE_SHIFT || head->lba_shift < SECTOR_SHIFT) { - bs = (1 << 9); - valid = false; - } - atomic_bs = phys_bs = bs; if (id->nabo == 0) { /* From patchwork Sat Dec 14 03:10:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Luis Chamberlain X-Patchwork-Id: 13908293 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B73F25765; Sat, 14 Dec 2024 03:10:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734145861; cv=none; b=GKS5G+3FEEw0IwFPyVwu3FeUQXIElOxuQf3LgCsXPEmYIJx2NETaG5zrzIl8sxhrXZ3cNvnZfgDYS6ZzhO72RdsAFYvKCzUl114GKt2uyA40GX4ZKLQHd6UIiUme9q4qJTrw4Q16CqwKjO2Gfz9ffp+Bcg0QnMPgWJ+GEUXwsKg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734145861; c=relaxed/simple; bh=xXw6E+HkHMk+5jlnDPXJyxIJ7c9aWW7APmbJz64rOq8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=hk8Dmb7juHtfU+DQ6BQYe8ddrZLL4s+c2LgcHXjJ/gJJu7S7GQvNcF6DNSHlIVAcNYSQvvfT9z2VRP0OQ+11eqQbIxdVLg1cOECS1djdLI8Z9i0vynyNbvuYmzQB1RUhrHFmyHJsf7PEjzrvK2lsszfx4N5fiGqWTKWFyhcCM2s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=rqjr9k1z; arc=none smtp.client-ip=198.137.202.133 Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="rqjr9k1z" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=QwW6+QbJTlSFLC3FzzKGoyGuJ59VODNiRzXn4st2nlY=; b=rqjr9k1z6khTpfp/RfwgO2EvnD k52jL8QOpAlYrZREH8RLqgl823+evliu61hP3EYJT2SyxbKDjFF6QIbCTLEXl6zNtz7rtY9ibW1XX 9G53kXlos5RYGUGI0eTPtAixrTTPmMTWq9gMwJOIXzq27iu+8EcpEKpwuV4Y5e5HX8Iu0HoNtW9Hh 3suyN6IBf1BSA7JSl8RBtjp2PWaw+GXqawh3KLsERTN1DmlWP7edWp622G5Pif3PYfCeqLljzTire 7wzDzx+BFmYz7jXqjtZF0ZogUuuKCv57HOKzkkML/yQwXYfR4DRqneo5wBLssOM58W7xyPgGpl+eg 8Qvtpm0A==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.98 #2 (Red Hat Linux)) id 1tMIYO-00000005c3t-0MSX; Sat, 14 Dec 2024 03:10:52 +0000 From: Luis Chamberlain To: willy@infradead.org, hch@lst.de, hare@suse.de, dave@stgolabs.net, david@fromorbit.com, djwong@kernel.org Cc: john.g.garry@oracle.com, ritesh.list@gmail.com, kbusch@kernel.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org, gost.dev@samsung.com, p.raghav@samsung.com, da.gomez@samsung.com, 
kernel@pankajraghav.com, mcgrof@kernel.org
Subject: [RFC v2 11/11] bdev: use bdev_io_min() for statx block size
Date: Fri, 13 Dec 2024 19:10:49 -0800
Message-ID: <20241214031050.1337920-12-mcgrof@kernel.org>
X-Mailer: git-send-email 2.47.1
In-Reply-To: <20241214031050.1337920-1-mcgrof@kernel.org>
References: <20241214031050.1337920-1-mcgrof@kernel.org>
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Sender: Luis Chamberlain

You can use lsblk to query a block device's block size:

lsblk -o MIN-IO /dev/nvme0n1
MIN-IO
  4096

The min-io is the minimum IO the block device prefers for optimal
performance. In turn we map this to the block device block size.
The block size currently exposed, even for block devices with an
LBA format of 16k, is 4k. Likewise, devices which support a 4k LBA
format but have a larger Indirection Unit of 16k have an exposed
block size of 4k.

This incurs read-modify-writes on direct IO against devices with a
min-io larger than the page size. To fix this, use the block device
min-io, which is the minimal optimal IO the device prefers.

With this we now get:

lsblk -o MIN-IO /dev/nvme0n1
MIN-IO
 16384

And so userspace gets the appropriate information it needs for optimal performance.
This is verified with blkalgn while running mkfs against a device with
an LBA format of 4k but an NPWG of 16k (min-io size):

mkfs.xfs -f -b size=16k /dev/nvme3n1
blkalgn -d nvme3n1 --ops Write

     Block size          : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 0        |                                        |
      4096 -> 8191       : 0        |                                        |
      8192 -> 16383      : 0        |                                        |
     16384 -> 32767      : 66       |****************************************|
     32768 -> 65535      : 0        |                                        |
     65536 -> 131071     : 0        |                                        |
    131072 -> 262143     : 2        |*                                       |
Block size: 14 - 66
Block size: 17 - 2

     Algn size           : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 0        |                                        |
      4096 -> 8191       : 0        |                                        |
      8192 -> 16383      : 0        |                                        |
     16384 -> 32767      : 66       |****************************************|
     32768 -> 65535      : 0        |                                        |
     65536 -> 131071     : 0        |                                        |
    131072 -> 262143     : 2        |*                                       |
Algn size: 14 - 66
Algn size: 17 - 2

Signed-off-by: Luis Chamberlain
---
 block/bdev.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index b57dc4bff81b..b1be720bd485 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1277,9 +1277,6 @@ void bdev_statx(struct path *path, struct kstat *stat,
 	struct inode *backing_inode;
 	struct block_device *bdev;
 
-	if (!(request_mask & (STATX_DIOALIGN | STATX_WRITE_ATOMIC)))
-		return;
-
 	backing_inode = d_backing_inode(path->dentry);
 
 	/*
@@ -1306,6 +1303,8 @@ void bdev_statx(struct path *path, struct kstat *stat,
 			queue_atomic_write_unit_max_bytes(bd_queue));
 	}
 
+	stat->blksize = bdev_io_min(bdev);
+
 	blkdev_put_no_open(bdev);
 }
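
As a rough illustration of how userspace might consume the value this patch exposes, here is a minimal sketch. It assumes that st_blksize from stat(2) carries what bdev_statx() stores in stat->blksize for a block device node; the helper names preferred_io_size() and round_up_to() are hypothetical and not part of any existing tool.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return the preferred I/O size that stat(2)
 * reports for path, or -1 if the path cannot be stat'ed.
 */
static long preferred_io_size(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (long)st.st_blksize;
}

/*
 * Hypothetical helper: round len up to the next multiple of align.
 * align must be a non-zero power of two, as block sizes are.
 */
static uint64_t round_up_to(uint64_t len, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}
```

A direct IO writer could call preferred_io_size() on the device node, for example /dev/nvme0n1 from the commit message, and pass the result to round_up_to() when sizing its buffers. With the patch applied and a 16k min-io, a 5000 byte request would then be padded to 16384 bytes rather than 8192.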