From patchwork Mon Jun 5 01:31:48 2023
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13266833
From: "Ritesh Harjani (IBM)"
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Matthew Wilcox, Dave Chinner,
    Brian Foster, Christoph Hellwig, Andreas Gruenbacher,
    Ojaswin Mujoo, Disha Goel, "Ritesh Harjani (IBM)"
Subject: [PATCHv6 1/5] iomap: Rename iomap_page_create/release() to iomap_iop_alloc/free()
Date: Mon, 5 Jun 2023 07:01:48 +0530
Message-Id: <9982c97c646a4a970340b67ccfc96bdb2c981b3f.1685900733.git.ritesh.list@gmail.com>

This patch renames the iomap_page_create/release() functions to
iomap_iop_alloc/free(). Later patches add more functions for handling
the iop structure, all following the iomap_iop_** naming convention,
so renaming the allocation and free helpers keeps the APIs consistent.

Signed-off-by: Ritesh Harjani (IBM)
---
 fs/iomap/buffered-io.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 063133ec77f4..4567bdd4fff9 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -43,8 +43,8 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)

 static struct bio_set iomap_ioend_bioset;

-static struct iomap_page *
-iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
+static struct iomap_page *iomap_iop_alloc(struct inode *inode,
+                struct folio *folio, unsigned int flags)
 {
         struct iomap_page *iop = to_iomap_page(folio);
         unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
@@ -69,7 +69,7 @@ iomap_page_create(struct inode *inode, struct folio *folio, unsigned int flags)
         return iop;
 }

-static void iomap_page_release(struct folio *folio)
+static void iomap_iop_free(struct folio *folio)
 {
         struct iomap_page *iop = folio_detach_private(folio);
         struct inode *inode = folio->mapping->host;
@@ -231,7 +231,7 @@ static int iomap_read_inline_data(const struct iomap_iter *iter,
         if (WARN_ON_ONCE(size > iomap->length))
                 return -EIO;
         if (offset > 0)
-                iop = iomap_page_create(iter->inode, folio, iter->flags);
+                iop = iomap_iop_alloc(iter->inode, folio, iter->flags);
         else
                 iop = to_iomap_page(folio);
@@ -269,7 +269,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
                 return iomap_read_inline_data(iter, folio);

         /* zero post-eof blocks as the page may be mapped */
-        iop = iomap_page_create(iter->inode, folio, iter->flags);
+        iop = iomap_iop_alloc(iter->inode, folio, iter->flags);
         iomap_adjust_read_range(iter->inode, folio, &pos, length, &poff, &plen);
         if (plen == 0)
                 goto done;
@@ -490,7 +490,7 @@ bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags)
          */
         if (folio_test_dirty(folio) || folio_test_writeback(folio))
                 return false;
-        iomap_page_release(folio);
+        iomap_iop_free(folio);
         return true;
 }
 EXPORT_SYMBOL_GPL(iomap_release_folio);
@@ -507,12 +507,12 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len)
         if (offset == 0 && len == folio_size(folio)) {
                 WARN_ON_ONCE(folio_test_writeback(folio));
                 folio_cancel_dirty(folio);
-                iomap_page_release(folio);
+                iomap_iop_free(folio);
         } else if (folio_test_large(folio)) {
                 /* Must release the iop so the page can be split */
                 WARN_ON_ONCE(!folio_test_uptodate(folio) &&
                              folio_test_dirty(folio));
-                iomap_page_release(folio);
+                iomap_iop_free(folio);
         }
 }
 EXPORT_SYMBOL_GPL(iomap_invalidate_folio);
@@ -559,7 +559,8 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
                 return 0;
         folio_clear_error(folio);

-        iop = iomap_page_create(iter->inode, folio, iter->flags);
+        iop = iomap_iop_alloc(iter->inode, folio, iter->flags);
+
         if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
                 return -EAGAIN;
@@ -1612,7 +1613,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
                 struct writeback_control *wbc, struct inode *inode,
                 struct folio *folio, u64 end_pos)
 {
-        struct iomap_page *iop = iomap_page_create(inode, folio, 0);
+        struct iomap_page *iop = iomap_iop_alloc(inode, folio, 0);
         struct iomap_ioend *ioend, *next;
         unsigned len = i_blocksize(inode);
         unsigned nblocks = i_blocks_per_folio(inode, folio);
From patchwork Mon Jun 5 01:31:49 2023
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13266834
From: "Ritesh Harjani (IBM)"
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Matthew Wilcox, Dave Chinner,
    Brian Foster, Christoph Hellwig, Andreas Gruenbacher,
    Ojaswin Mujoo, Disha Goel, "Ritesh Harjani (IBM)"
Subject: [PATCHv6 2/5] iomap: Move folio_detach_private() in iomap_iop_free() to the end
Date: Mon, 5 Jun 2023 07:01:49 +0530
Message-Id: <4b57e8bf317c1d08c9a44dca5fa4290d213bd004.1685900733.git.ritesh.list@gmail.com>

Later patches add accessor APIs that take an inode and folio and operate
on struct iomap_page. Since those helpers look up the iop via the folio's
private pointer, this patch moves the detaching of the folio's private
to the end of iomap_iop_free(), just before calling kfree(iop), so the
iop remains reachable while the sanity checks run.

Signed-off-by: Ritesh Harjani (IBM)
---
 fs/iomap/buffered-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4567bdd4fff9..6fffda355c45 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -71,7 +71,7 @@ static struct iomap_page *iomap_iop_alloc(struct inode *inode,

 static void iomap_iop_free(struct folio *folio)
 {
-        struct iomap_page *iop = folio_detach_private(folio);
+        struct iomap_page *iop = to_iomap_page(folio);
         struct inode *inode = folio->mapping->host;
         unsigned int nr_blocks = i_blocks_per_folio(inode, folio);

@@ -81,6 +81,7 @@ static void iomap_iop_free(struct folio *folio)
         WARN_ON_ONCE(atomic_read(&iop->write_bytes_pending));
         WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
                      folio_test_uptodate(folio));
+        folio_detach_private(folio);
         kfree(iop);
 }
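The ordering constraint above is easy to model outside the kernel. The
sketch below uses toy stand-ins for struct folio and struct iomap_page
(the toy_* names are illustrative, not kernel API) to show why the
iop-based sanity checks must run before the private pointer is detached:
an accessor in the style of to_iomap_page() returns NULL as soon as the
detach happens, so detaching first would break every later lookup.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins for struct folio / struct iomap_page (illustrative only). */
struct toy_iop { int write_bytes_pending; };
struct toy_folio { void *private; };

static struct toy_iop *to_toy_iop(struct toy_folio *folio)
{
        return folio->private;  /* what the accessor helpers rely on */
}

static void toy_iop_free(struct toy_folio *folio)
{
        struct toy_iop *iop = to_toy_iop(folio);

        if (!iop)
                return;
        /* Checks that need the iop must run while it is still attached... */
        assert(iop->write_bytes_pending == 0);
        /* ...and only then is the private pointer detached and freed. */
        folio->private = NULL;
        free(iop);
}

int main(void)
{
        struct toy_folio folio = { .private = calloc(1, sizeof(struct toy_iop)) };

        toy_iop_free(&folio);
        printf("private after free: %p\n", folio.private); /* (nil) */
        return 0;
}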
From patchwork Mon Jun 5 01:31:50 2023
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13266836
From: "Ritesh Harjani (IBM)"
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Matthew Wilcox, Dave Chinner,
    Brian Foster, Christoph Hellwig, Andreas Gruenbacher,
    Ojaswin Mujoo, Disha Goel, "Ritesh Harjani (IBM)"
Subject: [PATCHv6 3/5] iomap: Refactor some iop related accessor functions
Date: Mon, 5 Jun 2023 07:01:50 +0530
Message-Id: <0d52baa3865f4c8fe49b8389f8e8070ed01144f8.1685900733.git.ritesh.list@gmail.com>

The rest of the buffered-io iomap code will eventually use the
iomap_iop_** function naming. This patch updates the function arguments
and renames iomap_set_range_uptodate() to iomap_iop_set_range_uptodate().
iop_set_range_uptodate() then becomes a low-level accessor used by the
iomap_iop_** functions.

Signed-off-by: Ritesh Harjani (IBM)
---
 fs/iomap/buffered-io.c | 108 ++++++++++++++++++++++++-----------------
 1 file changed, 63 insertions(+), 45 deletions(-)
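Before the diff, a small userspace model of the byte-range-to-block-range
conversion that iop_set_range_uptodate() performs (a sketch under assumed
values, not the kernel code): with 4k blocks (i_blkbits = 12), first_blk =
off >> blkbits and last_blk = (off + len - 1) >> blkbits select the bits
to set in the state bitmap.

#include <stdio.h>
#include <stddef.h>

/* Model iop_set_range_uptodate()'s offset-to-block conversion (sketch). */
int main(void)
{
        unsigned int blkbits = 12;              /* 4k filesystem blocks */
        unsigned long state = 0;                /* 16 blocks in a 64k folio */
        size_t off = 8192, len = 9000;          /* byte range inside the folio */

        unsigned int first_blk = off >> blkbits;                /* 2 */
        unsigned int last_blk = (off + len - 1) >> blkbits;     /* 4 */
        unsigned int nr_blks = last_blk - first_blk + 1;        /* 3 */

        for (unsigned int i = first_blk; i < first_blk + nr_blks; i++)
                state |= 1UL << i;      /* bitmap_set() equivalent */

        printf("blocks %u..%u marked, state=0x%lx\n", first_blk, last_blk, state);
        return 0;       /* prints: blocks 2..4 marked, state=0x1c */
}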
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 6fffda355c45..e264ff0fa36e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -24,14 +24,14 @@
 #define IOEND_BATCH_SIZE        4096

 /*
- * Structure allocated for each folio when block size < folio size
- * to track sub-folio uptodate status and I/O completions.
+ * Structure allocated for each folio to track per-block uptodate state
+ * and I/O completions.
  */
 struct iomap_page {
         atomic_t                read_bytes_pending;
         atomic_t                write_bytes_pending;
-        spinlock_t              uptodate_lock;
-        unsigned long           uptodate[];
+        spinlock_t              state_lock;
+        unsigned long           state[];
 };

 static inline struct iomap_page *to_iomap_page(struct folio *folio)
@@ -43,6 +43,48 @@ static inline struct iomap_page *to_iomap_page(struct folio *folio)

 static struct bio_set iomap_ioend_bioset;

+static bool iop_test_full_uptodate(struct folio *folio)
+{
+        struct iomap_page *iop = to_iomap_page(folio);
+        struct inode *inode = folio->mapping->host;
+
+        return bitmap_full(iop->state, i_blocks_per_folio(inode, folio));
+}
+
+static bool iop_test_block_uptodate(struct folio *folio, unsigned int block)
+{
+        struct iomap_page *iop = to_iomap_page(folio);
+
+        return test_bit(block, iop->state);
+}
+
+static void iop_set_range_uptodate(struct inode *inode, struct folio *folio,
+                size_t off, size_t len)
+{
+        struct iomap_page *iop = to_iomap_page(folio);
+        unsigned int first_blk = off >> inode->i_blkbits;
+        unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
+        unsigned int nr_blks = last_blk - first_blk + 1;
+        unsigned long flags;
+
+        spin_lock_irqsave(&iop->state_lock, flags);
+        bitmap_set(iop->state, first_blk, nr_blks);
+        if (iop_test_full_uptodate(folio))
+                folio_mark_uptodate(folio);
+        spin_unlock_irqrestore(&iop->state_lock, flags);
+}
+
+static void iomap_iop_set_range_uptodate(struct inode *inode,
+                struct folio *folio, size_t off, size_t len)
+{
+        struct iomap_page *iop = to_iomap_page(folio);
+
+        if (iop)
+                iop_set_range_uptodate(inode, folio, off, len);
+        else
+                folio_mark_uptodate(folio);
+}
+
 static struct iomap_page *iomap_iop_alloc(struct inode *inode,
                 struct folio *folio, unsigned int flags)
 {
@@ -58,12 +100,12 @@ static struct iomap_page *iomap_iop_alloc(struct inode *inode,
         else
                 gfp = GFP_NOFS | __GFP_NOFAIL;

-        iop = kzalloc(struct_size(iop, uptodate, BITS_TO_LONGS(nr_blocks)),
+        iop = kzalloc(struct_size(iop, state, BITS_TO_LONGS(nr_blocks)),
                       gfp);
         if (iop) {
-                spin_lock_init(&iop->uptodate_lock);
+                spin_lock_init(&iop->state_lock);
                 if (folio_test_uptodate(folio))
-                        bitmap_fill(iop->uptodate, nr_blocks);
+                        bitmap_fill(iop->state, nr_blocks);
                 folio_attach_private(folio, iop);
         }
         return iop;
@@ -72,14 +114,12 @@ static struct iomap_page *iomap_iop_alloc(struct inode *inode,
 static void iomap_iop_free(struct folio *folio)
 {
         struct iomap_page *iop = to_iomap_page(folio);
-        struct inode *inode = folio->mapping->host;
-        unsigned int nr_blocks = i_blocks_per_folio(inode, folio);

         if (!iop)
                 return;
         WARN_ON_ONCE(atomic_read(&iop->read_bytes_pending));
         WARN_ON_ONCE(atomic_read(&iop->write_bytes_pending));
-        WARN_ON_ONCE(bitmap_full(iop->uptodate, nr_blocks) !=
+        WARN_ON_ONCE(iop_test_full_uptodate(folio) !=
                      folio_test_uptodate(folio));
         folio_detach_private(folio);
         kfree(iop);
@@ -111,7 +151,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,

                 /* move forward for each leading block marked uptodate */
                 for (i = first; i <= last; i++) {
-                        if (!test_bit(i, iop->uptodate))
+                        if (!iop_test_block_uptodate(folio, i))
                                 break;
                         *pos += block_size;
                         poff += block_size;
@@ -121,7 +161,7 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
                 /* truncate len if we find any trailing uptodate block(s) */
                 for ( ; i <= last; i++) {
-                        if (test_bit(i, iop->uptodate)) {
+                        if (iop_test_block_uptodate(folio, i)) {
                                 plen -= (last - i + 1) * block_size;
                                 last = i - 1;
                                 break;
@@ -145,30 +185,6 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
         *lenp = plen;
 }

-static void iomap_iop_set_range_uptodate(struct folio *folio,
-                struct iomap_page *iop, size_t off, size_t len)
-{
-        struct inode *inode = folio->mapping->host;
-        unsigned first = off >> inode->i_blkbits;
-        unsigned last = (off + len - 1) >> inode->i_blkbits;
-        unsigned long flags;
-
-        spin_lock_irqsave(&iop->uptodate_lock, flags);
-        bitmap_set(iop->uptodate, first, last - first + 1);
-        if (bitmap_full(iop->uptodate, i_blocks_per_folio(inode, folio)))
-                folio_mark_uptodate(folio);
-        spin_unlock_irqrestore(&iop->uptodate_lock, flags);
-}
-
-static void iomap_set_range_uptodate(struct folio *folio,
-                struct iomap_page *iop, size_t off, size_t len)
-{
-        if (iop)
-                iomap_iop_set_range_uptodate(folio, iop, off, len);
-        else
-                folio_mark_uptodate(folio);
-}
-
 static void iomap_finish_folio_read(struct folio *folio, size_t offset,
                 size_t len, int error)
 {
@@ -178,7 +194,8 @@ static void iomap_finish_folio_read(struct folio *folio, size_t offset,
                 folio_clear_uptodate(folio);
                 folio_set_error(folio);
         } else {
-                iomap_set_range_uptodate(folio, iop, offset, len);
+                iomap_iop_set_range_uptodate(folio->mapping->host, folio,
+                                             offset, len);
         }

         if (!iop || atomic_sub_and_test(len, &iop->read_bytes_pending))
@@ -214,7 +231,7 @@ struct iomap_readpage_ctx {
 static int iomap_read_inline_data(const struct iomap_iter *iter,
                 struct folio *folio)
 {
-        struct iomap_page *iop;
+        struct iomap_page __maybe_unused *iop;
         const struct iomap *iomap = iomap_iter_srcmap(iter);
         size_t size = i_size_read(iter->inode) - iomap->offset;
         size_t poff = offset_in_page(iomap->offset);
@@ -240,7 +257,8 @@ static int iomap_read_inline_data(const struct iomap_iter *iter,
         memcpy(addr, iomap->inline_data, size);
         memset(addr + size, 0, PAGE_SIZE - poff - size);
         kunmap_local(addr);
-        iomap_set_range_uptodate(folio, iop, offset, PAGE_SIZE - poff);
+        iomap_iop_set_range_uptodate(iter->inode, folio, offset,
+                                     PAGE_SIZE - poff);
         return 0;
 }
@@ -277,7 +295,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,

         if (iomap_block_needs_zeroing(iter, pos)) {
                 folio_zero_range(folio, poff, plen);
-                iomap_set_range_uptodate(folio, iop, poff, plen);
+                iomap_iop_set_range_uptodate(iter->inode, folio, poff, plen);
                 goto done;
         }
@@ -452,7 +470,7 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
         last = (from + count - 1) >> inode->i_blkbits;

         for (i = first; i <= last; i++)
-                if (!test_bit(i, iop->uptodate))
+                if (!iop_test_block_uptodate(folio, i))
                         return false;
         return true;
 }
@@ -591,7 +609,7 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
                         if (status)
                                 return status;
                 }
-                iomap_set_range_uptodate(folio, iop, poff, plen);
+                iomap_iop_set_range_uptodate(iter->inode, folio, poff, plen);
         } while ((block_start += plen) < block_end);

         return 0;
@@ -698,7 +716,6 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
 static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
                 size_t copied, struct folio *folio)
 {
-        struct iomap_page *iop = to_iomap_page(folio);
         flush_dcache_folio(folio);

         /*
@@ -714,7 +731,8 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
          */
         if (unlikely(copied < len && !folio_test_uptodate(folio)))
                 return 0;
-        iomap_set_range_uptodate(folio, iop, offset_in_folio(folio, pos), len);
+        iomap_iop_set_range_uptodate(inode, folio, offset_in_folio(folio, pos),
+                                     len);
         filemap_dirty_folio(inode->i_mapping, folio);
         return copied;
 }
@@ -1630,7 +1648,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
          * invalid, grab a new one.
          */
         for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
-                if (iop && !test_bit(i, iop->uptodate))
+                if (iop && !iop_test_block_uptodate(folio, i))
                         continue;

                 error = wpc->ops->map_blocks(wpc, inode, pos);
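One behavior worth spelling out from the refactor above: per-block
uptodate bits fold into the folio-level flag once every block is marked.
The toy program below models that interaction between
iop_set_range_uptodate() and iop_test_full_uptodate() with plain integers
(illustrative only, not kernel code).

#include <stdbool.h>
#include <stdio.h>

/* Model of how per-block uptodate bits fold into folio uptodate state:
 * once every block bit is set, the folio as a whole is marked uptodate. */
int main(void)
{
        unsigned int nr_blocks = 4;
        unsigned long state = 0;
        unsigned long full = (1UL << nr_blocks) - 1;
        bool folio_uptodate = false;

        for (unsigned int blk = 0; blk < nr_blocks; blk++) {
                state |= 1UL << blk;            /* one block finishes reading */
                if (state == full)              /* bitmap_full() equivalent */
                        folio_uptodate = true;
                printf("block %u done, folio uptodate: %d\n", blk, folio_uptodate);
        }
        return 0;
}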
From patchwork Mon Jun 5 01:31:51 2023
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13266837
From: "Ritesh Harjani (IBM)"
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Matthew Wilcox, Dave Chinner,
    Brian Foster, Christoph Hellwig, Andreas Gruenbacher,
    Ojaswin Mujoo, Disha Goel, "Ritesh Harjani (IBM)"
Subject: [PATCHv6 4/5] iomap: Allocate iop in ->write_begin() early
Date: Mon, 5 Jun 2023 07:01:51 +0530

We don't need to allocate an iop in ->write_begin() for writes where the
position and length completely overlap the given folio, so such cases
are skipped. Currently, when the folio is uptodate, we only allocate the
iop at writeback time (in iomap_writepage_map()). That is fine today,
but once support for a per-block dirty state bitmap is added to the iop,
it would cause a performance regression: if no iop is allocated during
->write_begin(), the necessary dirty bits can never be set in the
->write_end() call, so all blocks would have to be marked dirty at
writeback time, reintroducing the same write amplification and
performance problems we have now.

Signed-off-by: Ritesh Harjani (IBM)
---
 fs/iomap/buffered-io.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index e264ff0fa36e..a70242cb32b1 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -574,15 +574,24 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
         size_t from = offset_in_folio(folio, pos), to = from + len;
         size_t poff, plen;

-        if (folio_test_uptodate(folio))
+        /*
+         * If the write completely overlaps the current folio, then
+         * entire folio will be dirtied so there is no need for
+         * per-block state tracking structures to be attached to this folio.
+         */
+        if (pos <= folio_pos(folio) &&
+            pos + len >= folio_pos(folio) + folio_size(folio))
                 return 0;
-        folio_clear_error(folio);

         iop = iomap_iop_alloc(iter->inode, folio, iter->flags);
+
         if ((iter->flags & IOMAP_NOWAIT) && !iop && nr_blocks > 1)
                 return -EAGAIN;

+        if (folio_test_uptodate(folio))
+                return 0;
+        folio_clear_error(folio);
+
         do {
                 iomap_adjust_read_range(iter->inode, folio, &block_start,
                                 block_end - block_start, &poff, &plen);
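To make the new early-return test concrete, here is a worked model of
the full-overlap condition (a userspace sketch; the folio position and
size are assumed values): a write that covers the whole folio skips iop
allocation, while a sub-folio write allocates the iop up front so that
->write_end() can later set the correct dirty bits.

#include <stdbool.h>
#include <stdio.h>

/* Worked model of __iomap_write_begin()'s full-overlap test (sketch). */
static bool write_covers_folio(long long pos, long long len,
                               long long folio_pos, long long folio_size)
{
        return pos <= folio_pos && pos + len >= folio_pos + folio_size;
}

int main(void)
{
        long long fpos = 0, fsize = 65536;      /* 64k folio at offset 0 */

        /* Full overwrite: skip iop allocation, the whole folio dirties. */
        printf("full write: %d\n", write_covers_folio(0, 65536, fpos, fsize));
        /* Sub-folio write: allocate the iop early for dirty tracking. */
        printf("small write: %d\n", write_covers_folio(4096, 512, fpos, fsize));
        return 0;       /* prints 1 then 0 */
}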
From patchwork Mon Jun 5 01:31:52 2023
X-Patchwork-Submitter: "Ritesh Harjani (IBM)"
X-Patchwork-Id: 13266835
From: "Ritesh Harjani (IBM)"
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, Matthew Wilcox, Dave Chinner,
    Brian Foster, Christoph Hellwig, Andreas Gruenbacher,
    Ojaswin Mujoo, Disha Goel, "Ritesh Harjani (IBM)", Aravinda Herle
Subject: [PATCHv6 5/5] iomap: Add per-block dirty state tracking to improve performance
Date: Mon, 5 Jun 2023 07:01:52 +0530

When the filesystem block size is smaller than the folio size (either
with mapping_large_folio_support() or with blocksize < pagesize) and the
folio is uptodate in the pagecache, even a one-byte write can cause the
entire folio to be written to disk during writeback. This happens
because we currently have no mechanism to track per-block dirty state
within struct iomap_page; we only track uptodate state.

This patch implements support for tracking per-block dirty state in the
iomap_page->state bitmap. This should improve filesystem write
performance and reduce write amplification.

Performance testing of the fio workload below shows a ~16x improvement
using nvme with XFS (4k blocksize) on Power (64k pagesize). The
fio-reported write bandwidth improved from around ~28 MBps to ~452 MBps.

1. fio workload:

[global]
ioengine=psync
rw=randwrite
overwrite=1
pre_read=1
direct=0
bs=4k
size=1G
dir=./
numjobs=8
fdatasync=1
runtime=60
iodepth=64
group_reporting=1

[fio-run]

2. Our internal performance team also reported that this patch improves
their database workload performance by around ~83% (with XFS on Power).

Reported-by: Aravinda Herle
Reported-by: Brian Foster
Signed-off-by: Ritesh Harjani (IBM)
---
 fs/gfs2/aops.c         |   2 +-
 fs/iomap/buffered-io.c | 172 +++++++++++++++++++++++++++++++++++------
 fs/xfs/xfs_aops.c      |   2 +-
 fs/zonefs/file.c       |   2 +-
 include/linux/iomap.h  |   1 +
 5 files changed, 152 insertions(+), 27 deletions(-)
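Before the diff, a compact model of the doubled state bitmap introduced
here (a userspace sketch with assumed sizes, not the kernel layout code):
bits [0, nr_blocks) carry per-block uptodate state and bits
[nr_blocks, 2 * nr_blocks) carry per-block dirty state, so block i's
dirty bit lives at i + blks_per_folio, as iop_test_block_dirty() in the
diff below computes.

#include <stdbool.h>
#include <stdio.h>

/* Model of the doubled iop->state bitmap layout (sketch, userspace types):
 * bits [0, nr)   -> per-block uptodate
 * bits [nr, 2nr) -> per-block dirty, i.e. block i's dirty bit is i + nr.
 */
#define NR_BLOCKS 16    /* e.g. 64k folio with 4k blocks */

static unsigned long state;     /* 2 * NR_BLOCKS = 32 bits used */

static void set_block_uptodate(unsigned int blk) { state |= 1UL << blk; }
static void set_block_dirty(unsigned int blk) { state |= 1UL << (blk + NR_BLOCKS); }
static bool test_block_dirty(unsigned int blk)
{
        return state & (1UL << (blk + NR_BLOCKS));
}

int main(void)
{
        set_block_uptodate(3);
        set_block_dirty(3);     /* a 4k write dirties only block 3 */
        printf("block 3 dirty: %d, block 4 dirty: %d, state=0x%lx\n",
               test_block_dirty(3), test_block_dirty(4), state);
        return 0;       /* only block 3 needs writeback, not all 16 */
}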
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index a5f4be6b9213..75efec3c3b71 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -746,7 +746,7 @@ static const struct address_space_operations gfs2_aops = {
         .writepages = gfs2_writepages,
         .read_folio = gfs2_read_folio,
         .readahead = gfs2_readahead,
-        .dirty_folio = filemap_dirty_folio,
+        .dirty_folio = iomap_dirty_folio,
         .release_folio = iomap_release_folio,
         .invalidate_folio = iomap_invalidate_folio,
         .bmap = gfs2_bmap,
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index a70242cb32b1..dee81d16804e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -85,6 +85,63 @@ static void iomap_iop_set_range_uptodate(struct inode *inode,
                 folio_mark_uptodate(folio);
 }

+static bool iop_test_block_dirty(struct folio *folio, int block)
+{
+        struct iomap_page *iop = to_iomap_page(folio);
+        struct inode *inode = folio->mapping->host;
+        unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+
+        return test_bit(block + blks_per_folio, iop->state);
+}
+
+static void iop_set_range_dirty(struct inode *inode, struct folio *folio,
+                size_t off, size_t len)
+{
+        struct iomap_page *iop = to_iomap_page(folio);
+        unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+        unsigned int first_blk = off >> inode->i_blkbits;
+        unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
+        unsigned int nr_blks = last_blk - first_blk + 1;
+        unsigned long flags;
+
+        spin_lock_irqsave(&iop->state_lock, flags);
+        bitmap_set(iop->state, first_blk + blks_per_folio, nr_blks);
+        spin_unlock_irqrestore(&iop->state_lock, flags);
+}
+
+static void iomap_iop_set_range_dirty(struct inode *inode, struct folio *folio,
+                size_t off, size_t len)
+{
+        struct iomap_page *iop = to_iomap_page(folio);
+
+        if (iop)
+                iop_set_range_dirty(inode, folio, off, len);
+}
+
+static void iop_clear_range_dirty(struct inode *inode, struct folio *folio,
+                size_t off, size_t len)
+{
+        struct iomap_page *iop = to_iomap_page(folio);
+        unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
+        unsigned int first_blk = off >> inode->i_blkbits;
+        unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
+        unsigned int nr_blks = last_blk - first_blk + 1;
+        unsigned long flags;
+
+        spin_lock_irqsave(&iop->state_lock, flags);
+        bitmap_clear(iop->state, first_blk + blks_per_folio, nr_blks);
+        spin_unlock_irqrestore(&iop->state_lock, flags);
+}
+
+static void iomap_iop_clear_range_dirty(struct inode *inode,
+                struct folio *folio, size_t off, size_t len)
+{
+        struct iomap_page *iop = to_iomap_page(folio);
+
+        if (iop)
+                iop_clear_range_dirty(inode, folio, off, len);
+}
+
 static struct iomap_page *iomap_iop_alloc(struct inode *inode,
                 struct folio *folio, unsigned int flags)
 {
@@ -100,12 +157,20 @@ static struct iomap_page *iomap_iop_alloc(struct inode *inode,
         else
                 gfp = GFP_NOFS | __GFP_NOFAIL;

-        iop = kzalloc(struct_size(iop, state, BITS_TO_LONGS(nr_blocks)),
+        /*
+         * iop->state tracks two sets of state flags when the
+         * filesystem block size is smaller than the folio size.
+         * The first state tracks per-block uptodate and the
+         * second tracks per-block dirty state.
+         */
+        iop = kzalloc(struct_size(iop, state, BITS_TO_LONGS(2 * nr_blocks)),
                       gfp);
         if (iop) {
                 spin_lock_init(&iop->state_lock);
                 if (folio_test_uptodate(folio))
-                        bitmap_fill(iop->state, nr_blocks);
+                        bitmap_set(iop->state, 0, nr_blocks);
+                if (folio_test_dirty(folio))
+                        bitmap_set(iop->state, nr_blocks, nr_blocks);
                 folio_attach_private(folio, iop);
         }
         return iop;
@@ -536,6 +601,18 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len)
 }
 EXPORT_SYMBOL_GPL(iomap_invalidate_folio);

+bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio)
+{
+        struct iomap_page __maybe_unused *iop;
+        struct inode *inode = mapping->host;
+        size_t len = folio_size(folio);
+
+        iop = iomap_iop_alloc(inode, folio, 0);
+        iomap_iop_set_range_dirty(inode, folio, 0, len);
+        return filemap_dirty_folio(mapping, folio);
+}
+EXPORT_SYMBOL_GPL(iomap_dirty_folio);
+
 static void
 iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
 {
@@ -742,6 +819,8 @@ static size_t __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
                 return 0;
         iomap_iop_set_range_uptodate(inode, folio, offset_in_folio(folio, pos),
                                      len);
+        iomap_iop_set_range_dirty(inode, folio, offset_in_folio(folio, pos),
+                                  copied);
         filemap_dirty_folio(inode->i_mapping, folio);
         return copied;
 }
@@ -906,6 +985,61 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);

+static int iomap_write_delalloc_punch(struct inode *inode, struct folio *folio,
+                loff_t *punch_start_byte, loff_t start_byte, loff_t end_byte,
+                int (*punch)(struct inode *inode, loff_t offset, loff_t length))
+{
+        struct iomap_page *iop;
+        unsigned int first_blk, last_blk, i;
+        loff_t last_byte;
+        u8 blkbits = inode->i_blkbits;
+        int ret = 0;
+
+        if (start_byte > *punch_start_byte) {
+                ret = punch(inode, *punch_start_byte,
+                            start_byte - *punch_start_byte);
+                if (ret)
+                        goto out_err;
+        }
+        /*
+         * When we have per-block dirty tracking, there can be
+         * blocks within a folio which are marked uptodate
+         * but not dirty. In that case it is necessary to punch
+         * out such blocks to avoid leaking any delalloc blocks.
+         */
+        iop = to_iomap_page(folio);
+        if (!iop)
+                goto skip_iop_punch;
+
+        last_byte = min_t(loff_t, end_byte - 1,
+                          (folio_next_index(folio) << PAGE_SHIFT) - 1);
+        first_blk = offset_in_folio(folio, start_byte) >> blkbits;
+        last_blk = offset_in_folio(folio, last_byte) >> blkbits;
+        for (i = first_blk; i <= last_blk; i++) {
+                if (!iop_test_block_dirty(folio, i)) {
+                        ret = punch(inode, i << blkbits, 1 << blkbits);
+                        if (ret)
+                                goto out_err;
+                }
+        }
+
+skip_iop_punch:
+        /*
+         * Make sure the next punch start is correctly bound to
+         * the end of this data range, not the end of the folio.
+         */
+        *punch_start_byte = min_t(loff_t, end_byte,
+                                  folio_next_index(folio) << PAGE_SHIFT);
+
+        return ret;
+
+out_err:
+        folio_unlock(folio);
+        folio_put(folio);
+        return ret;
+
+}
+
 /*
  * Scan the data range passed to us for dirty page cache folios. If we find a
  * dirty folio, punch out the preceeding range and update the offset from which
@@ -940,26 +1074,9 @@ static int iomap_write_delalloc_scan(struct inode *inode,
                 }

                 /* if dirty, punch up to offset */
-                if (folio_test_dirty(folio)) {
-                        if (start_byte > *punch_start_byte) {
-                                int error;
-
-                                error = punch(inode, *punch_start_byte,
-                                              start_byte - *punch_start_byte);
-                                if (error) {
-                                        folio_unlock(folio);
-                                        folio_put(folio);
-                                        return error;
-                                }
-                        }
-
-                        /*
-                         * Make sure the next punch start is correctly bound to
-                         * the end of this data range, not the end of the folio.
-                         */
-                        *punch_start_byte = min_t(loff_t, end_byte,
-                                        folio_next_index(folio) << PAGE_SHIFT);
-                }
+                if (folio_test_dirty(folio))
+                        iomap_write_delalloc_punch(inode, folio, punch_start_byte,
+                                                   start_byte, end_byte, punch);

                 /* move offset to start of next folio in range */
                 start_byte = folio_next_index(folio) << PAGE_SHIFT;
@@ -1641,7 +1758,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
                 struct writeback_control *wbc, struct inode *inode,
                 struct folio *folio, u64 end_pos)
 {
-        struct iomap_page *iop = iomap_iop_alloc(inode, folio, 0);
+        struct iomap_page *iop = to_iomap_page(folio);
         struct iomap_ioend *ioend, *next;
         unsigned len = i_blocksize(inode);
         unsigned nblocks = i_blocks_per_folio(inode, folio);
@@ -1649,6 +1766,11 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
         int error = 0, count = 0, i;
         LIST_HEAD(submit_list);

+        if (!iop && nblocks > 1) {
+                iop = iomap_iop_alloc(inode, folio, 0);
+                iomap_iop_set_range_dirty(inode, folio, 0, folio_size(folio));
+        }
+
         WARN_ON_ONCE(iop && atomic_read(&iop->write_bytes_pending) != 0);

         /*
@@ -1657,7 +1779,7 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
          * invalid, grab a new one.
          */
         for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
-                if (iop && !iop_test_block_uptodate(folio, i))
+                if (iop && !iop_test_block_dirty(folio, i))
                         continue;

                 error = wpc->ops->map_blocks(wpc, inode, pos);
@@ -1701,6 +1823,8 @@ iomap_writepage_map(struct iomap_writepage_ctx *wpc,
                 }
         }

+        iomap_iop_clear_range_dirty(inode, folio, 0,
+                                    end_pos - folio_pos(folio));
         folio_start_writeback(folio);
         folio_unlock(folio);

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 2ef78aa1d3f6..77c7332ae197 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -578,7 +578,7 @@ const struct address_space_operations xfs_address_space_operations = {
         .read_folio = xfs_vm_read_folio,
         .readahead = xfs_vm_readahead,
         .writepages = xfs_vm_writepages,
-        .dirty_folio = filemap_dirty_folio,
+        .dirty_folio = iomap_dirty_folio,
         .release_folio = iomap_release_folio,
         .invalidate_folio = iomap_invalidate_folio,
         .bmap = xfs_vm_bmap,
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 132f01d3461f..e508c8e97372 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -175,7 +175,7 @@ const struct address_space_operations zonefs_file_aops = {
         .read_folio = zonefs_read_folio,
         .readahead = zonefs_readahead,
         .writepages = zonefs_writepages,
-        .dirty_folio = filemap_dirty_folio,
+        .dirty_folio = iomap_dirty_folio,
         .release_folio = iomap_release_folio,
         .invalidate_folio = iomap_invalidate_folio,
         .migrate_folio = filemap_migrate_folio,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index e2b836c2e119..eb9335c46bf3 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -264,6 +264,7 @@ bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
 struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos);
 bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
 void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
+bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio);
 int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
                 const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
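Finally, a toy model of the writeback-side payoff (illustrative userspace
code, not the kernel loop): with per-block dirty bits, the loop in
iomap_writepage_map() skips every block whose dirty bit is clear, so a
single 4k write into a 64k folio submits one block instead of sixteen,
which is where the ~16x fio improvement quoted in the commit message
comes from.

#include <stdio.h>

/* Toy model of the writeback loop after this series: only dirty blocks
 * are mapped and submitted, instead of every uptodate block in the folio. */
int main(void)
{
        unsigned int nr_blocks = 16;            /* 64k folio, 4k blocks */
        unsigned long dirty = 1UL << 3;         /* one 4k write at block 3 */
        unsigned int submitted = 0;

        for (unsigned int i = 0; i < nr_blocks; i++) {
                if (!(dirty & (1UL << i)))
                        continue;               /* skip clean blocks */
                submitted++;                    /* wpc->ops->map_blocks(...) */
        }
        printf("blocks submitted: %u of %u\n", submitted, nr_blocks);
        return 0;       /* 1 of 16: far less data written back */
}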