diff mbox series

[3/3] btrfs: don't hold an extra reference for redirtied buffers

Message ID 20230508145839.43725-4-hch@lst.de (mailing list archive)
State New, archived
Series [1/3] btrfs: zero the buffer before marking it dirty in btrfs_redirty_list_add

Commit Message

Christoph Hellwig May 8, 2023, 2:58 p.m. UTC
When btrfs_redirty_list_add redirties a buffer, it also acquires
an extra reference that is released on transaction commit.  But
this is not required as buffers that are dirty or under writeback
are never freed (look for calls to extent_buffer_under_io()).

Remove the extra reference and the infrastructure used to drop it
again.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/disk-io.c     |  2 --
 fs/btrfs/extent_io.c   |  1 -
 fs/btrfs/extent_io.h   |  1 -
 fs/btrfs/transaction.c |  9 ---------
 fs/btrfs/transaction.h |  3 ---
 fs/btrfs/zoned.c       | 28 ++++------------------------
 fs/btrfs/zoned.h       |  2 --
 7 files changed, 4 insertions(+), 42 deletions(-)

Comments

David Sterba May 9, 2023, 10:57 p.m. UTC | #1
On Mon, May 08, 2023 at 07:58:39AM -0700, Christoph Hellwig wrote:
> When btrfs_redirty_list_add redirties a buffer, it also acquires
> an extra reference that is released on transaction commit.  But
> this is not required as buffers that are dirty or under writeback
> are never freed (look for calls to extent_buffer_under_io()).
> 
> Remove the extra reference and the infrastructure used to drop it
> again.

I vaguely remember that the redirty list was needed for zoned to avoid
some write pattern that disrupts the ordering, added in d3575156f662
("btrfs: zoned: redirty released extent buffers").

I'd appreciate more eyes on this patch; with the indirections and
writeback involved, it's not clear to me that we don't need the list at
all. Pointing to extent_buffer_under_io() is a good start, but the state
transitions of an eb are complex, so a more concrete example of how it
works should be in the changelog.

For testing I'll add the series to misc-next, changelog update can be
done later.  Thanks.
Christoph Hellwig May 15, 2023, 9:22 a.m. UTC | #2
On Wed, May 10, 2023 at 12:57:37AM +0200, David Sterba wrote:
> On Mon, May 08, 2023 at 07:58:39AM -0700, Christoph Hellwig wrote:
> > When btrfs_redirty_list_add redirties a buffer, it also acquires
> > an extra reference that is released on transaction commit.  But
> > this is not required as buffers that are dirty or under writeback
> > are never freed (look for calls to extent_buffer_under_io()).
> > 
> > Remove the extra reference and the infrastructure used to drop it
> > again.
> 
> I vaguely remember that the redirty list was needed for zoned to avoid
> some write pattern that disrupts the ordering, added in d3575156f662
> ("btrfs: zoned: redirty released extent buffers").

So the redirtying itself is needed for that - without it, buffers where
the dirty bit was never set would never get written, leading to a
write outside of the zone write pointer.  But the extra reference can't
influence the write pattern, as we don't make writeback decisions
based on it.
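
[Editor's illustration, not part of the original message.] The ordering
constraint described above can be sketched with a small userspace model.
It assumes a sequential-write-required zone that only accepts a write
starting exactly at its write pointer. The names zone_model, zone_write
and write_blocks are hypothetical and do not exist in btrfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one sequential-write-required zone. */
struct zone_model {
	uint64_t start;	/* first byte of the zone */
	uint64_t wp;	/* write pointer: the only offset a write may start at */
	uint64_t len;	/* zone length in bytes */
};

/* Accept a write only if it starts at the write pointer and fits in the zone. */
static bool zone_write(struct zone_model *z, uint64_t offset, uint64_t nbytes)
{
	assert(z != NULL);

	if (offset != z->wp)
		return false;
	if (nbytes > z->start + z->len - z->wp)
		return false;
	z->wp += nbytes;
	return true;
}

/*
 * Write nblocks consecutively allocated blocks in allocation order.
 * A block whose index equals skip is never submitted, which models a
 * tree block that was freed before it was written and not redirtied.
 * Pass skip = -1 to submit every block.  Returns the number of writes
 * the zone accepted.
 */
static int write_blocks(struct zone_model *z, int nblocks,
			uint64_t blocksize, int skip)
{
	int accepted = 0;

	for (int i = 0; i < nblocks; i++) {
		if (i == skip)
			continue;
		if (zone_write(z, z->start + (uint64_t)i * blocksize, blocksize))
			accepted++;
	}
	return accepted;
}
```

In this model, skipping the middle block of three leaves a hole in front
of the third block, so its write is rejected. Submitting a zeroed copy of
the freed block keeps the stream sequential. The reference count plays no
part in either outcome.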

> I'd appreciate more eyes on this patch, with the indirections and
> writeback involved it's not clear to me that we don't need the list at
> all.

My suspicion is that Aota-san wanted the extra safety of the extra
reference because he didn't want to trust or hadn't noticed the
extent_buffer_under_io() magic.  Aota-san, can you confirm or deny? :)
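
[Editor's illustration, not part of the original message.] The argument
that a dirty buffer cannot be freed can be sketched with a simplified
userspace model. It assumes the release path refuses to free a buffer
while its dirty or writeback flag is set. The eb_model type and the eb_*
helpers are hypothetical stand-ins, and the real extent_buffer release
paths in the kernel are more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for the extent buffer bflags. */
enum {
	EB_MODEL_DIRTY     = 1u << 0,
	EB_MODEL_WRITEBACK = 1u << 1,
};

/* Hypothetical, heavily simplified stand-in for struct extent_buffer. */
struct eb_model {
	unsigned int flags;
	int refs;
};

/* Models the idea behind extent_buffer_under_io(): dirty or under writeback. */
static bool eb_under_io(const struct eb_model *eb)
{
	return (eb->flags & (EB_MODEL_DIRTY | EB_MODEL_WRITEBACK)) != 0;
}

/*
 * Models a release attempt: the buffer is freed only if the caller holds
 * the last reference and the buffer is not under I/O.  Returns true if
 * the buffer was freed.
 */
static bool eb_try_release(struct eb_model *eb)
{
	assert(eb->refs >= 1);

	if (eb->refs != 1 || eb_under_io(eb))
		return false;
	eb->refs = 0;
	return true;
}

/* Redirty without taking an extra reference, as the patch proposes. */
static void eb_redirty(struct eb_model *eb)
{
	eb->flags |= EB_MODEL_DIRTY;
}

/* Writeback finished: the buffer is neither dirty nor under writeback. */
static void eb_writeback_done(struct eb_model *eb)
{
	eb->flags &= ~(EB_MODEL_DIRTY | EB_MODEL_WRITEBACK);
}
```

Under these assumptions a redirtied buffer survives a release attempt
with a reference count of one, and it becomes freeable only after
writeback completes. That is the property the extra reference was
duplicating.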
David Sterba May 30, 2023, 3:56 p.m. UTC | #3
On Mon, May 15, 2023 at 11:22:54AM +0200, Christoph Hellwig wrote:
> My suspicion is that Aota-san wanted the extra safety of the extra
> reference because he didn't want to trust or hadn't noticed the
> extent_buffer_under_io() magic.  Aota-san, can you confirm or deny? :)

The number of patches above this one in the queue is increasing so it
would get harder to remove it. I took another look and agree that
regarding the references it's safe but would still like a confirmation.
Christoph Hellwig May 31, 2023, 4:16 a.m. UTC | #4
On Tue, May 30, 2023 at 05:56:48PM +0200, David Sterba wrote:
> > > I'd appreciate more eyes on this patch, with the indirections and
> > > writeback involved it's not clear to me that we don't need the list at
> > > all.
> > 
> > My suspicion is that Aota-san wanted the extra safety of the extra
> > reference because he didn't want to trust or hadn't noticed the
> > extent_buffer_under_io() magic.  Aota-san, can you confirm or deny? :)
> 
> The number of patches above this one in the queue is increasing so it
> would get harder to remove it. I took another look and agree that
> regarding the references it's safe but would still like a confirmation.

As stated, I am very confident that this is safe based on all my
recent work with the extent_buffer code base.  I'd love to hear
from Aota, but there's not much more I can add here myself.
Naohiro Aota May 31, 2023, 3:04 p.m. UTC | #5
On Wed, May 31, 2023 at 06:16:26AM +0200, Christoph Hellwig wrote:
> On Tue, May 30, 2023 at 05:56:48PM +0200, David Sterba wrote:
> > > > I'd appreciate more eyes on this patch, with the indirections and
> > > > writeback involved it's not clear to me that we don't need the list at
> > > > all.
> > > 
> > > My suspicion is that Aota-san wanted the extra safety of the extra
> > > reference because he didn't want to trust or hadn't noticed the
> > > extent_buffer_under_io() magic.  Aota-san, can you confirm or deny? :)
> > 
> > The number of patches above this one in the queue is increasing so it
> > would get harder to remove it. I took another look and agree that
> > regarding the references it's safe but would still like a confirmation.
> 
> As stated, I am very confident that this is safe based on all my
> recent work with the extent_buffer code base.  I'd love to hear
> from Aota, but there's not much more I can add here myself.

Sorry, I missed that this thread was ongoing.

I ran my tests on misc-next containing this patch and saw no issues
related to it. So, the patch should be good.

I hadn't noticed the extent_buffer_under_io() magic. If we can remove it,
let's remove the unnecessary variable from extent_buffer.

I also dug into the "redirty" history to make sure. Originally,
it used releasing_list to hold all the to-be-released extent buffers and
decided which buffers to re-dirty at commit time. Then, in a later
version, I changed the behavior to re-dirty a buffer when necessary and add
the re-dirtied one to the list in btrfs_free_tree_block(). In short, the
list was there mostly for historical reasons within the patch series.

So, I'm not sure I can still add this, but for the whole series:

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
David Sterba June 5, 2023, 3:58 p.m. UTC | #6
On Wed, May 31, 2023 at 03:04:26PM +0000, Naohiro Aota wrote:
> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>

Perfect, thanks. Changelog updated and rev-by added.

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index acd8ebf2824d18..ae81a3b586eaed 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -5106,8 +5106,6 @@  void btrfs_cleanup_one_transaction(struct btrfs_transaction *cur_trans,
 				     EXTENT_DIRTY);
 	btrfs_destroy_pinned_extent(fs_info, &cur_trans->pinned_extents);
 
-	btrfs_free_redirty_list(cur_trans);
-
 	cur_trans->state =TRANS_STATE_COMPLETED;
 	wake_up(&cur_trans->commit_wait);
 }
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a829390632a538..d8becf1cdbc09e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3557,7 +3557,6 @@  __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 	init_rwsem(&eb->lock);
 
 	btrfs_leak_debug_add_eb(eb);
-	INIT_LIST_HEAD(&eb->release_list);
 
 	spin_lock_init(&eb->refs_lock);
 	atomic_set(&eb->refs, 1);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index f937654230d3c5..6f3cfadd232c95 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -89,7 +89,6 @@  struct extent_buffer {
 	struct rw_semaphore lock;
 
 	struct page *pages[INLINE_EXTENT_BUFFER_PAGES];
-	struct list_head release_list;
 #ifdef CONFIG_BTRFS_DEBUG
 	struct list_head leak_list;
 #endif
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 27c616fdfae274..fe0f00e717a834 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -374,8 +374,6 @@  static noinline int join_transaction(struct btrfs_fs_info *fs_info,
 	spin_lock_init(&cur_trans->dirty_bgs_lock);
 	INIT_LIST_HEAD(&cur_trans->deleted_bgs);
 	spin_lock_init(&cur_trans->dropped_roots_lock);
-	INIT_LIST_HEAD(&cur_trans->releasing_ebs);
-	spin_lock_init(&cur_trans->releasing_ebs_lock);
 	list_add_tail(&cur_trans->list, &fs_info->trans_list);
 	extent_io_tree_init(fs_info, &cur_trans->dirty_pages,
 			IO_TREE_TRANS_DIRTY_PAGES);
@@ -2482,13 +2480,6 @@  int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		goto scrub_continue;
 	}
 
-	/*
-	 * At this point, we should have written all the tree blocks allocated
-	 * in this transaction. So it's now safe to free the redirtyied extent
-	 * buffers.
-	 */
-	btrfs_free_redirty_list(cur_trans);
-
 	ret = write_all_supers(fs_info, 0);
 	/*
 	 * the super is written, we can safely allow the tree-loggers
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index fa728ab8082614..8e9fa23bd7fed7 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -94,9 +94,6 @@  struct btrfs_transaction {
 	 */
 	atomic_t pending_ordered;
 	wait_queue_head_t pending_wait;
-
-	spinlock_t releasing_ebs_lock;
-	struct list_head releasing_ebs;
 };
 
 enum {
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 7095cfca2fdde1..5612731fc00d78 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1603,37 +1603,17 @@  void btrfs_calc_zone_unusable(struct btrfs_block_group *cache)
 void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 			    struct extent_buffer *eb)
 {
-	struct btrfs_fs_info *fs_info = eb->fs_info;
-
-	if (!btrfs_is_zoned(fs_info) ||
-	    btrfs_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN) ||
-	    !list_empty(&eb->release_list))
+	if (!btrfs_is_zoned(eb->fs_info) ||
+	    btrfs_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN))
 		return;
 
+	ASSERT(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
+
 	memzero_extent_buffer(eb, 0, eb->len);
 	set_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags);
 	set_extent_buffer_dirty(eb);
 	set_extent_bits_nowait(&trans->dirty_pages, eb->start,
 			       eb->start + eb->len - 1, EXTENT_DIRTY);
-
-	spin_lock(&trans->releasing_ebs_lock);
-	list_add_tail(&eb->release_list, &trans->releasing_ebs);
-	spin_unlock(&trans->releasing_ebs_lock);
-	atomic_inc(&eb->refs);
-}
-
-void btrfs_free_redirty_list(struct btrfs_transaction *trans)
-{
-	spin_lock(&trans->releasing_ebs_lock);
-	while (!list_empty(&trans->releasing_ebs)) {
-		struct extent_buffer *eb;
-
-		eb = list_first_entry(&trans->releasing_ebs,
-				      struct extent_buffer, release_list);
-		list_del_init(&eb->release_list);
-		free_extent_buffer(eb);
-	}
-	spin_unlock(&trans->releasing_ebs_lock);
 }
 
 bool btrfs_use_zone_append(struct btrfs_bio *bbio)
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index c0570d35fea291..3058ef559c9813 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -54,7 +54,6 @@  int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new);
 void btrfs_calc_zone_unusable(struct btrfs_block_group *cache);
 void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 			    struct extent_buffer *eb);
-void btrfs_free_redirty_list(struct btrfs_transaction *trans);
 bool btrfs_use_zone_append(struct btrfs_bio *bbio);
 void btrfs_record_physical_zoned(struct btrfs_bio *bbio);
 void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered);
@@ -179,7 +178,6 @@  static inline void btrfs_calc_zone_unusable(struct btrfs_block_group *cache) { }
 
 static inline void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 					  struct extent_buffer *eb) { }
-static inline void btrfs_free_redirty_list(struct btrfs_transaction *trans) { }
 
 static inline bool btrfs_use_zone_append(struct btrfs_bio *bbio)
 {