Btrfs: remove unnecessary delalloc mutex for inodes
diff mbox series

Message ID 20191025095242.15996-1-fdmanana@kernel.org
State New
Headers show
Series
  • Btrfs: remove unnecessary delalloc mutex for inodes
Related show

Commit Message

Filipe Manana Oct. 25, 2019, 9:52 a.m. UTC
From: Filipe Manana <fdmanana@suse.com>

The inode delalloc mutex was added a long time ago by commit f248679e86fea
("Btrfs: add a delalloc mutex to inodes for delalloc reservations"), and
the reason for its introduction is not very clear from the change log. It
claims it solves bogus warnings from lockdep, however it lacks an example
report/warning from lockdep, or any explanation.

Since we have enough concurrentcy protection from the locks of the space
info and block reserve objects, and such lockdep warnings don't seem to
exist anymore (at least on a 5.3 kernel I couldn't get them with fstests,
ltp, fs_mark, etc), remove it, simplifying things a bit and decreasing
the size of the btrfs_inode structure. With some quick fio tests doing
direct IO and mmap writes I couldn't observe any significant performance
increase either (direct IO writes that don't increase the file's size
don't hold the inode's lock for their entire duration and mmap writes
don't hold the inode's lock at all), which are the only type of writes
that could see any performance gain due to less serialization.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/btrfs_inode.h    |  3 ---
 fs/btrfs/delalloc-space.c | 21 +++++----------------
 fs/btrfs/inode.c          |  1 -
 3 files changed, 5 insertions(+), 20 deletions(-)

Comments

Josef Bacik Oct. 25, 2019, 12:08 p.m. UTC | #1
On Fri, Oct 25, 2019 at 10:52:42AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> The inode delalloc mutex was added a long time ago by commit f248679e86fea
> ("Btrfs: add a delalloc mutex to inodes for delalloc reservations"), and
> the reason for its introduction is not very clear from the change log. It
> claims it solves bogus warnings from lockdep, however it lacks an example
> report/warning from lockdep, or any explanation.
> 
> Since we have enough concurrentcy protection from the locks of the space
> info and block reserve objects, and such lockdep warnings don't seem to
> exist anymore (at least on a 5.3 kernel I couldn't get them with fstests,
> ltp, fs_mark, etc), remove it, simplifying things a bit and decreasing
> the size of the btrfs_inode structure. With some quick fio tests doing
> direct IO and mmap writes I couldn't observe any significant performance
> increase either (direct IO writes that don't increase the file's size
> don't hold the inode's lock for their entire duration and mmap writes
> don't hold the inode's lock at all), which are the only type of writes
> that could see any performance gain due to less serialization.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

The problem was taking the i_mutex in mmap, which is how I was protecting
delalloc reservations originally.  The delalloc mutex didn't come with all of
the other dependencies.  That's what the lockdep messages were about, removing
the lock isn't going to make them appear again.

We _had_ to lock around this because we used to do tricks to keep from
over-reserving, and if we didn't serialize delalloc reservations we'd end up
with ugly accounting problems when we tried to clean things up.

However with my recentish changes this isn't the case anymore.  Every operation
is responsible for reserving its space, and then adding it to the inode.  Then
cleaning up is straightforward and can't be mucked up by other users.  So we no
longer need the delalloc mutex to safe us from ourselves.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
Filipe Manana Oct. 25, 2019, 12:11 p.m. UTC | #2
On Fri, Oct 25, 2019 at 1:08 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Fri, Oct 25, 2019 at 10:52:42AM +0100, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > The inode delalloc mutex was added a long time ago by commit f248679e86fea
> > ("Btrfs: add a delalloc mutex to inodes for delalloc reservations"), and
> > the reason for its introduction is not very clear from the change log. It
> > claims it solves bogus warnings from lockdep, however it lacks an example
> > report/warning from lockdep, or any explanation.
> >
> > Since we have enough concurrentcy protection from the locks of the space
> > info and block reserve objects, and such lockdep warnings don't seem to
> > exist anymore (at least on a 5.3 kernel I couldn't get them with fstests,
> > ltp, fs_mark, etc), remove it, simplifying things a bit and decreasing
> > the size of the btrfs_inode structure. With some quick fio tests doing
> > direct IO and mmap writes I couldn't observe any significant performance
> > increase either (direct IO writes that don't increase the file's size
> > don't hold the inode's lock for their entire duration and mmap writes
> > don't hold the inode's lock at all), which are the only type of writes
> > that could see any performance gain due to less serialization.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> The problem was taking the i_mutex in mmap, which is how I was protecting
> delalloc reservations originally.  The delalloc mutex didn't come with all of
> the other dependencies.  That's what the lockdep messages were about, removing
> the lock isn't going to make them appear again.
>
> We _had_ to lock around this because we used to do tricks to keep from
> over-reserving, and if we didn't serialize delalloc reservations we'd end up
> with ugly accounting problems when we tried to clean things up.
>
> However with my recentish changes this isn't the case anymore.  Every operation
> is responsible for reserving its space, and then adding it to the inode.  Then
> cleaning up is straightforward and can't be mucked up by other users.  So we no
> longer need the delalloc mutex to safe us from ourselves.

Yes, thanks. That's what I thought, and couldn't see any reason for it
being needed given the current (much better) way of reserving space.

>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>
> Thanks,
>
> Josef
David Sterba Oct. 25, 2019, 4:32 p.m. UTC | #3
On Fri, Oct 25, 2019 at 10:52:42AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> The inode delalloc mutex was added a long time ago by commit f248679e86fea
> ("Btrfs: add a delalloc mutex to inodes for delalloc reservations"), and
> the reason for its introduction is not very clear from the change log. It
> claims it solves bogus warnings from lockdep, however it lacks an example
> report/warning from lockdep, or any explanation.
> 
> Since we have enough concurrentcy protection from the locks of the space
> info and block reserve objects, and such lockdep warnings don't seem to
> exist anymore (at least on a 5.3 kernel I couldn't get them with fstests,
> ltp, fs_mark, etc), remove it, simplifying things a bit and decreasing
> the size of the btrfs_inode structure. With some quick fio tests doing
> direct IO and mmap writes I couldn't observe any significant performance
> increase either (direct IO writes that don't increase the file's size
> don't hold the inode's lock for their entire duration and mmap writes
> don't hold the inode's lock at all), which are the only type of writes
> that could see any performance gain due to less serialization.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Added to misc-next with Josef's comment. Thanks.

Patch
diff mbox series

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index f853835c409c..4e12a477d32e 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -63,9 +63,6 @@  struct btrfs_inode {
 	/* held while logging the inode in tree-log.c */
 	struct mutex log_mutex;
 
-	/* held while doing delalloc reservations */
-	struct mutex delalloc_mutex;
-
 	/* used to order data wrt metadata */
 	struct btrfs_ordered_inode_tree ordered_tree;
 
diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index fe68d0e078bd..4ed9ed3ff917 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -307,7 +307,6 @@  int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
 	unsigned nr_extents;
 	enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_FLUSH_ALL;
 	int ret = 0;
-	bool delalloc_lock = true;
 
 	/*
 	 * If we are a free space inode we need to not flush since we will be in
@@ -320,7 +319,6 @@  int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
 	 */
 	if (btrfs_is_free_space_inode(inode)) {
 		flush = BTRFS_RESERVE_NO_FLUSH;
-		delalloc_lock = false;
 	} else {
 		if (current->journal_info)
 			flush = BTRFS_RESERVE_FLUSH_LIMIT;
@@ -329,9 +327,6 @@  int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
 			schedule_timeout(1);
 	}
 
-	if (delalloc_lock)
-		mutex_lock(&inode->delalloc_mutex);
-
 	num_bytes = ALIGN(num_bytes, fs_info->sectorsize);
 
 	/*
@@ -348,10 +343,12 @@  int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
 				&qgroup_reserve);
 	ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_reserve, true);
 	if (ret)
-		goto out_fail;
+		return ret;
 	ret = btrfs_reserve_metadata_bytes(root, block_rsv, meta_reserve, flush);
-	if (ret)
-		goto out_qgroup;
+	if (ret) {
+		btrfs_qgroup_free_meta_prealloc(root, qgroup_reserve);
+		return ret;
+	}
 
 	/*
 	 * Now we need to update our outstanding extents and csum bytes _first_
@@ -375,15 +372,7 @@  int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes)
 	block_rsv->qgroup_rsv_reserved += qgroup_reserve;
 	spin_unlock(&block_rsv->lock);
 
-	if (delalloc_lock)
-		mutex_unlock(&inode->delalloc_mutex);
 	return 0;
-out_qgroup:
-	btrfs_qgroup_free_meta_prealloc(root, qgroup_reserve);
-out_fail:
-	if (delalloc_lock)
-		mutex_unlock(&inode->delalloc_mutex);
-	return ret;
 }
 
 /**
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f23b14ec743a..b894517d1e0e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9312,7 +9312,6 @@  struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->io_failure_tree.track_uptodate = true;
 	atomic_set(&ei->sync_writers, 0);
 	mutex_init(&ei->log_mutex);
-	mutex_init(&ei->delalloc_mutex);
 	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
 	INIT_LIST_HEAD(&ei->delalloc_inodes);
 	INIT_LIST_HEAD(&ei->delayed_iput);