btrfs: Remove BTRFS_INODE_IN_DELALLOC_LIST flag
diff mbox series

Message ID 20200527101104.7441-1-nborisov@suse.com
State New
Headers show
Series
  • btrfs: Remove BTRFS_INODE_IN_DELALLOC_LIST flag
Related show

Commit Message

Nikolay Borisov May 27, 2020, 10:11 a.m. UTC
The flag simply replicates whether btrfs_inode::delallocs_inodes list
is empty or not. Just defer this check to the list management functions
(btrfs_add_delalloc_inodes/__btrfs_del_delalloc_inode) which are
always called under btrfs_root::delalloc_lock.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/btrfs_inode.h |  1 -
 fs/btrfs/inode.c       | 11 ++---------
 2 files changed, 2 insertions(+), 10 deletions(-)

--
2.17.1

Comments

David Sterba May 27, 2020, 2:03 p.m. UTC | #1
On Wed, May 27, 2020 at 01:11:04PM +0300, Nikolay Borisov wrote:
> The flag simply replicates whether btrfs_inode::delallocs_inodes list
> is empty or not. Just defer this check to the list management functions
> (btrfs_add_delalloc_inodes/__btrfs_del_delalloc_inode) which are
> always called under btrfs_root::delalloc_lock.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>

Nice. Added to misc-next, thanks.
Filipe Manana May 27, 2020, 7:48 p.m. UTC | #2
On Wed, May 27, 2020 at 12:42 PM Nikolay Borisov <nborisov@suse.com> wrote:
>
> The flag simply replicates whether btrfs_inode::delallocs_inodes list
> is empty or not. Just defer this check to the list management functions
> (btrfs_add_delalloc_inodes/__btrfs_del_delalloc_inode) which are
> always called under btrfs_root::delalloc_lock.

The flag is there to avoid taking the root's delalloc_lock spinlock
everytime a range is marked for delalloc for any inode of the
subvolume.
Have you measured performance with very high concurrency of buffered
writes against files in the same subvolume?

Thanks.

>
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  fs/btrfs/btrfs_inode.h |  1 -
>  fs/btrfs/inode.c       | 11 ++---------
>  2 files changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index aeff56a0e105..da6743c70412 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -27,7 +27,6 @@ enum {
>         BTRFS_INODE_HAS_ASYNC_EXTENT,
>         BTRFS_INODE_NEEDS_FULL_SYNC,
>         BTRFS_INODE_COPY_EVERYTHING,
> -       BTRFS_INODE_IN_DELALLOC_LIST,
>         BTRFS_INODE_HAS_PROPS,
>         BTRFS_INODE_SNAPSHOT_FLUSH,
>  };
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 7d2f6e55a234..3e87a6644e09 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1865,8 +1865,6 @@ static void btrfs_add_delalloc_inodes(struct btrfs_root *root,
>         if (list_empty(&BTRFS_I(inode)->delalloc_inodes)) {
>                 list_add_tail(&BTRFS_I(inode)->delalloc_inodes,
>                               &root->delalloc_inodes);
> -               set_bit(BTRFS_INODE_IN_DELALLOC_LIST,
> -                       &BTRFS_I(inode)->runtime_flags);
>                 root->nr_delalloc_inodes++;
>                 if (root->nr_delalloc_inodes == 1) {
>                         spin_lock(&fs_info->delalloc_root_lock);
> @@ -1887,8 +1885,6 @@ void __btrfs_del_delalloc_inode(struct btrfs_root *root,
>
>         if (!list_empty(&inode->delalloc_inodes)) {
>                 list_del_init(&inode->delalloc_inodes);
> -               clear_bit(BTRFS_INODE_IN_DELALLOC_LIST,
> -                         &inode->runtime_flags);
>                 root->nr_delalloc_inodes--;
>                 if (!root->nr_delalloc_inodes) {
>                         ASSERT(list_empty(&root->delalloc_inodes));
> @@ -1944,8 +1940,7 @@ void btrfs_set_delalloc_extent(struct inode *inode, struct extent_state *state,
>                 BTRFS_I(inode)->delalloc_bytes += len;
>                 if (*bits & EXTENT_DEFRAG)
>                         BTRFS_I(inode)->defrag_bytes += len;
> -               if (do_list && !test_bit(BTRFS_INODE_IN_DELALLOC_LIST,
> -                                        &BTRFS_I(inode)->runtime_flags))
> +               if (do_list)
>                         btrfs_add_delalloc_inodes(root, inode);
>                 spin_unlock(&BTRFS_I(inode)->lock);
>         }
> @@ -2014,9 +2009,7 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode,
>                                          fs_info->delalloc_batch);
>                 spin_lock(&inode->lock);
>                 inode->delalloc_bytes -= len;
> -               if (do_list && inode->delalloc_bytes == 0 &&
> -                   test_bit(BTRFS_INODE_IN_DELALLOC_LIST,
> -                                       &inode->runtime_flags))
> +               if (do_list && inode->delalloc_bytes == 0)
>                         btrfs_del_delalloc_inode(root, inode);
>                 spin_unlock(&inode->lock);
>         }
> --
> 2.17.1
>
David Sterba May 27, 2020, 9:24 p.m. UTC | #3
On Wed, May 27, 2020 at 08:48:31PM +0100, Filipe Manana wrote:
> On Wed, May 27, 2020 at 12:42 PM Nikolay Borisov <nborisov@suse.com> wrote:
> >
> > The flag simply replicates whether btrfs_inode::delallocs_inodes list
> > is empty or not. Just defer this check to the list management functions
> > (btrfs_add_delalloc_inodes/__btrfs_del_delalloc_inode) which are
> > always called under btrfs_root::delalloc_lock.
> 
> The flag is there to avoid taking the root's delalloc_lock spinlock
> everytime a range is marked for delalloc for any inode of the
> subvolume.

I overlooked that not all uses of the bit are under delalloc_lock, which
would make it redundant, but both test_bit are inside inode lock, not
the delalloc lock.

> Have you measured performance with very high concurrency of buffered
> writes against files in the same subvolume?

I'll remove the patch for now unless the performance is verified to be
ok.
Nikolay Borisov May 28, 2020, 7:09 a.m. UTC | #4
On 27.05.20 г. 22:48 ч., Filipe Manana wrote:
> On Wed, May 27, 2020 at 12:42 PM Nikolay Borisov <nborisov@suse.com> wrote:
>>
>> The flag simply replicates whether btrfs_inode::delallocs_inodes list
>> is empty or not. Just defer this check to the list management functions
>> (btrfs_add_delalloc_inodes/__btrfs_del_delalloc_inode) which are
>> always called under btrfs_root::delalloc_lock.
> 
> The flag is there to avoid taking the root's delalloc_lock spinlock
> everytime a range is marked for delalloc for any inode of the
> subvolume.
> Have you measured performance with very high concurrency of buffered
> writes against files in the same subvolume?
> 
> Thanks.


I performed the following test on a 16-core VM 
(physical cores are 12 on the host): 

fio --direct=0 --ioengine=sync --thread --directory=/media/scratch/ --invalidate=1 --group_reporting=1 \
--fallocate=posix --name=RandomWrites-async-64512-4k-4 --new_group --rw=randwrite --size=50m --numjobs=200 \
--bs=4k --fsync_on_close=0 --fallocate=none --end_fsync=0 --filename_format=FioWorkloads.\$jobnum

And here's what /proc/lock_stat report: 

With BTRFS_INODE_IN_DELALLOC_LIST: 

class name                     con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
 &root->delalloc_lock:           245              245             0.08          4.14          88.10          0.36            62168         122055         0.05          60.41        32721.41           0.27

Fio output:
 WRITE: bw=43.9MiB/s (45.0MB/s), 43.9MiB/s-43.9MiB/s (45.0MB/s-45.0MB/s), io=9.77GiB (10.5GB), run=228044-228044msec


Without BTRFS_INODE_IN_DELALLOC_LIST:
class name                           con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
 &root->delalloc_lock:                   8824          8838           0.05           210.92        3451.03           0.39        2542011        2685019           0.03         301.63      451369.98           0.17

  WRITE: bw=33.8MiB/s (35.5MB/s), 33.8MiB/s-33.8MiB/s (35.5MB/s-35.5MB/s), io=9.77GiB (10.5GB), run=295770-295770msec


So yeah, it does have noticeable effect, and massively reduces lock contentions on the delalloc_lock but 
it increases the critical section, due to the added avg times. But the improvement in performance 
in terms of throughput and reduced acquires/contentions is indisputable. So yeah, this patch 
should be dropped. 

Thanks for spotting this.

Patch
diff mbox series

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index aeff56a0e105..da6743c70412 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -27,7 +27,6 @@  enum {
 	BTRFS_INODE_HAS_ASYNC_EXTENT,
 	BTRFS_INODE_NEEDS_FULL_SYNC,
 	BTRFS_INODE_COPY_EVERYTHING,
-	BTRFS_INODE_IN_DELALLOC_LIST,
 	BTRFS_INODE_HAS_PROPS,
 	BTRFS_INODE_SNAPSHOT_FLUSH,
 };
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7d2f6e55a234..3e87a6644e09 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1865,8 +1865,6 @@  static void btrfs_add_delalloc_inodes(struct btrfs_root *root,
 	if (list_empty(&BTRFS_I(inode)->delalloc_inodes)) {
 		list_add_tail(&BTRFS_I(inode)->delalloc_inodes,
 			      &root->delalloc_inodes);
-		set_bit(BTRFS_INODE_IN_DELALLOC_LIST,
-			&BTRFS_I(inode)->runtime_flags);
 		root->nr_delalloc_inodes++;
 		if (root->nr_delalloc_inodes == 1) {
 			spin_lock(&fs_info->delalloc_root_lock);
@@ -1887,8 +1885,6 @@  void __btrfs_del_delalloc_inode(struct btrfs_root *root,

 	if (!list_empty(&inode->delalloc_inodes)) {
 		list_del_init(&inode->delalloc_inodes);
-		clear_bit(BTRFS_INODE_IN_DELALLOC_LIST,
-			  &inode->runtime_flags);
 		root->nr_delalloc_inodes--;
 		if (!root->nr_delalloc_inodes) {
 			ASSERT(list_empty(&root->delalloc_inodes));
@@ -1944,8 +1940,7 @@  void btrfs_set_delalloc_extent(struct inode *inode, struct extent_state *state,
 		BTRFS_I(inode)->delalloc_bytes += len;
 		if (*bits & EXTENT_DEFRAG)
 			BTRFS_I(inode)->defrag_bytes += len;
-		if (do_list && !test_bit(BTRFS_INODE_IN_DELALLOC_LIST,
-					 &BTRFS_I(inode)->runtime_flags))
+		if (do_list)
 			btrfs_add_delalloc_inodes(root, inode);
 		spin_unlock(&BTRFS_I(inode)->lock);
 	}
@@ -2014,9 +2009,7 @@  void btrfs_clear_delalloc_extent(struct inode *vfs_inode,
 					 fs_info->delalloc_batch);
 		spin_lock(&inode->lock);
 		inode->delalloc_bytes -= len;
-		if (do_list && inode->delalloc_bytes == 0 &&
-		    test_bit(BTRFS_INODE_IN_DELALLOC_LIST,
-					&inode->runtime_flags))
+		if (do_list && inode->delalloc_bytes == 0)
 			btrfs_del_delalloc_inode(root, inode);
 		spin_unlock(&inode->lock);
 	}