Message ID | 20200527101104.7441-1-nborisov@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: Remove BTRFS_INODE_IN_DELALLOC_LIST flag | expand |
On Wed, May 27, 2020 at 01:11:04PM +0300, Nikolay Borisov wrote: > The flag simply replicates whether btrfs_inode::delallocs_inodes list > is empty or not. Just defer this check to the list management functions > (btrfs_add_delalloc_inodes/__btrfs_del_delalloc_inode) which are > always called under btrfs_root::delalloc_lock. > > Signed-off-by: Nikolay Borisov <nborisov@suse.com> Nice. Added to misc-next, thanks.
On Wed, May 27, 2020 at 12:42 PM Nikolay Borisov <nborisov@suse.com> wrote: > > The flag simply replicates whether btrfs_inode::delallocs_inodes list > is empty or not. Just defer this check to the list management functions > (btrfs_add_delalloc_inodes/__btrfs_del_delalloc_inode) which are > always called under btrfs_root::delalloc_lock. The flag is there to avoid taking the root's delalloc_lock spinlock everytime a range is marked for delalloc for any inode of the subvolume. Have you measured performance with very high concurrency of buffered writes against files in the same subvolume? Thanks. > > Signed-off-by: Nikolay Borisov <nborisov@suse.com> > --- > fs/btrfs/btrfs_inode.h | 1 - > fs/btrfs/inode.c | 11 ++--------- > 2 files changed, 2 insertions(+), 10 deletions(-) > > diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h > index aeff56a0e105..da6743c70412 100644 > --- a/fs/btrfs/btrfs_inode.h > +++ b/fs/btrfs/btrfs_inode.h > @@ -27,7 +27,6 @@ enum { > BTRFS_INODE_HAS_ASYNC_EXTENT, > BTRFS_INODE_NEEDS_FULL_SYNC, > BTRFS_INODE_COPY_EVERYTHING, > - BTRFS_INODE_IN_DELALLOC_LIST, > BTRFS_INODE_HAS_PROPS, > BTRFS_INODE_SNAPSHOT_FLUSH, > }; > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 7d2f6e55a234..3e87a6644e09 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -1865,8 +1865,6 @@ static void btrfs_add_delalloc_inodes(struct btrfs_root *root, > if (list_empty(&BTRFS_I(inode)->delalloc_inodes)) { > list_add_tail(&BTRFS_I(inode)->delalloc_inodes, > &root->delalloc_inodes); > - set_bit(BTRFS_INODE_IN_DELALLOC_LIST, > - &BTRFS_I(inode)->runtime_flags); > root->nr_delalloc_inodes++; > if (root->nr_delalloc_inodes == 1) { > spin_lock(&fs_info->delalloc_root_lock); > @@ -1887,8 +1885,6 @@ void __btrfs_del_delalloc_inode(struct btrfs_root *root, > > if (!list_empty(&inode->delalloc_inodes)) { > list_del_init(&inode->delalloc_inodes); > - clear_bit(BTRFS_INODE_IN_DELALLOC_LIST, > - &inode->runtime_flags); > root->nr_delalloc_inodes--; > if (!root->nr_delalloc_inodes) { > ASSERT(list_empty(&root->delalloc_inodes)); > @@ -1944,8 +1940,7 @@ void btrfs_set_delalloc_extent(struct inode *inode, struct extent_state *state, > BTRFS_I(inode)->delalloc_bytes += len; > if (*bits & EXTENT_DEFRAG) > BTRFS_I(inode)->defrag_bytes += len; > - if (do_list && !test_bit(BTRFS_INODE_IN_DELALLOC_LIST, > - &BTRFS_I(inode)->runtime_flags)) > + if (do_list) > btrfs_add_delalloc_inodes(root, inode); > spin_unlock(&BTRFS_I(inode)->lock); > } > @@ -2014,9 +2009,7 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode, > fs_info->delalloc_batch); > spin_lock(&inode->lock); > inode->delalloc_bytes -= len; > - if (do_list && inode->delalloc_bytes == 0 && > - test_bit(BTRFS_INODE_IN_DELALLOC_LIST, > - &inode->runtime_flags)) > + if (do_list && inode->delalloc_bytes == 0) > btrfs_del_delalloc_inode(root, inode); > spin_unlock(&inode->lock); > } > -- > 2.17.1 >
On Wed, May 27, 2020 at 08:48:31PM +0100, Filipe Manana wrote: > On Wed, May 27, 2020 at 12:42 PM Nikolay Borisov <nborisov@suse.com> wrote: > > > > The flag simply replicates whether btrfs_inode::delallocs_inodes list > > is empty or not. Just defer this check to the list management functions > > (btrfs_add_delalloc_inodes/__btrfs_del_delalloc_inode) which are > > always called under btrfs_root::delalloc_lock. > > The flag is there to avoid taking the root's delalloc_lock spinlock > everytime a range is marked for delalloc for any inode of the > subvolume. I overlooked that not all uses of the bit are under delalloc_lock, which would make it redundant, but both test_bit are inside inode lock, not the delalloc lock. > Have you measured performance with very high concurrency of buffered > writes against files in the same subvolume? I'll remove the patch for now unless the performance is verified to be ok.
On 27.05.20 г. 22:48 ч., Filipe Manana wrote: > On Wed, May 27, 2020 at 12:42 PM Nikolay Borisov <nborisov@suse.com> wrote: >> >> The flag simply replicates whether btrfs_inode::delallocs_inodes list >> is empty or not. Just defer this check to the list management functions >> (btrfs_add_delalloc_inodes/__btrfs_del_delalloc_inode) which are >> always called under btrfs_root::delalloc_lock. > > The flag is there to avoid taking the root's delalloc_lock spinlock > everytime a range is marked for delalloc for any inode of the > subvolume. > Have you measured performance with very high concurrency of buffered > writes against files in the same subvolume? > > Thanks. I performed the following test on a 16-core VM (physical cores are 12 on the host): fio --direct=0 --ioengine=sync --thread --directory=/media/scratch/ --invalidate=1 --group_reporting=1 \ --fallocate=posix --name=RandomWrites-async-64512-4k-4 --new_group --rw=randwrite --size=50m --numjobs=200 \ --bs=4k --fsync_on_close=0 --fallocate=none --end_fsync=0 --filename_format=FioWorkloads.\$jobnum And here's what /proc/lock_stat report: With BTRFS_INODE_IN_DELALLOC_LIST: class name con-bounces contentions waittime-min waittime-max waittime-total waittime-avg acq-bounces acquisitions holdtime-min holdtime-max holdtime-total holdtime-avg &root->delalloc_lock: 245 245 0.08 4.14 88.10 0.36 62168 122055 0.05 60.41 32721.41 0.27 Fio output: WRITE: bw=43.9MiB/s (45.0MB/s), 43.9MiB/s-43.9MiB/s (45.0MB/s-45.0MB/s), io=9.77GiB (10.5GB), run=228044-228044msec Without BTRFS_INODE_IN_DELALLOC_LIST: class name con-bounces contentions waittime-min waittime-max waittime-total waittime-avg acq-bounces acquisitions holdtime-min holdtime-max holdtime-total holdtime-avg &root->delalloc_lock: 8824 8838 0.05 210.92 3451.03 0.39 2542011 2685019 0.03 301.63 451369.98 0.17 WRITE: bw=33.8MiB/s (35.5MB/s), 33.8MiB/s-33.8MiB/s (35.5MB/s-35.5MB/s), io=9.77GiB (10.5GB), run=295770-295770msec So yeah, it does have noticeable effect, and massively reduces lock contentions on the delalloc_lock but it increases the critical section, due to the added avg times. But the improvement in performance in terms of throughput and reduced acquires/contentions is indisputable. So yeah, this patch should be dropped. Thanks for spotting this.
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index aeff56a0e105..da6743c70412 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -27,7 +27,6 @@ enum { BTRFS_INODE_HAS_ASYNC_EXTENT, BTRFS_INODE_NEEDS_FULL_SYNC, BTRFS_INODE_COPY_EVERYTHING, - BTRFS_INODE_IN_DELALLOC_LIST, BTRFS_INODE_HAS_PROPS, BTRFS_INODE_SNAPSHOT_FLUSH, }; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 7d2f6e55a234..3e87a6644e09 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1865,8 +1865,6 @@ static void btrfs_add_delalloc_inodes(struct btrfs_root *root, if (list_empty(&BTRFS_I(inode)->delalloc_inodes)) { list_add_tail(&BTRFS_I(inode)->delalloc_inodes, &root->delalloc_inodes); - set_bit(BTRFS_INODE_IN_DELALLOC_LIST, - &BTRFS_I(inode)->runtime_flags); root->nr_delalloc_inodes++; if (root->nr_delalloc_inodes == 1) { spin_lock(&fs_info->delalloc_root_lock); @@ -1887,8 +1885,6 @@ void __btrfs_del_delalloc_inode(struct btrfs_root *root, if (!list_empty(&inode->delalloc_inodes)) { list_del_init(&inode->delalloc_inodes); - clear_bit(BTRFS_INODE_IN_DELALLOC_LIST, - &inode->runtime_flags); root->nr_delalloc_inodes--; if (!root->nr_delalloc_inodes) { ASSERT(list_empty(&root->delalloc_inodes)); @@ -1944,8 +1940,7 @@ void btrfs_set_delalloc_extent(struct inode *inode, struct extent_state *state, BTRFS_I(inode)->delalloc_bytes += len; if (*bits & EXTENT_DEFRAG) BTRFS_I(inode)->defrag_bytes += len; - if (do_list && !test_bit(BTRFS_INODE_IN_DELALLOC_LIST, - &BTRFS_I(inode)->runtime_flags)) + if (do_list) btrfs_add_delalloc_inodes(root, inode); spin_unlock(&BTRFS_I(inode)->lock); } @@ -2014,9 +2009,7 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode, fs_info->delalloc_batch); spin_lock(&inode->lock); inode->delalloc_bytes -= len; - if (do_list && inode->delalloc_bytes == 0 && - test_bit(BTRFS_INODE_IN_DELALLOC_LIST, - &inode->runtime_flags)) + if (do_list && inode->delalloc_bytes == 0) btrfs_del_delalloc_inode(root, inode); spin_unlock(&inode->lock); }
The flag simply replicates whether btrfs_inode::delallocs_inodes list is empty or not. Just defer this check to the list management functions (btrfs_add_delalloc_inodes/__btrfs_del_delalloc_inode) which are always called under btrfs_root::delalloc_lock. Signed-off-by: Nikolay Borisov <nborisov@suse.com> --- fs/btrfs/btrfs_inode.h | 1 - fs/btrfs/inode.c | 11 ++--------- 2 files changed, 2 insertions(+), 10 deletions(-) -- 2.17.1