Message ID | 20190927102318.12830-2-nborisov@suse.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs llseek improvement, take 2 |
On Fri, Sep 27, 2019 at 01:23:16PM +0300, Nikolay Borisov wrote:
> Modifying the file position is done on a per-file basis. This renders
> holding the inode lock for writing useless and makes the performance of
> concurrent llseeks abysmal.
>
> Fix this by holding the inode for read. This provides protection against
> concurrent truncates and find_desired_extent already includes proper
> extent locking for the range which ensures proper locking against
> concurrent writes. SEEK_CUR and SEEK_END can be done locklessly.
> The former is synchronized by file::f_lock spinlock. SEEK_END is not
> synchronized but atomic, but that's OK since there is no guarantee
> that SEEK_END will always be at the end of the file in the face of
> tail modifications.
>
> This change brings ~82% performance improvement when doing a lot of
> parallel fseeks. The workload essentially does:
>
>     for (d=0; d<num_seek_read; d++)
>     {
>         /* offset %= 16777216; */
>         fseek (f, 256 * d % 16777216, SEEK_SET);
>         fread (buffer, 64, 1, f);
>     }
>
> Without patch:
>
> num workprocesses = 16
> num fseek/fread = 8000000
> step = 256
> fork 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
>
> real 0m41.412s
> user 0m28.777s
> sys 2m16.510s
>
> With patch:
>
> num workprocesses = 16
> num fseek/fread = 8000000
> step = 256
> fork 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
>
> real 0m11.479s
> user 0m27.629s
> sys 0m21.040s
>
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
>  fs/btrfs/file.c | 26 ++++++++++----------------
>  1 file changed, 10 insertions(+), 16 deletions(-)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 12688ae6e6f2..000b7bd89bf0 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -3347,13 +3347,14 @@ static int find_desired_extent(struct inode *inode, loff_t *offset, int whence)
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  	struct extent_map *em = NULL;
>  	struct extent_state *cached_state = NULL;
> +	loff_t i_size = inode->i_size;

We don't actually need to do all this now that we're holding the inode_lock,
right? Also, I've gone through the code and we're good with just a shared lock
here: the only thing that adjusts i_size outside of the extent lock is
truncate, so we're safe.

Thanks,

Josef
On 27.09.19 at 20:10, Josef Bacik wrote:
> On Fri, Sep 27, 2019 at 01:23:16PM +0300, Nikolay Borisov wrote:
>> Modifying the file position is done on a per-file basis. This renders
>> holding the inode lock for writing useless and makes the performance of
>> concurrent llseeks abysmal.
>>
>> Fix this by holding the inode for read. This provides protection against
>> concurrent truncates and find_desired_extent already includes proper
>> extent locking for the range which ensures proper locking against
>> concurrent writes. SEEK_CUR and SEEK_END can be done locklessly.
>> The former is synchronized by file::f_lock spinlock. SEEK_END is not
>> synchronized but atomic, but that's OK since there is no guarantee
>> that SEEK_END will always be at the end of the file in the face of
>> tail modifications.
>>
>> This change brings ~82% performance improvement when doing a lot of
>> parallel fseeks. The workload essentially does:
>>
>>     for (d=0; d<num_seek_read; d++)
>>     {
>>         /* offset %= 16777216; */
>>         fseek (f, 256 * d % 16777216, SEEK_SET);
>>         fread (buffer, 64, 1, f);
>>     }
>>
>> Without patch:
>>
>> num workprocesses = 16
>> num fseek/fread = 8000000
>> step = 256
>> fork 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
>>
>> real 0m41.412s
>> user 0m28.777s
>> sys 2m16.510s
>>
>> With patch:
>>
>> num workprocesses = 16
>> num fseek/fread = 8000000
>> step = 256
>> fork 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
>>
>> real 0m11.479s
>> user 0m27.629s
>> sys 0m21.040s
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> ---
>>  fs/btrfs/file.c | 26 ++++++++++----------------
>>  1 file changed, 10 insertions(+), 16 deletions(-)
>>
>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>> index 12688ae6e6f2..000b7bd89bf0 100644
>> --- a/fs/btrfs/file.c
>> +++ b/fs/btrfs/file.c
>> @@ -3347,13 +3347,14 @@ static int find_desired_extent(struct inode *inode, loff_t *offset, int whence)
>>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>>  	struct extent_map *em = NULL;
>>  	struct extent_state *cached_state = NULL;
>> +	loff_t i_size = inode->i_size;
>
> We don't actually need to do all this now that we're holding the inode_lock
> right? Also I've gone through and looked at stuff and we're good with just a
> shared lock here, the only thing that adjusts i_size outside of the extent lock
> is truncate, so we're safe. Thanks,

Yeah, holding the shared inode lock means we can just read inode->i_size
directly, but I don't know whether the multiple dereferences get optimised.
Though at this point we are entering micro-optimisation territory. For the
sake of completeness, I will check on Monday what the difference in the
generated assembly is, and if there is none I'll revert the code back to
accessing inode->i_size.

>
> Josef
>
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 12688ae6e6f2..000b7bd89bf0 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -3347,13 +3347,14 @@ static int find_desired_extent(struct inode *inode, loff_t *offset, int whence)
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_map *em = NULL;
 	struct extent_state *cached_state = NULL;
+	loff_t i_size = inode->i_size;
 	u64 lockstart;
 	u64 lockend;
 	u64 start;
 	u64 len;
 	int ret = 0;

-	if (inode->i_size == 0)
+	if (i_size == 0 || *offset >= i_size)
 		return -ENXIO;

 	/*
@@ -3363,8 +3364,7 @@ static int find_desired_extent(struct inode *inode, loff_t *offset, int whence)
 	start = max_t(loff_t, 0, *offset);

 	lockstart = round_down(start, fs_info->sectorsize);
-	lockend = round_up(i_size_read(inode),
-			   fs_info->sectorsize);
+	lockend = round_up(i_size, fs_info->sectorsize);
 	if (lockend <= lockstart)
 		lockend = lockstart + fs_info->sectorsize;
 	lockend--;
@@ -3373,7 +3373,7 @@ static int find_desired_extent(struct inode *inode, loff_t *offset, int whence)
 	lock_extent_bits(&BTRFS_I(inode)->io_tree, lockstart, lockend,
 			 &cached_state);

-	while (start < inode->i_size) {
+	while (start < i_size) {
 		em = btrfs_get_extent_fiemap(BTRFS_I(inode), start, len);
 		if (IS_ERR(em)) {
 			ret = PTR_ERR(em);
@@ -3397,10 +3397,10 @@ static int find_desired_extent(struct inode *inode, loff_t *offset, int whence)
 	}
 	free_extent_map(em);
 	if (!ret) {
-		if (whence == SEEK_DATA && start >= inode->i_size)
+		if (whence == SEEK_DATA && start >= i_size)
 			ret = -ENXIO;
 		else
-			*offset = min_t(loff_t, start, inode->i_size);
+			*offset = min_t(loff_t, start, i_size);
 	}
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart, lockend,
 			     &cached_state);
@@ -3412,7 +3412,6 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int whence)
 	struct inode *inode = file->f_mapping->host;
 	int ret;

-	inode_lock(inode);
 	switch (whence) {
 	case SEEK_END:
 	case SEEK_CUR:
@@ -3420,21 +3419,16 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int whence)
 		goto out;
 	case SEEK_DATA:
 	case SEEK_HOLE:
-		if (offset >= i_size_read(inode)) {
-			inode_unlock(inode);
-			return -ENXIO;
-		}
-
+		inode_lock_shared(inode);
 		ret = find_desired_extent(inode, &offset, whence);
-		if (ret) {
-			inode_unlock(inode);
+		inode_unlock_shared(inode);
+
+		if (ret)
 			return ret;
-		}
 	}

 	offset = vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
 out:
-	inode_unlock(inode);
 	return offset;
 }
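For readers less familiar with SEEK_DATA, the -ENXIO path that the diff moves into find_desired_extent() can be observed from userspace. The sketch below is illustrative only and is not part of the patch: next_data_offset() and probe_at_eof() are hypothetical helper names, and it assumes a Linux system (a fallback define carrying the Linux value of SEEK_DATA is included in case the libc headers hide it).

```c
#define _GNU_SOURCE /* asks glibc to expose SEEK_DATA in <unistd.h> */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3 /* Linux value; fallback if the headers did not define it */
#endif

/*
 * Hypothetical helper (not from the patch): offset of the next data region
 * at or after `offset`, or a negative errno value on failure.
 */
static long long next_data_offset(int fd, off_t offset)
{
	off_t pos = lseek(fd, offset, SEEK_DATA);

	if (pos == (off_t)-1)
		return -(long long)errno;
	return (long long)pos;
}

/*
 * Hypothetical demo: create `path`, fill it with `size` bytes of data, then
 * probe SEEK_DATA exactly at end of file. Returns the probe result, or a
 * negative errno value if the setup fails. The scratch file is removed.
 */
static long long probe_at_eof(const char *path, size_t size)
{
	static const char block[4096] = { 'x' };
	long long ret;
	size_t done = 0;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -(long long)errno;

	while (done < size) {
		size_t left = size - done;
		size_t chunk = left < sizeof(block) ? left : sizeof(block);
		ssize_t n = write(fd, block, chunk);

		if (n < 0) {
			ret = -(long long)errno;
			goto out;
		}
		done += (size_t)n;
	}

	/* An offset at or beyond EOF has no data after it. */
	ret = next_data_offset(fd, (off_t)size);
out:
	close(fd);
	unlink(path);
	return ret;
}
```

Calling probe_at_eof() with any scratch path and size should yield -ENXIO, since lseek(2) documents ENXIO for SEEK_DATA when the offset is at or beyond the end of the file. That is the case the new `*offset >= i_size` check in find_desired_extent() covers.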
Modifying the file position is done on a per-file basis. This renders
holding the inode lock for writing useless and makes the performance of
concurrent llseeks abysmal.

Fix this by holding the inode for read. This provides protection against
concurrent truncates and find_desired_extent already includes proper
extent locking for the range which ensures proper locking against
concurrent writes. SEEK_CUR and SEEK_END can be done locklessly.
The former is synchronized by file::f_lock spinlock. SEEK_END is not
synchronized but atomic, but that's OK since there is no guarantee
that SEEK_END will always be at the end of the file in the face of
tail modifications.

This change brings ~82% performance improvement when doing a lot of
parallel fseeks. The workload essentially does:

    for (d=0; d<num_seek_read; d++)
    {
        /* offset %= 16777216; */
        fseek (f, 256 * d % 16777216, SEEK_SET);
        fread (buffer, 64, 1, f);
    }

Without patch:

num workprocesses = 16
num fseek/fread = 8000000
step = 256
fork 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

real 0m41.412s
user 0m28.777s
sys 2m16.510s

With patch:

num workprocesses = 16
num fseek/fread = 8000000
step = 256
fork 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

real 0m11.479s
user 0m27.629s
sys 0m21.040s

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/file.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)
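The benchmark program itself is not included in the posting; only its inner loop is quoted above. A rough sketch of how such a workload could be driven follows. seek_read_worker() and run_workers() are hypothetical names, the parameters mirror the "num workprocesses", "num fseek/fread" and "step" values printed above, and the file at `path` is assumed to be at least span + 64 bytes long. Whether every fseek() reaches the kernel as an lseek() depends on the C library's buffering, so this may not reproduce the reported numbers.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical worker: performs the fseek/fread loop from the commit message
 * on its own FILE handle. Returns the number of successful 64-byte reads,
 * or -1 if the file cannot be opened.
 */
static long seek_read_worker(const char *path, long num_seek_read,
			     long step, long span)
{
	char buffer[64];
	long ok = 0;
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;

	for (long d = 0; d < num_seek_read; d++) {
		/* Same arithmetic as "256 * d % 16777216" in the original loop. */
		if (fseek(f, (step * d) % span, SEEK_SET) != 0)
			break;
		if (fread(buffer, sizeof(buffer), 1, f) == 1)
			ok++;
	}

	fclose(f);
	return ok;
}

/*
 * Hypothetical driver: forks `nproc` workers that all seek within the same
 * file (and therefore the same inode). Returns the number of workers that
 * failed to start or did not complete every read.
 */
static int run_workers(const char *path, int nproc, long num_seek_read,
		       long step, long span)
{
	int failed = 0;
	int status;

	for (int i = 0; i < nproc; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			failed++;
			continue;
		}
		if (pid == 0) {
			long ok = seek_read_worker(path, num_seek_read, step, span);

			_exit(ok == num_seek_read ? 0 : 1);
		}
	}

	/* Reap every child and count abnormal or non-zero exits. */
	while (wait(&status) > 0) {
		if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
			failed++;
	}
	return failed;
}
```

With the figures from the runs above, the call would be run_workers(path, 16, 8000000, 256, 16777216), timed externally with time(1).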