Message ID | 1422960338-12187-1-git-send-email-bo.li.liu@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Sorry, this is only btrfs concerned, I made a typo in email address. Thanks, -liubo On Tue, Feb 03, 2015 at 06:45:38PM +0800, Liu Bo wrote: > Commit 1ae399382512 ("Btrfs: do not use async submit for small DIO io's") > benefits small DIO io's. > > However, if we're owning the SINGLE profile, this also affects large DIO io's > since in that case, map_length is (chunk_length - bio's offset_in_chunk), > it's farily large so that it's very likely to be larger than a large bio's > size, which avoids asynchronous submit. > For instance, if we have a 512k bio, the efforts of calculating (512k/4k=128) > checksums will be taken by the DIO task. > > Test results with fio (tested on a hard disk, not tested on ssd, 4cpu, 8g memory) > > bs async sync async sync > bw bw(KB/S) iops iop > 4k 115312 115480 28827.6 28869.6 > 8k 114381 115586 14297.4 14447.6 > 16k 115393 116290 7211.4 7267.6 > 32k 114268 116589 3570.4 3643 > 64k 115421 113417 1803 1771.8 <-----ASYNC wins here > 128k 115545 112585 902 879 > 256k 115178 111521 449.2 435 > 512k 115874 111620 226 217.6 > > This adds a limit 'BTRFS_STRIPE_LEN(64k)' to decide if it's small enough to avoid > asynchronous submit. > > Still, in this case we don't need to split the bio and can submit it directly. > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com> > --- > The job in the test, > > [global] > rw=write > ioengine=libaio > direct=1 > iodepth=64 > iodepth_batch=64 > iodepth_batch_complete=64 > iodepth_low=64 > bs=4k > size=8g > sync=0 > group_reporting > fallocate=posix > invalidate=1 > runtime=30 > > [dio] > directory=/mnt/btrfs > filename=foobar > > fs/btrfs/inode.c | 16 ++++++++++++++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index e687bb0..c640d7e 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -7792,6 +7792,7 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip, > int nr_pages = 0; > int ret; > int async_submit = 0; > + u64 alloc_profile; > > map_length = orig_bio->bi_iter.bi_size; > ret = btrfs_map_block(root->fs_info, rw, start_sector << 9, > @@ -7799,15 +7800,26 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip, > if (ret) > return -EIO; > > + alloc_profile = btrfs_get_alloc_profile(root, 1); > + > if (map_length >= orig_bio->bi_iter.bi_size) { > bio = orig_bio; > dip->flags |= BTRFS_DIO_ORIG_BIO_SUBMITTED; > + > + /* > + * In the case of 'single' profile, the above check is very > + * likely to be true as map_length is (chunk_length - offset), > + * so checking BTRFS_STRIPE_LEN here. > + */ > + if ((alloc_profile & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0 && > + orig_bio->bi_iter.bi_size >= BTRFS_STRIPE_LEN) > + async_submit = 1; > + > goto submit; > } > > /* async crcs make it difficult to collect full stripe writes. */ > - if (btrfs_get_alloc_profile(root, 1) & > - (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)) > + if (alloc_profile & (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)) > async_submit = 0; > else > async_submit = 1; > -- > 1.8.1.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e687bb0..c640d7e 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7792,6 +7792,7 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip, int nr_pages = 0; int ret; int async_submit = 0; + u64 alloc_profile; map_length = orig_bio->bi_iter.bi_size; ret = btrfs_map_block(root->fs_info, rw, start_sector << 9, @@ -7799,15 +7800,26 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip, if (ret) return -EIO; + alloc_profile = btrfs_get_alloc_profile(root, 1); + if (map_length >= orig_bio->bi_iter.bi_size) { bio = orig_bio; dip->flags |= BTRFS_DIO_ORIG_BIO_SUBMITTED; + + /* + * In the case of 'single' profile, the above check is very + * likely to be true as map_length is (chunk_length - offset), + * so checking BTRFS_STRIPE_LEN here. + */ + if ((alloc_profile & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0 && + orig_bio->bi_iter.bi_size >= BTRFS_STRIPE_LEN) + async_submit = 1; + goto submit; } /* async crcs make it difficult to collect full stripe writes. */ - if (btrfs_get_alloc_profile(root, 1) & - (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)) + if (alloc_profile & (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6)) async_submit = 0; else async_submit = 1;
Commit 1ae399382512 ("Btrfs: do not use async submit for small DIO io's") benefits small DIO io's. However, if we're owning the SINGLE profile, this also affects large DIO io's since in that case, map_length is (chunk_length - bio's offset_in_chunk), it's farily large so that it's very likely to be larger than a large bio's size, which avoids asynchronous submit. For instance, if we have a 512k bio, the efforts of calculating (512k/4k=128) checksums will be taken by the DIO task. Test results with fio (tested on a hard disk, not tested on ssd, 4cpu, 8g memory) bs async sync async sync bw bw(KB/S) iops iop 4k 115312 115480 28827.6 28869.6 8k 114381 115586 14297.4 14447.6 16k 115393 116290 7211.4 7267.6 32k 114268 116589 3570.4 3643 64k 115421 113417 1803 1771.8 <-----ASYNC wins here 128k 115545 112585 902 879 256k 115178 111521 449.2 435 512k 115874 111620 226 217.6 This adds a limit 'BTRFS_STRIPE_LEN(64k)' to decide if it's small enough to avoid asynchronous submit. Still, in this case we don't need to split the bio and can submit it directly. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> --- The job in the test, [global] rw=write ioengine=libaio direct=1 iodepth=64 iodepth_batch=64 iodepth_batch_complete=64 iodepth_low=64 bs=4k size=8g sync=0 group_reporting fallocate=posix invalidate=1 runtime=30 [dio] directory=/mnt/btrfs filename=foobar fs/btrfs/inode.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)