
[04/34] btrfs: remove the direct I/O read checksum lookup optimization

Message ID 20230121065031.1139353-5-hch@lst.de (mailing list archive)
State New, archived
Series [01/34] block: export bio_split_rw

Commit Message

Christoph Hellwig Jan. 21, 2023, 6:50 a.m. UTC
To prepare for pending changes drop the optimization to only look up
csums once per bio that is submitted from the iomap layer.  In the
short run this does cause additional lookups for fragmented direct
reads, but later in the series, the bio based lookup will be used on
the entire bio submitted from iomap, restoring the old behavior
in common code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c | 32 +++++---------------------------
 1 file changed, 5 insertions(+), 27 deletions(-)
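
For reference, the "bio based lookup" mentioned above relies on btrfs_lookup_bio_sums() accepting a NULL destination: in that case the helper stores the checksums in the btrfs_bio itself rather than in a caller-provided array. A rough sketch of that branch, paraphrased from fs/btrfs/file-item.c of this era (not part of this patch; names as in the then-current tree):

	/* inside btrfs_lookup_bio_sums(inode, bio, dst) */
	if (!dst) {
		struct btrfs_bio *bbio = btrfs_bio(bio);

		if (nblocks * csum_size > BTRFS_BIO_INLINE_CSUM_SIZE) {
			/* large bio: allocate a checksum buffer for it */
			bbio->csum = kmalloc_array(nblocks, csum_size, GFP_NOFS);
			if (!bbio->csum)
				return BLK_STS_RESOURCE;
		} else {
			/* small bio: use the array embedded in the btrfs_bio */
			bbio->csum = bbio->csum_inline;
		}
		csum = bbio->csum;
	} else {
		csum = dst;
	}

With the preloaded dip->csums array gone, every bio that btrfs splits off the iomap dio bio takes this path and performs its own csum tree search.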

Comments

Johannes Thumshirn Jan. 23, 2023, 3:59 p.m. UTC | #1
On 21.01.23 07:50, Christoph Hellwig wrote:
> To prepare for pending changes drop the optimization to only look up
> csums once per bio that is submitted from the iomap layer.  In the
> short run this does cause additional lookups for fragmented direct
> reads, but later in the series, the bio based lookup will be used on
> the entire bio submitted from iomap, restoring the old behavior
> in common code.

I was wondering how doing a csum tree lookup on every DIO read would
impact performance, until I realized you already accounted for that in
the commit message.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Anand Jain Jan. 24, 2023, 2:55 p.m. UTC | #2
On 21/01/2023 14:50, Christoph Hellwig wrote:
> To prepare for pending changes drop the optimization to only look up
> csums once per bio that is submitted from the iomap layer.  In the
> short run this does cause additional lookups for fragmented direct
> reads, but later in the series, the bio based lookup will be used on
> the entire bio submitted from iomap, restoring the old behavior
> in common code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Anand Jain <anand.jain@oracle.com>


I was curious about the commit message.
I ran fio to test the performance before and after the change.
The results were similar.

fio --group_reporting=1 --directory /mnt/test --name dioread --direct=1 
--size=1g --rw=read  --runtime=60 --iodepth=1024 --nrfiles=16 --numjobs=16

before this patch
READ: bw=8208KiB/s (8405kB/s), 8208KiB/s-8208KiB/s (8405kB/s-8405kB/s), 
io=481MiB (504MB), run=60017-60017msec

after this patch
READ: bw=8353KiB/s (8554kB/s), 8353KiB/s-8353KiB/s (8554kB/s-8554kB/s), 
io=490MiB (513MB), run=60013-60013msec
Christoph Hellwig Jan. 24, 2023, 7:55 p.m. UTC | #3
On Tue, Jan 24, 2023 at 10:55:25PM +0800, Anand Jain wrote:
>
> I was curious about the commit message.
> I ran fio to test the performance before and after the change.
> The results were similar.
>
> fio --group_reporting=1 --directory /mnt/test --name dioread --direct=1 
> --size=1g --rw=read  --runtime=60 --iodepth=1024 --nrfiles=16 --numjobs=16
>
> before this patch
> READ: bw=8208KiB/s (8405kB/s), 8208KiB/s-8208KiB/s (8405kB/s-8405kB/s), 
> io=481MiB (504MB), run=60017-60017msec
>
> after this patch
> READ: bw=8353KiB/s (8554kB/s), 8353KiB/s-8353KiB/s (8554kB/s-8554kB/s), 
> io=490MiB (513MB), run=60013-60013msec

That's 4k reads.  They will benefit from the inline csum array in the
btrfs_bio, but won't benefit from the existing batching, so this is
kind of expected.

The good news is that the final series will still use the inline
csum array for small reads, while also only doing a single csum tree
lookup for larger reads, so you'll get the best of both worlds.
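
To put a number on the inline csum array (my reading of the btrfs_bio definition at this point in the tree, not something this patch touches):

	struct btrfs_bio {
		...
		u8 *csum;	/* points at csum_inline or a kmalloc'd buffer */
		u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];	/* 64 bytes */
		...
		struct bio bio;
	};

With 4k sectors and crc32c (4 bytes per sector), the 64-byte inline array covers up to 16 sectors per bio, so single-sector reads like the fio run above never need the allocation and see no benefit from batched csum tree lookups.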
Anand Jain Jan. 25, 2023, 7:42 a.m. UTC | #4
On 25/01/2023 03:55, Christoph Hellwig wrote:
> On Tue, Jan 24, 2023 at 10:55:25PM +0800, Anand Jain wrote:
>>
>> I was curious about the commit message.
>> I ran fio to test the performance before and after the change.
>> The results were similar.
>>
>> fio --group_reporting=1 --directory /mnt/test --name dioread --direct=1
>> --size=1g --rw=read  --runtime=60 --iodepth=1024 --nrfiles=16 --numjobs=16
>>
>> before this patch
>> READ: bw=8208KiB/s (8405kB/s), 8208KiB/s-8208KiB/s (8405kB/s-8405kB/s),
>> io=481MiB (504MB), run=60017-60017msec
>>
>> after this patch
>> READ: bw=8353KiB/s (8554kB/s), 8353KiB/s-8353KiB/s (8554kB/s-8554kB/s),
>> io=490MiB (513MB), run=60013-60013msec
> 
> That's 4k reads.  They will benefit from the inline csum array in the
> btrfs_bio, but won't benefit from the existing batching, so this is
> kind of expected.
> 
> The good news is that the final series will still use the inline
> csum array for small reads, while also only doing a single csum tree
> lookup for larger reads, so you'll get the best of both worlds.
> 

Ok. I got these results for the whole series on an aarch64 machine
(pagesize=64k); the results show little improvement, essentially the same.


Before:
Last commit:
b3b1ba7b8c0d btrfs: skip backref walking during fiemap if we know the 
leaf is shared

---- mkfs.btrfs /dev/vdb ..... :0 ----
---- mount -o max_inline=0 /dev/vdb /btrfs ..... :0 ----
---- fio --group_reporting=1 --directory /btrfs --name dioread 
--direct=1 --size=1g --rw=read --runtime=60 --iodepth=1024 --nrfiles=16 
--numjobs=16 | egrep "fio|READ" ..... :0 ----
fio-3.19
    READ: bw=6052MiB/s (6346MB/s), 6052MiB/s-6052MiB/s 
(6346MB/s-6346MB/s), io=16.0GiB (17.2GB), run=2707-2707msec

---- mkfs.btrfs /dev/vdb ..... :0 ----
---- mount -o max_inline=64K /dev/vdb /btrfs ..... :0 ----
---- fio --group_reporting=1 --directory /btrfs --name dioread 
--direct=1 --size=1g --rw=read --runtime=60 --iodepth=1024 --nrfiles=16 
--numjobs=16 | egrep "fio|READ" ..... :0 ----
fio-3.19
    READ: bw=6139MiB/s (6437MB/s), 6139MiB/s-6139MiB/s 
(6437MB/s-6437MB/s), io=16.0GiB (17.2GB), run=2669-2669msec


After:
Last commit:
b488ab9aed15 iomap: remove IOMAP_F_ZONE_APPEND

---- mkfs.btrfs /dev/vdb ..... :0 ----
---- mount -o max_inline=0 /dev/vdb /btrfs ..... :0 ----
---- fio --group_reporting=1 --directory /btrfs --name dioread 
--direct=1 --size=1g --rw=read --runtime=60 --iodepth=1024 --nrfiles=16 
--numjobs=16 | egrep "fio|READ" ..... :0 ----
fio-3.19
    READ: bw=6100MiB/s (6396MB/s), 6100MiB/s-6100MiB/s 
(6396MB/s-6396MB/s), io=16.0GiB (17.2GB), run=2686-2686msec

---- mkfs.btrfs /dev/vdb ..... :0 ----
---- mount /dev/vdb /btrfs ..... :0 ----
---- fio --group_reporting=1 --directory /btrfs --name dioread 
--direct=1 --size=1g --rw=read --runtime=60 --iodepth=1024 --nrfiles=16 
--numjobs=16 | egrep "fio|READ" ..... :0 ----
fio-3.19
    READ: bw=6157MiB/s (6456MB/s), 6157MiB/s-6157MiB/s 
(6456MB/s-6456MB/s), io=16.0GiB (17.2GB), run=2661-2661msec

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0a85e42f114cc5..863a5527853c66 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -100,9 +100,6 @@  struct btrfs_dio_private {
 	 */
 	refcount_t refs;
 
-	/* Array of checksums */
-	u8 *csums;
-
 	/* This must be last */
 	struct bio bio;
 };
@@ -7907,7 +7904,6 @@  static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
 			      dip->file_offset + dip->bytes - 1, NULL);
 	}
 
-	kfree(dip->csums);
 	bio_endio(&dip->bio);
 }
 
@@ -7990,7 +7986,6 @@  static void btrfs_submit_dio_bio(struct bio *bio, struct btrfs_inode *inode,
 				 u64 file_offset, int async_submit)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct btrfs_dio_private *dip = btrfs_bio(bio)->private;
 	blk_status_t ret;
 
 	/* Save the original iter for read repair */
@@ -8017,8 +8012,11 @@  static void btrfs_submit_dio_bio(struct bio *bio, struct btrfs_inode *inode,
 			return;
 		}
 	} else {
-		btrfs_bio(bio)->csum = btrfs_csum_ptr(fs_info, dip->csums,
-						      file_offset - dip->file_offset);
+		ret = btrfs_lookup_bio_sums(&inode->vfs_inode, bio, NULL);
+		if (ret) {
+			btrfs_bio_end_io(btrfs_bio(bio), ret);
+			return;
+		}
 	}
 map:
 	btrfs_submit_bio(fs_info, bio, 0);
@@ -8030,7 +8028,6 @@  static void btrfs_submit_direct(const struct iomap_iter *iter,
 	struct btrfs_dio_private *dip =
 		container_of(dio_bio, struct btrfs_dio_private, bio);
 	struct inode *inode = iter->inode;
-	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	const bool raid56 = (btrfs_data_alloc_profile(fs_info) &
 			     BTRFS_BLOCK_GROUP_RAID56_MASK);
@@ -8051,25 +8048,6 @@  static void btrfs_submit_direct(const struct iomap_iter *iter,
 	dip->file_offset = file_offset;
 	dip->bytes = dio_bio->bi_iter.bi_size;
 	refcount_set(&dip->refs, 1);
-	dip->csums = NULL;
-
-	if (!write && !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) {
-		unsigned int nr_sectors =
-			(dio_bio->bi_iter.bi_size >> fs_info->sectorsize_bits);
-
-		/*
-		 * Load the csums up front to reduce csum tree searches and
-		 * contention when submitting bios.
-		 */
-		status = BLK_STS_RESOURCE;
-		dip->csums = kcalloc(nr_sectors, fs_info->csum_size, GFP_NOFS);
-		if (!dip->csums)
-			goto out_err;
-
-		status = btrfs_lookup_bio_sums(inode, dio_bio, dip->csums);
-		if (status != BLK_STS_OK)
-			goto out_err;
-	}
 
 	start_sector = dio_bio->bi_iter.bi_sector;
 	submit_len = dio_bio->bi_iter.bi_size;