diff mbox series

[2/2] btrfs: use more straightforward disk_bytenr to replace icsum for check_data_csum()

Message ID 20201109115410.605880-3-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: paramater refactors for data and metadata endio call backs | expand

Commit Message

Qu Wenruo Nov. 9, 2020, 11:54 a.m. UTC
Parameter @icsum for check_data_csum() is a little hard to understand.
It is the offset in sectors compared to io_bio->logical.

Instead of using the calculated value, let's go with disk_bytenr, as the
new name is not only straightforward,  but also utilized in a lot of
existing code for file items.

To get the old @icsum value, we simply use
(disk_bytenr - (io_bio->bio.bi_iter.bi_sector << 9)) >>
fs_info->sectorsize_bits;

This patch would separate file offset with disk_bytenr completely, to
reduce the confusion.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 14 ++++++++------
 fs/btrfs/inode.c     | 35 ++++++++++++++++++++++++++---------
 2 files changed, 34 insertions(+), 15 deletions(-)

Comments

Nikolay Borisov Nov. 9, 2020, 12:19 p.m. UTC | #1
On 9.11.20 г. 13:54 ч., Qu Wenruo wrote:
> Parameter @icsum for check_data_csum() is a little hard to understand.
> It is the offset in sectors compared to io_bio->logical.

This second sentence is confusing because io_bio->logical is used for repair/dio bios and not buffered whilst  icsum is calculated independently of io_bio->logical so I'd suggest you remove it. 
> 
> Instead of using the calculated value, let's go with disk_bytenr, as the
> new name is not only straightforward,  but also utilized in a lot of
> existing code for file items.

Just say that instead of passing in the calculated offset couple of levels deep you modify the code to instead pass disk_bytenr of currently processed biovec and use that to calculate the offset closer to actual users of it. Kind of like what I did below. 

> 
> To get the old @icsum value, we simply use
> (disk_bytenr - (io_bio->bio.bi_iter.bi_sector << 9)) >>
> fs_info->sectorsize_bits;
> 
> This patch would separate file offset with disk_bytenr completely, to
> reduce the confusion.

I find this description somewhat confusing, what you are doing is just moving the sector offset calculation closer to where it's being used, rather than calculating it in the top level endio handler and passing it several levels down to where it's actually used - in the csum verification function. So where is file offset involved?

Otherwise the code LGTM apart from some minor nits below. 

> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 14 ++++++++------
>  fs/btrfs/inode.c     | 35 ++++++++++++++++++++++++++---------
>  2 files changed, 34 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index bd5a22bfee68..f8b5d3d4e5b0 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2878,7 +2878,7 @@ static void end_bio_extent_readpage(struct bio *bio)
>  	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
>  	struct extent_io_tree *tree, *failure_tree;
>  	struct processed_extent processed = { 0 };
> -	u64 offset = 0;
> +	u64 disk_bytenr = (bio->bi_iter.bi_sector << 9);

needless parentheses.

>  	u64 start;
>  	u64 end;
>  	u64 len;

<snip>

> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index c54e0ed0b938..eff987931f0d 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2843,19 +2843,27 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
>   * The length of such check is always one sector size.
>   */

It's not evident from the hunk but you should also modify the parameter description of this function since we no longer have 'icsum'

>  static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
> -			   int icsum, struct page *page, int pgoff)
> +			   u64 disk_bytenr, struct page *page, int pgoff)
>  {
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>  	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
>  	char *kaddr;
>  	u32 len = fs_info->sectorsize;
>  	const u32 csum_size = fs_info->csum_size;
> +	u64 bio_disk_bytenr = (io_bio->bio.bi_iter.bi_sector << 9);

Again, extra parentheses, they don't bring anything in this particular expression. 

> +	int offset_sectors;
>  	u8 *csum_expected;
>  	u8 csum[BTRFS_CSUM_SIZE];
>  
>  	ASSERT(pgoff + len <= PAGE_SIZE);
>  
> -	csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size;
> +	/* Our disk_bytenr should be inside the io_bio */
> +	ASSERT(bio_disk_bytenr <= disk_bytenr &&
> +	       disk_bytenr < bio_disk_bytenr + io_bio->bio.bi_iter.bi_size);

nit: in_range(disk_bytenr, bio_disk_bytenr, io_bio->bio.bi_iter.bi_size); 

IMO the assert is redundant since it's obvious disk_bytenr will always be within range, but perhahps it's needed for your future subpage work so I'm not going to insist on removing it. 

> +
> +	offset_sectors = (disk_bytenr - bio_disk_bytenr) >>
> +			 fs_info->sectorsize_bits;
> +	csum_expected = ((u8 *)io_bio->csum) + offset_sectors * csum_size;
>  
>  	kaddr = kmap_atomic(page);
>  	shash->tfm = fs_info->csum_shash;
> @@ -2883,8 +2891,13 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,

<snip>
Qu Wenruo Nov. 9, 2020, 12:34 p.m. UTC | #2
On 2020/11/9 下午8:19, Nikolay Borisov wrote:
>
>
> On 9.11.20 г. 13:54 ч., Qu Wenruo wrote:
>> Parameter @icsum for check_data_csum() is a little hard to understand.
>> It is the offset in sectors compared to io_bio->logical.
>
> This second sentence is confusing because io_bio->logical is used for repair/dio bios and not buffered whilst  icsum is calculated independently of io_bio->logical so I'd suggest you remove it.

Right, I also find the io_bio::logical inconsistency.
What I want to say is (io_bio->bio.bi_iter.bi_sector << 9).
>>
>> Instead of using the calculated value, let's go with disk_bytenr, as the
>> new name is not only straightforward,  but also utilized in a lot of
>> existing code for file items.
>
> Just say that instead of passing in the calculated offset couple of levels deep you modify the code to instead pass disk_bytenr of currently processed biovec and use that to calculate the offset closer to actual users of it. Kind of like what I did below.
>
>>
>> To get the old @icsum value, we simply use
>> (disk_bytenr - (io_bio->bio.bi_iter.bi_sector << 9)) >>
>> fs_info->sectorsize_bits;
>>
>> This patch would separate file offset with disk_bytenr completely, to
>> reduce the confusion.
>
> I find this description somewhat confusing, what you are doing is just moving the sector offset calculation closer to where it's being used, rather than calculating it in the top level endio handler and passing it several levels down to where it's actually used - in the csum verification function. So where is file offset involved?

Well, you see the parameter @start and @end of btrfs_verify_data_csum()
right?

That's why we want to distinguish the file offset from on-disk bytenr.

>
> Otherwise the code LGTM apart from some minor nits below.
>
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/extent_io.c | 14 ++++++++------
>>  fs/btrfs/inode.c     | 35 ++++++++++++++++++++++++++---------
>>  2 files changed, 34 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index bd5a22bfee68..f8b5d3d4e5b0 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -2878,7 +2878,7 @@ static void end_bio_extent_readpage(struct bio *bio)
>>  	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
>>  	struct extent_io_tree *tree, *failure_tree;
>>  	struct processed_extent processed = { 0 };
>> -	u64 offset = 0;
>> +	u64 disk_bytenr = (bio->bi_iter.bi_sector << 9);
>
> needless parentheses.
>
>>  	u64 start;
>>  	u64 end;
>>  	u64 len;
>
> <snip>
>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index c54e0ed0b938..eff987931f0d 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -2843,19 +2843,27 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
>>   * The length of such check is always one sector size.
>>   */
>
> It's not evident from the hunk but you should also modify the parameter description of this function since we no longer have 'icsum'
>
>>  static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
>> -			   int icsum, struct page *page, int pgoff)
>> +			   u64 disk_bytenr, struct page *page, int pgoff)
>>  {
>>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>>  	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
>>  	char *kaddr;
>>  	u32 len = fs_info->sectorsize;
>>  	const u32 csum_size = fs_info->csum_size;
>> +	u64 bio_disk_bytenr = (io_bio->bio.bi_iter.bi_sector << 9);
>
> Again, extra parentheses, they don't bring anything in this particular expression.
>
>> +	int offset_sectors;
>>  	u8 *csum_expected;
>>  	u8 csum[BTRFS_CSUM_SIZE];
>>
>>  	ASSERT(pgoff + len <= PAGE_SIZE);
>>
>> -	csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size;
>> +	/* Our disk_bytenr should be inside the io_bio */
>> +	ASSERT(bio_disk_bytenr <= disk_bytenr &&
>> +	       disk_bytenr < bio_disk_bytenr + io_bio->bio.bi_iter.bi_size);
>
> nit: in_range(disk_bytenr, bio_disk_bytenr, io_bio->bio.bi_iter.bi_size);
>
> IMO the assert is redundant since it's obvious disk_bytenr will always be within range, but perhahps it's needed for your future subpage work so I'm not going to insist on removing it.

The "bio_disk_bytenr <= disk_bytenr" is obvious and I'm fine to remove
it, but the later "disk_bytenr < bio_disk_bytenr +
io_bio->bio.bi_iter.bi_size" is a completely valid check.

This is even more important since we use disk_bytenr to calculate where
the expected csum is.
If disk_bytenr goes beyond bio range, it will cause memory access out of
boudary.

In fact, this assert() has already caught cases in my later patches
where I forgot bv_len can go beyond current page.

Thanks,
Qu
>
>> +
>> +	offset_sectors = (disk_bytenr - bio_disk_bytenr) >>
>> +			 fs_info->sectorsize_bits;
>> +	csum_expected = ((u8 *)io_bio->csum) + offset_sectors * csum_size;
>>
>>  	kaddr = kmap_atomic(page);
>>  	shash->tfm = fs_info->csum_shash;
>> @@ -2883,8 +2891,13 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
>
> <snip>
>
diff mbox series

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bd5a22bfee68..f8b5d3d4e5b0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2878,7 +2878,7 @@  static void end_bio_extent_readpage(struct bio *bio)
 	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
 	struct extent_io_tree *tree, *failure_tree;
 	struct processed_extent processed = { 0 };
-	u64 offset = 0;
+	u64 disk_bytenr = (bio->bi_iter.bi_sector << 9);
 	u64 start;
 	u64 end;
 	u64 len;
@@ -2924,8 +2924,9 @@  static void end_bio_extent_readpage(struct bio *bio)
 		mirror = io_bio->mirror_num;
 		if (likely(uptodate)) {
 			if (is_data_inode(inode))
-				ret = btrfs_verify_data_csum(io_bio, offset, page,
-							     start, end, mirror);
+				ret = btrfs_verify_data_csum(io_bio,
+						disk_bytenr, page, start, end,
+						mirror);
 			else
 				ret = btrfs_validate_metadata_buffer(io_bio,
 					page, start, end, mirror);
@@ -2953,12 +2954,13 @@  static void end_bio_extent_readpage(struct bio *bio)
 			 * If it can't handle the error it will return -EIO and
 			 * we remain responsible for that page.
 			 */
-			if (!btrfs_submit_read_repair(inode, bio, offset, page,
+			if (!btrfs_submit_read_repair(inode, bio, disk_bytenr,
+						page,
 						start - page_offset(page),
 						start, end, mirror,
 						btrfs_submit_data_bio)) {
 				uptodate = !bio->bi_status;
-				offset += len;
+				disk_bytenr += len;
 				continue;
 			}
 		} else {
@@ -2983,7 +2985,7 @@  static void end_bio_extent_readpage(struct bio *bio)
 			if (page->index == end_index && off)
 				zero_user_segment(page, off, PAGE_SIZE);
 		}
-		offset += len;
+		disk_bytenr += len;
 
 		endio_readpage_update_page_status(page, uptodate);
 		endio_readpage_release_extent(&processed, BTRFS_I(inode),
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c54e0ed0b938..eff987931f0d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2843,19 +2843,27 @@  void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
  * The length of such check is always one sector size.
  */
 static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
-			   int icsum, struct page *page, int pgoff)
+			   u64 disk_bytenr, struct page *page, int pgoff)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	char *kaddr;
 	u32 len = fs_info->sectorsize;
 	const u32 csum_size = fs_info->csum_size;
+	u64 bio_disk_bytenr = (io_bio->bio.bi_iter.bi_sector << 9);
+	int offset_sectors;
 	u8 *csum_expected;
 	u8 csum[BTRFS_CSUM_SIZE];
 
 	ASSERT(pgoff + len <= PAGE_SIZE);
 
-	csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size;
+	/* Our disk_bytenr should be inside the io_bio */
+	ASSERT(bio_disk_bytenr <= disk_bytenr &&
+	       disk_bytenr < bio_disk_bytenr + io_bio->bio.bi_iter.bi_size);
+
+	offset_sectors = (disk_bytenr - bio_disk_bytenr) >>
+			 fs_info->sectorsize_bits;
+	csum_expected = ((u8 *)io_bio->csum) + offset_sectors * csum_size;
 
 	kaddr = kmap_atomic(page);
 	shash->tfm = fs_info->csum_shash;
@@ -2883,8 +2891,13 @@  static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
  * when reads are done, we need to check csums to verify the data is correct
  * if there's a match, we allow the bio to finish.  If not, the code in
  * extent_io.c will try to find good copies for us.
+ *
+ * @disk_bytenr: The on-disk bytenr of the range start
+ * @start:	 The file offset of the range start
+ * @end:	 The file offset of the range end (inclusive)
+ * @mirror:	 The mirror number
  */
-int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset,
+int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 disk_bytenr,
 			   struct page *page, u64 start, u64 end, int mirror)
 {
 	size_t offset = start - page_offset(page);
@@ -2909,8 +2922,7 @@  int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset,
 		return 0;
 	}
 
-	phy_offset >>= root->fs_info->sectorsize_bits;
-	return check_data_csum(inode, io_bio, phy_offset, page, offset);
+	return check_data_csum(inode, io_bio, disk_bytenr, page, offset);
 }
 
 /*
@@ -7616,7 +7628,12 @@  static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
 	struct bio_vec bvec;
 	struct bvec_iter iter;
 	u64 start = io_bio->logical;
-	int icsum = 0;
+	/*
+	 * Dio io_bio uses file offset other than disk bytenr for
+	 * io_bio->logical.
+	 * Thus we need to grab disk_bytenr from bio.
+	 */
+	u64 disk_bytenr = (io_bio->bio.bi_iter.bi_sector << 9);
 	blk_status_t err = BLK_STS_OK;
 
 	__bio_for_each_segment(bvec, &io_bio->bio, iter, io_bio->iter) {
@@ -7627,8 +7644,8 @@  static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
 		for (i = 0; i < nr_sectors; i++) {
 			ASSERT(pgoff < PAGE_SIZE);
 			if (uptodate &&
-			    (!csum || !check_data_csum(inode, io_bio, icsum,
-						       bvec.bv_page, pgoff))) {
+			    (!csum || !check_data_csum(inode, io_bio,
+					disk_bytenr, bvec.bv_page, pgoff))) {
 				clean_io_failure(fs_info, failure_tree, io_tree,
 						 start, bvec.bv_page,
 						 btrfs_ino(BTRFS_I(inode)),
@@ -7648,7 +7665,7 @@  static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
 					err = status;
 			}
 			start += sectorsize;
-			icsum++;
+			disk_bytenr += sectorsize;
 			pgoff += sectorsize;
 		}
 	}