diff mbox series

[14/17] btrfs: make btrfs_readpage_end_io_hook() follow sector size

Message ID 20200908075230.86856-15-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: add read-only support for subpage sector size | expand

Commit Message

Qu Wenruo Sept. 8, 2020, 7:52 a.m. UTC
Currently btrfs_readpage_end_io_hook() just pass the whole page to
check_data_csum(), which is fine since we only support sectorsize ==
PAGE_SIZE.

To support subpage RO support, we need to properly honor per-sector
checksum verification, just like what we did in dio read path.

This patch will do the csum verification in a for loop, starts with
pg_off == start - page_offset(page), with sectorsize increasement for
each loop.

For sectorsize == PAGE_SIZE case, the pg_off will always be 0, and we
will only finish with just one loop.

For subpage, we do the proper loop.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Comments

Goldwyn Rodrigues Sept. 9, 2020, 5:34 p.m. UTC | #1
On 15:52 08/09, Qu Wenruo wrote:
> Currently btrfs_readpage_end_io_hook() just pass the whole page to
> check_data_csum(), which is fine since we only support sectorsize ==
> PAGE_SIZE.
> 
> To support subpage RO support, we need to properly honor per-sector
> checksum verification, just like what we did in dio read path.
> 
> This patch will do the csum verification in a for loop, starts with
> pg_off == start - page_offset(page), with sectorsize increasement for
> each loop.
> 
> For sectorsize == PAGE_SIZE case, the pg_off will always be 0, and we
> will only finish with just one loop.
> 
> For subpage, we do the proper loop.
> 
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/inode.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 078735aa0f68..8bd14dda2067 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2851,9 +2851,12 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
>  				      u64 start, u64 end, int mirror)
>  {
>  	size_t offset = start - page_offset(page);
> +	size_t pg_off;
>  	struct inode *inode = page->mapping->host;
>  	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
> +	u32 sectorsize = root->fs_info->sectorsize;
> +	bool found_err = false;
>  
>  	if (PageChecked(page)) {
>  		ClearPageChecked(page);
> @@ -2870,7 +2873,17 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
>  	}
>  
>  	phy_offset >>= inode->i_sb->s_blocksize_bits;
> -	return check_data_csum(inode, io_bio, phy_offset, page, offset);
> +	for (pg_off = offset; pg_off < end - page_offset(page);
> +	     pg_off += sectorsize, phy_offset++) {
> +		int ret;
> +
> +		ret = check_data_csum(inode, io_bio, phy_offset, page, pg_off);
> +		if (ret < 0)
> +			found_err = true;
> +	}
> +	if (found_err)
> +		return -EIO;
> +	return 0;
>  }

We don't need found_err here. Just return ret when you encounter the
first error.
Qu Wenruo Sept. 10, 2020, 12:05 a.m. UTC | #2
On 2020/9/10 上午1:34, Goldwyn Rodrigues wrote:
> On 15:52 08/09, Qu Wenruo wrote:
>> Currently btrfs_readpage_end_io_hook() just pass the whole page to
>> check_data_csum(), which is fine since we only support sectorsize ==
>> PAGE_SIZE.
>>
>> To support subpage RO support, we need to properly honor per-sector
>> checksum verification, just like what we did in dio read path.
>>
>> This patch will do the csum verification in a for loop, starts with
>> pg_off == start - page_offset(page), with sectorsize increasement for
>> each loop.
>>
>> For sectorsize == PAGE_SIZE case, the pg_off will always be 0, and we
>> will only finish with just one loop.
>>
>> For subpage, we do the proper loop.
>>
>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/inode.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 078735aa0f68..8bd14dda2067 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -2851,9 +2851,12 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
>>  				      u64 start, u64 end, int mirror)
>>  {
>>  	size_t offset = start - page_offset(page);
>> +	size_t pg_off;
>>  	struct inode *inode = page->mapping->host;
>>  	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
>>  	struct btrfs_root *root = BTRFS_I(inode)->root;
>> +	u32 sectorsize = root->fs_info->sectorsize;
>> +	bool found_err = false;
>>  
>>  	if (PageChecked(page)) {
>>  		ClearPageChecked(page);
>> @@ -2870,7 +2873,17 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
>>  	}
>>  
>>  	phy_offset >>= inode->i_sb->s_blocksize_bits;
>> -	return check_data_csum(inode, io_bio, phy_offset, page, offset);
>> +	for (pg_off = offset; pg_off < end - page_offset(page);
>> +	     pg_off += sectorsize, phy_offset++) {
>> +		int ret;
>> +
>> +		ret = check_data_csum(inode, io_bio, phy_offset, page, pg_off);
>> +		if (ret < 0)
>> +			found_err = true;
>> +	}
>> +	if (found_err)
>> +		return -EIO;
>> +	return 0;
>>  }
> 
> We don't need found_err here. Just return ret when you encounter the
> first error.
> 
But that means, the whole range will be marked error, while some sectors
may still contain good data and pass the csum checking.

Thanks,
Qu
Goldwyn Rodrigues Sept. 10, 2020, 2:26 p.m. UTC | #3
On  8:05 10/09, Qu Wenruo wrote:
> 
> 
> On 2020/9/10 上午1:34, Goldwyn Rodrigues wrote:
> > On 15:52 08/09, Qu Wenruo wrote:
> >> Currently btrfs_readpage_end_io_hook() just pass the whole page to
> >> check_data_csum(), which is fine since we only support sectorsize ==
> >> PAGE_SIZE.
> >>
> >> To support subpage RO support, we need to properly honor per-sector
> >> checksum verification, just like what we did in dio read path.
> >>
> >> This patch will do the csum verification in a for loop, starts with
> >> pg_off == start - page_offset(page), with sectorsize increasement for
> >> each loop.
> >>
> >> For sectorsize == PAGE_SIZE case, the pg_off will always be 0, and we
> >> will only finish with just one loop.
> >>
> >> For subpage, we do the proper loop.
> >>
> >> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >>  fs/btrfs/inode.c | 15 ++++++++++++++-
> >>  1 file changed, 14 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >> index 078735aa0f68..8bd14dda2067 100644
> >> --- a/fs/btrfs/inode.c
> >> +++ b/fs/btrfs/inode.c
> >> @@ -2851,9 +2851,12 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
> >>  				      u64 start, u64 end, int mirror)
> >>  {
> >>  	size_t offset = start - page_offset(page);
> >> +	size_t pg_off;
> >>  	struct inode *inode = page->mapping->host;
> >>  	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
> >>  	struct btrfs_root *root = BTRFS_I(inode)->root;
> >> +	u32 sectorsize = root->fs_info->sectorsize;
> >> +	bool found_err = false;
> >>  
> >>  	if (PageChecked(page)) {
> >>  		ClearPageChecked(page);
> >> @@ -2870,7 +2873,17 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
> >>  	}
> >>  
> >>  	phy_offset >>= inode->i_sb->s_blocksize_bits;
> >> -	return check_data_csum(inode, io_bio, phy_offset, page, offset);
> >> +	for (pg_off = offset; pg_off < end - page_offset(page);
> >> +	     pg_off += sectorsize, phy_offset++) {
> >> +		int ret;
> >> +
> >> +		ret = check_data_csum(inode, io_bio, phy_offset, page, pg_off);
> >> +		if (ret < 0)
> >> +			found_err = true;
> >> +	}
> >> +	if (found_err)
> >> +		return -EIO;
> >> +	return 0;
> >>  }
> > 
> > We don't need found_err here. Just return ret when you encounter the
> > first error.
> > 
> But that means, the whole range will be marked error, while some sectors
> may still contain good data and pass the csum checking.
> 

It would have made sense if you are storing block-by-block value of the
validity of the page so it may be referenced later. The function is only
checking if the data read in the page is correct or not, whether a part
of the data is correct is of no consequence to the caller. The earlier it
returns on an error, the better.. rather than checking the whole range
just to return the same -EIO.
diff mbox series

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 078735aa0f68..8bd14dda2067 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2851,9 +2851,12 @@  static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 				      u64 start, u64 end, int mirror)
 {
 	size_t offset = start - page_offset(page);
+	size_t pg_off;
 	struct inode *inode = page->mapping->host;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
+	u32 sectorsize = root->fs_info->sectorsize;
+	bool found_err = false;
 
 	if (PageChecked(page)) {
 		ClearPageChecked(page);
@@ -2870,7 +2873,17 @@  static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 	}
 
 	phy_offset >>= inode->i_sb->s_blocksize_bits;
-	return check_data_csum(inode, io_bio, phy_offset, page, offset);
+	for (pg_off = offset; pg_off < end - page_offset(page);
+	     pg_off += sectorsize, phy_offset++) {
+		int ret;
+
+		ret = check_data_csum(inode, io_bio, phy_offset, page, pg_off);
+		if (ret < 0)
+			found_err = true;
+	}
+	if (found_err)
+		return -EIO;
+	return 0;
 }
 
 /*