Message ID | 20201109115410.605880-3-wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: paramater refactors for data and metadata endio call backs | expand |
On 9.11.20 г. 13:54 ч., Qu Wenruo wrote: > Parameter @icsum for check_data_csum() is a little hard to understand. > It is the offset in sectors compared to io_bio->logical. This second sentence is confusing because io_bio->logical is used for repair/dio bios and not buffered whilst icsum is calculated independently of io_bio->logical so I'd suggest you remove it. > > Instead of using the calculated value, let's go with disk_bytenr, as the > new name is not only straightforward, but also utilized in a lot of > existing code for file items. Just say that instead of passing in the calculated offset couple of levels deep you modify the code to instead pass disk_bytenr of currently processed biovec and use that to calculate the offset closer to actual users of it. Kind of like what I did below. > > To get the old @icsum value, we simply use > (disk_bytenr - (io_bio->bio.bi_iter.bi_sector << 9)) >> > fs_info->sectorsize_bits; > > This patch would separate file offset with disk_bytenr completely, to > reduce the confusion. I find this description somewhat confusing, what you are doing is just moving the sector offset calculation closer to where it's being used, rather than calculating it in the top level endio handler and passing it several levels down to where it's actually used - in the csum verification function. So where is file offset involved? Otherwise the code LGTM apart from some minor nits below. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/extent_io.c | 14 ++++++++------ > fs/btrfs/inode.c | 35 ++++++++++++++++++++++++++--------- > 2 files changed, 34 insertions(+), 15 deletions(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index bd5a22bfee68..f8b5d3d4e5b0 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -2878,7 +2878,7 @@ static void end_bio_extent_readpage(struct bio *bio) > struct btrfs_io_bio *io_bio = btrfs_io_bio(bio); > struct extent_io_tree *tree, *failure_tree; > struct processed_extent processed = { 0 }; > - u64 offset = 0; > + u64 disk_bytenr = (bio->bi_iter.bi_sector << 9); needless parentheses. > u64 start; > u64 end; > u64 len; <snip> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index c54e0ed0b938..eff987931f0d 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -2843,19 +2843,27 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start, > * The length of such check is always one sector size. > */ It's not evident from the hunk but you should also modify the parameter description of this function since we no longer have 'icsum' > static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio, > - int icsum, struct page *page, int pgoff) > + u64 disk_bytenr, struct page *page, int pgoff) > { > struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); > SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); > char *kaddr; > u32 len = fs_info->sectorsize; > const u32 csum_size = fs_info->csum_size; > + u64 bio_disk_bytenr = (io_bio->bio.bi_iter.bi_sector << 9); Again, extra parentheses, they don't bring anything in this particular expression. > + int offset_sectors; > u8 *csum_expected; > u8 csum[BTRFS_CSUM_SIZE]; > > ASSERT(pgoff + len <= PAGE_SIZE); > > - csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size; > + /* Our disk_bytenr should be inside the io_bio */ > + ASSERT(bio_disk_bytenr <= disk_bytenr && > + disk_bytenr < bio_disk_bytenr + io_bio->bio.bi_iter.bi_size); nit: in_range(disk_bytenr, bio_disk_bytenr, io_bio->bio.bi_iter.bi_size); IMO the assert is redundant since it's obvious disk_bytenr will always be within range, but perhahps it's needed for your future subpage work so I'm not going to insist on removing it. > + > + offset_sectors = (disk_bytenr - bio_disk_bytenr) >> > + fs_info->sectorsize_bits; > + csum_expected = ((u8 *)io_bio->csum) + offset_sectors * csum_size; > > kaddr = kmap_atomic(page); > shash->tfm = fs_info->csum_shash; > @@ -2883,8 +2891,13 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio, <snip>
On 2020/11/9 下午8:19, Nikolay Borisov wrote: > > > On 9.11.20 г. 13:54 ч., Qu Wenruo wrote: >> Parameter @icsum for check_data_csum() is a little hard to understand. >> It is the offset in sectors compared to io_bio->logical. > > This second sentence is confusing because io_bio->logical is used for repair/dio bios and not buffered whilst icsum is calculated independently of io_bio->logical so I'd suggest you remove it. Right, I also find the io_bio::logical inconsistency. What I want to say is (io_bio->bio.bi_iter.bi_sector << 9). >> >> Instead of using the calculated value, let's go with disk_bytenr, as the >> new name is not only straightforward, but also utilized in a lot of >> existing code for file items. > > Just say that instead of passing in the calculated offset couple of levels deep you modify the code to instead pass disk_bytenr of currently processed biovec and use that to calculate the offset closer to actual users of it. Kind of like what I did below. > >> >> To get the old @icsum value, we simply use >> (disk_bytenr - (io_bio->bio.bi_iter.bi_sector << 9)) >> >> fs_info->sectorsize_bits; >> >> This patch would separate file offset with disk_bytenr completely, to >> reduce the confusion. > > I find this description somewhat confusing, what you are doing is just moving the sector offset calculation closer to where it's being used, rather than calculating it in the top level endio handler and passing it several levels down to where it's actually used - in the csum verification function. So where is file offset involved? Well, you see the parameter @start and @end of btrfs_verify_data_csum() right? That's why we want to distinguish the file offset from on-disk bytenr. > > Otherwise the code LGTM apart from some minor nits below. > >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/extent_io.c | 14 ++++++++------ >> fs/btrfs/inode.c | 35 ++++++++++++++++++++++++++--------- >> 2 files changed, 34 insertions(+), 15 deletions(-) >> >> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c >> index bd5a22bfee68..f8b5d3d4e5b0 100644 >> --- a/fs/btrfs/extent_io.c >> +++ b/fs/btrfs/extent_io.c >> @@ -2878,7 +2878,7 @@ static void end_bio_extent_readpage(struct bio *bio) >> struct btrfs_io_bio *io_bio = btrfs_io_bio(bio); >> struct extent_io_tree *tree, *failure_tree; >> struct processed_extent processed = { 0 }; >> - u64 offset = 0; >> + u64 disk_bytenr = (bio->bi_iter.bi_sector << 9); > > needless parentheses. > >> u64 start; >> u64 end; >> u64 len; > > <snip> > >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >> index c54e0ed0b938..eff987931f0d 100644 >> --- a/fs/btrfs/inode.c >> +++ b/fs/btrfs/inode.c >> @@ -2843,19 +2843,27 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start, >> * The length of such check is always one sector size. >> */ > > It's not evident from the hunk but you should also modify the parameter description of this function since we no longer have 'icsum' > >> static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio, >> - int icsum, struct page *page, int pgoff) >> + u64 disk_bytenr, struct page *page, int pgoff) >> { >> struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); >> SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); >> char *kaddr; >> u32 len = fs_info->sectorsize; >> const u32 csum_size = fs_info->csum_size; >> + u64 bio_disk_bytenr = (io_bio->bio.bi_iter.bi_sector << 9); > > Again, extra parentheses, they don't bring anything in this particular expression. > >> + int offset_sectors; >> u8 *csum_expected; >> u8 csum[BTRFS_CSUM_SIZE]; >> >> ASSERT(pgoff + len <= PAGE_SIZE); >> >> - csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size; >> + /* Our disk_bytenr should be inside the io_bio */ >> + ASSERT(bio_disk_bytenr <= disk_bytenr && >> + disk_bytenr < bio_disk_bytenr + io_bio->bio.bi_iter.bi_size); > > nit: in_range(disk_bytenr, bio_disk_bytenr, io_bio->bio.bi_iter.bi_size); > > IMO the assert is redundant since it's obvious disk_bytenr will always be within range, but perhahps it's needed for your future subpage work so I'm not going to insist on removing it. The "bio_disk_bytenr <= disk_bytenr" is obvious and I'm fine to remove it, but the later "disk_bytenr < bio_disk_bytenr + io_bio->bio.bi_iter.bi_size" is a completely valid check. This is even more important since we use disk_bytenr to calculate where the expected csum is. If disk_bytenr goes beyond bio range, it will cause memory access out of boudary. In fact, this assert() has already caught cases in my later patches where I forgot bv_len can go beyond current page. Thanks, Qu > >> + >> + offset_sectors = (disk_bytenr - bio_disk_bytenr) >> >> + fs_info->sectorsize_bits; >> + csum_expected = ((u8 *)io_bio->csum) + offset_sectors * csum_size; >> >> kaddr = kmap_atomic(page); >> shash->tfm = fs_info->csum_shash; >> @@ -2883,8 +2891,13 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio, > > <snip> >
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index bd5a22bfee68..f8b5d3d4e5b0 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2878,7 +2878,7 @@ static void end_bio_extent_readpage(struct bio *bio) struct btrfs_io_bio *io_bio = btrfs_io_bio(bio); struct extent_io_tree *tree, *failure_tree; struct processed_extent processed = { 0 }; - u64 offset = 0; + u64 disk_bytenr = (bio->bi_iter.bi_sector << 9); u64 start; u64 end; u64 len; @@ -2924,8 +2924,9 @@ static void end_bio_extent_readpage(struct bio *bio) mirror = io_bio->mirror_num; if (likely(uptodate)) { if (is_data_inode(inode)) - ret = btrfs_verify_data_csum(io_bio, offset, page, - start, end, mirror); + ret = btrfs_verify_data_csum(io_bio, + disk_bytenr, page, start, end, + mirror); else ret = btrfs_validate_metadata_buffer(io_bio, page, start, end, mirror); @@ -2953,12 +2954,13 @@ static void end_bio_extent_readpage(struct bio *bio) * If it can't handle the error it will return -EIO and * we remain responsible for that page. */ - if (!btrfs_submit_read_repair(inode, bio, offset, page, + if (!btrfs_submit_read_repair(inode, bio, disk_bytenr, + page, start - page_offset(page), start, end, mirror, btrfs_submit_data_bio)) { uptodate = !bio->bi_status; - offset += len; + disk_bytenr += len; continue; } } else { @@ -2983,7 +2985,7 @@ static void end_bio_extent_readpage(struct bio *bio) if (page->index == end_index && off) zero_user_segment(page, off, PAGE_SIZE); } - offset += len; + disk_bytenr += len; endio_readpage_update_page_status(page, uptodate); endio_readpage_release_extent(&processed, BTRFS_I(inode), diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index c54e0ed0b938..eff987931f0d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2843,19 +2843,27 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start, * The length of such check is always one sector size. */ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio, - int icsum, struct page *page, int pgoff) + u64 disk_bytenr, struct page *page, int pgoff) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); char *kaddr; u32 len = fs_info->sectorsize; const u32 csum_size = fs_info->csum_size; + u64 bio_disk_bytenr = (io_bio->bio.bi_iter.bi_sector << 9); + int offset_sectors; u8 *csum_expected; u8 csum[BTRFS_CSUM_SIZE]; ASSERT(pgoff + len <= PAGE_SIZE); - csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size; + /* Our disk_bytenr should be inside the io_bio */ + ASSERT(bio_disk_bytenr <= disk_bytenr && + disk_bytenr < bio_disk_bytenr + io_bio->bio.bi_iter.bi_size); + + offset_sectors = (disk_bytenr - bio_disk_bytenr) >> + fs_info->sectorsize_bits; + csum_expected = ((u8 *)io_bio->csum) + offset_sectors * csum_size; kaddr = kmap_atomic(page); shash->tfm = fs_info->csum_shash; @@ -2883,8 +2891,13 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio, * when reads are done, we need to check csums to verify the data is correct * if there's a match, we allow the bio to finish. If not, the code in * extent_io.c will try to find good copies for us. + * + * @disk_bytenr: The on-disk bytenr of the range start + * @start: The file offset of the range start + * @end: The file offset of the range end (inclusive) + * @mirror: The mirror number */ -int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset, +int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 disk_bytenr, struct page *page, u64 start, u64 end, int mirror) { size_t offset = start - page_offset(page); @@ -2909,8 +2922,7 @@ int btrfs_verify_data_csum(struct btrfs_io_bio *io_bio, u64 phy_offset, return 0; } - phy_offset >>= root->fs_info->sectorsize_bits; - return check_data_csum(inode, io_bio, phy_offset, page, offset); + return check_data_csum(inode, io_bio, disk_bytenr, page, offset); } /* @@ -7616,7 +7628,12 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode, struct bio_vec bvec; struct bvec_iter iter; u64 start = io_bio->logical; - int icsum = 0; + /* + * Dio io_bio uses file offset other than disk bytenr for + * io_bio->logical. + * Thus we need to grab disk_bytenr from bio. + */ + u64 disk_bytenr = (io_bio->bio.bi_iter.bi_sector << 9); blk_status_t err = BLK_STS_OK; __bio_for_each_segment(bvec, &io_bio->bio, iter, io_bio->iter) { @@ -7627,8 +7644,8 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode, for (i = 0; i < nr_sectors; i++) { ASSERT(pgoff < PAGE_SIZE); if (uptodate && - (!csum || !check_data_csum(inode, io_bio, icsum, - bvec.bv_page, pgoff))) { + (!csum || !check_data_csum(inode, io_bio, + disk_bytenr, bvec.bv_page, pgoff))) { clean_io_failure(fs_info, failure_tree, io_tree, start, bvec.bv_page, btrfs_ino(BTRFS_I(inode)), @@ -7648,7 +7665,7 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode, err = status; } start += sectorsize; - icsum++; + disk_bytenr += sectorsize; pgoff += sectorsize; } }
Parameter @icsum for check_data_csum() is a little hard to understand. It is the offset in sectors compared to io_bio->logical. Instead of using the calculated value, let's go with disk_bytenr, as the new name is not only straightforward, but also utilized in a lot of existing code for file items. To get the old @icsum value, we simply use (disk_bytenr - (io_bio->bio.bi_iter.bi_sector << 9)) >> fs_info->sectorsize_bits; This patch would separate file offset with disk_bytenr completely, to reduce the confusion. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/extent_io.c | 14 ++++++++------ fs/btrfs/inode.c | 35 ++++++++++++++++++++++++++--------- 2 files changed, 34 insertions(+), 15 deletions(-)