Message ID: 79a6604bc9ccb2a6e1355f9d897b45943c6bcca9.1679959770.git.wqu@suse.com (mailing list archive)
State: New, archived
Series: btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror()
On Tue, Mar 28, 2023 at 07:30:53AM +0800, Qu Wenruo wrote:
> The new helper, btrfs_submit_scrub_read(), would be mostly a subset of
> btrfs_submit_bio(), with the following limitations:
>
> - Only supports read
> - @mirror_num must be > 0
> - No read-time repair nor checksum verification
> - The @bbio must not cross stripe boundary
>
> This would provide the basis for unified read repair for scrub, as we no
> longer need to handle RAID56 recovery all by scrub, and RAID56 data
> stripes scrub can share the same code of read and repair.
>
> The repair part would be the same as non-RAID56, as we only need to try
> the next mirror.

Stupid question: why do we actually still need this and the write helper,
now that I think the generic helpers should work just fine without
bbio->inode?
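The "just try the next mirror" repair strategy from the quoted commit message can be sketched as a small userspace model. Everything here (mock_read_mirror(), NUM_COPIES, the "golden" payload) is a hypothetical illustration of the idea, not btrfs code:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_COPIES 3 /* hypothetical: e.g. a 3-copy RAID1C3 profile */

/* Hypothetical mock device: mirror 1 is corrupted, mirrors 2/3 are good. */
static bool mock_read_mirror(int mirror_num, char *buf)
{
	strcpy(buf, mirror_num == 1 ? "garbage" : "golden");
	return true; /* the read itself succeeds; verification decides */
}

static bool verify(const char *buf)
{
	return strcmp(buf, "golden") == 0; /* stand-in for a csum check */
}

/*
 * Model of the non-RAID56 repair strategy: if the data read from one
 * mirror fails verification, just try the next mirror until one copy
 * passes or all copies are exhausted.
 */
static int read_with_repair(char *buf, int initial_mirror)
{
	for (int i = 0; i < NUM_COPIES; i++) {
		int mirror = (initial_mirror - 1 + i) % NUM_COPIES + 1;

		if (mock_read_mirror(mirror, buf) && verify(buf))
			return mirror; /* which mirror finally served us */
	}
	return -1; /* unrepairable: every copy is bad */
}
```

The point of the series is that RAID56 data stripes can reuse this exact loop, with parity recovery hidden behind the higher mirror numbers.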
On 2023/3/28 07:35, Christoph Hellwig wrote:
> On Tue, Mar 28, 2023 at 07:30:53AM +0800, Qu Wenruo wrote:
>> The new helper, btrfs_submit_scrub_read(), would be mostly a subset of
>> btrfs_submit_bio(), with the following limitations:
>>
>> - Only supports read
>> - @mirror_num must be > 0
>> - No read-time repair nor checksum verification
>> - The @bbio must not cross stripe boundary
>>
>> This would provide the basis for unified read repair for scrub, as we no
>> longer need to handle RAID56 recovery all by scrub, and RAID56 data
>> stripes scrub can share the same code of read and repair.
>>
>> The repair part would be the same as non-RAID56, as we only need to try
>> the next mirror.
>
> Stupid question: why do we actually still need this and the write helper,
> now that I think the generic helpers should work just fine without
> bbio->inode?

For the read part, as long as we skip all the csum verification and csum
search, it can go through the generic helper, although this may need some
coordination during merge.

But in the future, mostly for the ZNS RST patches, we need special handling
for the garbage range.

Johannes is already testing his RST branches on top of this series, and he
is hitting the ASSERT() on the mapped length due to the RST changes.
In that case, we need to skip certain ranges, and that's specific to scrub
usage.

Thus for the read part, I'd still prefer a scrub specific helper, for the
incoming RST expansion.

For the write helper, it's very different.

The btrfs_submit_bio() would duplicate the writes to all mirrors, which is
exactly what I want to avoid for scrub usage.
Thus the scrub specific write helper would always map the write to a single
stripe, even for RAID56 writes.

So the write helper must stay.

Thanks,
Qu
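The write-side distinction above can be sketched as two toy mapping functions. The struct layout and names (io_context, map_scrub_write(), etc.) are hypothetical simplifications, not the actual btrfs structures:

```c
#include <assert.h>

#define MAX_MIRRORS 4

/* Hypothetical simplified view of one mapped device target. */
struct stripe {
	int devid;
	unsigned long long physical;
};

/* Hypothetical stand-in for the mapped chunk (one entry per mirror). */
struct io_context {
	int num_stripes;
	struct stripe stripes[MAX_MIRRORS];
};

/*
 * Generic-write model: the data is fanned out to every mirror of the
 * chunk, which is the btrfs_submit_bio() behavior being avoided here.
 */
static int map_generic_write(const struct io_context *bioc,
			     struct stripe *out)
{
	for (int i = 0; i < bioc->num_stripes; i++)
		out[i] = bioc->stripes[i];
	return bioc->num_stripes; /* number of device writes issued */
}

/*
 * Scrub-write model: only the single stripe identified by @mirror_num
 * is written, so repairing one bad copy never touches the good copies.
 */
static int map_scrub_write(const struct io_context *bioc, int mirror_num,
			   struct stripe *out)
{
	out[0] = bioc->stripes[mirror_num - 1]; /* mirror_num is 1-based */
	return 1;
}
```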
On Tue, Mar 28, 2023 at 08:16:38AM +0800, Qu Wenruo wrote:
> For the read part, as long as we skip all the csum verification and csum
> search, it can go through the generic helper.

Yes, and we can just check for bbio->inode to decide that, like you did
in the earliest versions of the series.

> But in the future, mostly for the ZNS RST patches, we need special
> handling for the garbage range.
>
> Johannes is already testing his RST branches on top of this series, and
> he is hitting the ASSERT() on the mapped length due to the RST changes.
> In that case, we need to skip certain ranges, and that's specific to
> scrub usage.

Can you explain what the issue is here?

> The btrfs_submit_bio() would duplicate the writes to all mirrors, which
> is exactly what I want to avoid for scrub usage.
> Thus the scrub specific write helper would always map the write to a
> single stripe, even for RAID56 writes.

That's basically what btrfs_repair_io_failure does. Any chance we could
share the code for those instead of going back to creating multiple code
paths for I/O submission?
On 2023/3/28 08:25, Christoph Hellwig wrote:
> On Tue, Mar 28, 2023 at 08:16:38AM +0800, Qu Wenruo wrote:
>> For the read part, as long as we skip all the csum verification and
>> csum search, it can go through the generic helper.
>
> Yes, and we can just check for bbio->inode to decide that, like you did
> in the earliest versions of the series.
>
>> But in the future, mostly for the ZNS RST patches, we need special
>> handling for the garbage range.
>>
>> Johannes is already testing his RST branches on top of this series, and
>> he is hitting the ASSERT() on the mapped length due to the RST changes.
>> In that case, we need to skip certain ranges, and that's specific to
>> scrub usage.
>
> Can you explain what the issue is here?

The new RST would add a new layer between the physical address and the
logical address. Thus every read would do an extra lookup (almost at the
extent level), and that's done in btrfs_map_block().

However the new scrub code introduced a new behavior, which is never
really utilized by any other code path: scrub would now read the full 64K
in one go, even if this means reading out sectors not utilized by any
extent.

The new behavior itself reduces IOPS, thus it should be an overall win.
And for regular devices or non-RST zoned devices, we just read out some
garbage, and the scrub verification code would skip it.

But for RST cases, there would be no mapping for the garbage range, thus
we need to skip that garbage for RST.

The dedicated read helper would provide the location for us to handle this
RST scrub specific case.

>> The btrfs_submit_bio() would duplicate the writes to all mirrors, which
>> is exactly what I want to avoid for scrub usage.
>> Thus the scrub specific write helper would always map the write to a
>> single stripe, even for RAID56 writes.
>
> That's basically what btrfs_repair_io_failure does. Any chance we
> could share the code for those instead of going back to creating
> multiple code paths for I/O submission?

That sounds like a pretty good candidate to merge.

The core shares the same idea; only the support code is different.

I'm fine merging the core logic, although I believe we still need
different entrance functions (e.g. repair is only done for one sector,
requires the inode/file offset, and is done synchronously).

Thanks,
Qu
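The RST-versus-regular difference described above can be modeled as a per-sector decision. The stripe size, struct, and function name below are hypothetical illustration, not the scrub implementation:

```c
#include <assert.h>
#include <stdbool.h>

#define STRIPE_LEN_SECTORS 16 /* model: a 64K stripe of 4K sectors */

/* Hypothetical per-sector state of a stripe about to be scrubbed. */
struct scrub_stripe {
	bool rst_mapped[STRIPE_LEN_SECTORS]; /* has an RST mapping */
};

/*
 * Model of the behavior discussed above: scrub reads the whole stripe
 * in one go (fewer IOPS), and garbage sectors are simply not verified.
 * But on an RST filesystem, sectors without a mapping cannot be read
 * at all and must be skipped outright.
 */
static int sectors_to_read(const struct scrub_stripe *s, bool is_rst)
{
	int count = 0;

	for (int i = 0; i < STRIPE_LEN_SECTORS; i++) {
		if (is_rst && !s->rst_mapped[i])
			continue; /* no mapping: nothing to read */
		count++; /* read it; verification skips unused sectors */
	}
	return count;
}
```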
On Tue, Mar 28, 2023 at 08:48:29AM +0800, Qu Wenruo wrote:
> The new behavior itself reduces IOPS, thus it should be an overall win.
> And for regular devices or non-RST zoned devices, we just read out some
> garbage, and the scrub verification code would skip it.
>
> But for RST cases, there would be no mapping for the garbage range, thus
> we need to skip that garbage for RST.
>
> The dedicated read helper would provide the location for us to handle
> this RST scrub specific case.

Well, if you did just call btrfs_submit_chunk, the RST lookup would ensure
you only get the length of the RST mapping, and you get the behavior you
want without duplication. We'd need to make it non-static (and document
it), but I'd still be much happier about that than yet another I/O
submission interface.

> The core shares the same idea; only the support code is different.
>
> I'm fine merging the core logic, although I believe we still need
> different entrance functions (e.g. repair is only done for one sector,
> requires the inode/file offset, and is done synchronously).

Well, the inode number and start are only used for a debug printk, so we
can easily move that into the caller. As for single sectors vs not, what
we really should do is move the bio allocation to the caller, which means
the caller can decide how large the bio is. Together with moving the
printk, that also makes the interface much simpler as we'll need a lot
fewer arguments.
On 2023/3/28 08:53, Christoph Hellwig wrote:
> On Tue, Mar 28, 2023 at 08:48:29AM +0800, Qu Wenruo wrote:
>> The new behavior itself reduces IOPS, thus it should be an overall win.
>> And for regular devices or non-RST zoned devices, we just read out some
>> garbage, and the scrub verification code would skip it.
>>
>> But for RST cases, there would be no mapping for the garbage range,
>> thus we need to skip that garbage for RST.
>>
>> The dedicated read helper would provide the location for us to handle
>> this RST scrub specific case.
>
> Well, if you did just call btrfs_submit_chunk, the RST lookup would
> ensure you only get the length of the RST mapping, and you get the
> behavior you want without duplication. We'd need to make it non-static
> (and document it), but I'd still be much happier about that than yet
> another I/O submission interface.

But in that case, wouldn't we just error out? If RST cannot find the
mapping, it errors out.

The behavior of handling a missing RST mapping is different between the
regular read path and the scrub read path.

For the regular path, if we're reading some range without an RST mapping,
we should error out.
While for the scrub path, we should skip the unmapped range (or, if you
want, we can also read some garbage).

And for non-RST cases, we should not hit a stripe boundary (although this
is just a debugging check).

> Well, the inode number and start are only used for a debug printk, so
> we can easily move that into the caller. As for single sectors vs not,
> what we really should do is move the bio allocation to the caller,
> which means the caller can decide how large the bio is. Together with
> moving the printk, that also makes the interface much simpler as we'll
> need a lot fewer arguments.

Yep, I'm totally fine with a helper doing the core single-device mapping.

Just pass logical + mirror as parameters to the helper, have it return an
smap or dev + physical, and let the caller do the submission.

Thanks,
Qu
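The "logical + mirror in, dev + physical out" helper Qu describes can be sketched for the simple RAID1 case. The struct and fixed two-mirror layout are hypothetical assumptions for illustration, not the btrfs chunk mapping code:

```c
#include <assert.h>

typedef unsigned long long u64;

/* Hypothetical RAID1 chunk: one full copy per device, same in-chunk offset. */
struct chunk_map {
	u64 logical_start;     /* logical address where the chunk begins */
	int num_mirrors;
	int devid[2];          /* device holding each mirror */
	u64 physical_start[2]; /* stripe start on that device */
};

/*
 * Model of the proposed core helper: take logical + mirror_num, return
 * dev + physical, and let the caller allocate, size, and submit the bio
 * itself (so one helper serves both single-sector repair and scrub).
 */
static int map_one_mirror(const struct chunk_map *map, u64 logical,
			  int mirror_num, int *devid, u64 *physical)
{
	u64 offset = logical - map->logical_start;

	if (mirror_num < 1 || mirror_num > map->num_mirrors)
		return -1; /* no such copy */
	*devid = map->devid[mirror_num - 1];
	*physical = map->physical_start[mirror_num - 1] + offset;
	return 0;
}
```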
On Tue, Mar 28, 2023 at 09:01:57AM +0800, Qu Wenruo wrote:
>> Well, if you did just call btrfs_submit_chunk, the RST lookup would
>> ensure you only get the length of the RST mapping, and you get the
>> behavior you want without duplication. We'd need to make it non-static
>> (and document it), but I'd still be much happier about that than yet
>> another I/O submission interface.
>
> But in that case, wouldn't we just error out?

btrfs_submit_chunk is greedy. But yes, if there was no mapping at all I
guess it would error out and we'd need to find a way to handle that. But
that still seems better than duplicating the logic again.
On 2023/3/28 09:04, Christoph Hellwig wrote:
> On Tue, Mar 28, 2023 at 09:01:57AM +0800, Qu Wenruo wrote:
>>> Well, if you did just call btrfs_submit_chunk, the RST lookup would
>>> ensure you only get the length of the RST mapping, and you get the
>>> behavior you want without duplication. We'd need to make it non-static
>>> (and document it), but I'd still be much happier about that than yet
>>> another I/O submission interface.
>>
>> But in that case, wouldn't we just error out?
>
> btrfs_submit_chunk is greedy. But yes, if there was no mapping at all
> I guess it would error out and we'd need to find a way to handle that.
> But that still seems better than duplicating the logic again.

Well, to me the whole btrfs_submit_chunk() is already super modular (great
work from you, very much appreciated), so the new scrub_read path adds
less than 30 lines.

At least to me, a few more lines for a special path seem worth it,
especially when that scrub path should never be utilized by anyone else.

But I'm still open-minded about merging if the change proves small enough.
Let's see what we can do after the next update of the RST patchset and
decide then.

Thanks,
Qu
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index c1edadc17260..bdef346c542c 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -333,8 +333,8 @@ static void btrfs_end_bio_work(struct work_struct *work)
 {
 	struct btrfs_bio *bbio = container_of(work, struct btrfs_bio, end_io_work);
 
-	/* Metadata reads are checked and repaired by the submitter. */
-	if (bbio->bio.bi_opf & REQ_META)
+	/* Metadata or scrub reads are checked and repaired by the submitter. */
+	if (bbio->bio.bi_opf & REQ_META || !bbio->inode)
 		bbio->end_io(bbio);
 	else
 		btrfs_check_read_bio(bbio, bbio->bio.bi_private);
@@ -344,7 +344,7 @@ static void btrfs_simple_end_io(struct bio *bio)
 {
 	struct btrfs_bio *bbio = btrfs_bio(bio);
 	struct btrfs_device *dev = bio->bi_private;
-	struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
+	struct btrfs_fs_info *fs_info = bbio->fs_info;
 
 	btrfs_bio_counter_dec(fs_info);
 
@@ -368,7 +368,8 @@ static void btrfs_raid56_end_io(struct bio *bio)
 	btrfs_bio_counter_dec(bioc->fs_info);
 	bbio->mirror_num = bioc->mirror_num;
 
-	if (bio_op(bio) == REQ_OP_READ && !(bbio->bio.bi_opf & REQ_META))
+	if (bio_op(bio) == REQ_OP_READ && bbio->inode &&
+	    !(bbio->bio.bi_opf & REQ_META))
 		btrfs_check_read_bio(bbio, NULL);
 	else
 		btrfs_orig_bbio_end_io(bbio);
@@ -714,6 +715,45 @@ static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
 	return true;
 }
 
+/*
+ * Scrub read special version, with extra limits:
+ *
+ * - Only support read for scrub usage
+ * - @mirror_num must be >0
+ * - No read-time repair nor checksum verification.
+ * - The @bbio must not cross stripe boundary.
+ */
+void btrfs_submit_scrub_read(struct btrfs_bio *bbio, int mirror_num)
+{
+	struct btrfs_fs_info *fs_info = bbio->fs_info;
+	u64 logical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
+	u64 length = bbio->bio.bi_iter.bi_size;
+	u64 map_length = length;
+	struct btrfs_io_context *bioc = NULL;
+	struct btrfs_io_stripe smap;
+	int ret;
+
+	ASSERT(fs_info);
+	ASSERT(mirror_num > 0);
+	ASSERT(btrfs_op(&bbio->bio) == BTRFS_MAP_READ);
+	ASSERT(!bbio->inode);
+
+	btrfs_bio_counter_inc_blocked(fs_info);
+	ret = __btrfs_map_block(fs_info, btrfs_op(&bbio->bio), logical,
+				&map_length, &bioc, &smap, &mirror_num, 1);
+	if (ret)
+		goto fail;
+
+	/* Caller should ensure the @bbio doesn't cross stripe boundary. */
+	ASSERT(map_length >= length);
+	__btrfs_submit_bio(&bbio->bio, bioc, &smap, mirror_num);
+	return;
+
+fail:
+	btrfs_bio_counter_dec(fs_info);
+	btrfs_bio_end_io(bbio, errno_to_blk_status(ret));
+}
+
 void btrfs_submit_bio(struct btrfs_bio *bbio, int mirror_num)
 {
 	while (!btrfs_submit_chunk(bbio, mirror_num))
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index 3b97ce54140a..afbcf318fdda 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -106,6 +106,7 @@ static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
 #define REQ_BTRFS_ONE_ORDERED			REQ_DRV
 
 void btrfs_submit_bio(struct btrfs_bio *bbio, int mirror_num);
+void btrfs_submit_scrub_read(struct btrfs_bio *bbio, int mirror_num);
 int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 			    u64 length, u64 logical, struct page *page,
 			    unsigned int pg_offset, int mirror_num);
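The bbio->inode check the patch threads through the end_io paths boils down to one predicate. A minimal userspace model, assuming a made-up REQ_META flag value purely for illustration:

```c
#include <assert.h>
#include <stdbool.h>

#define REQ_META (1u << 0) /* hypothetical flag value, for the model only */

/*
 * Decision modeled on the btrfs_end_bio_work() hunk in the patch:
 * metadata reads and scrub reads (bbio->inode == NULL) are verified
 * and repaired by the submitter; everything else goes through the
 * generic read-repair path (btrfs_check_read_bio()).
 */
static bool submitter_handles_repair(unsigned int bi_opf, bool has_inode)
{
	return (bi_opf & REQ_META) || !has_inode;
}
```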