Message ID | 72f4fa26c35f2e649bc562a80a40955d745f1118.1679826088.git.wqu@suse.com (mailing list archive)
---|---
State | New, archived
Series | btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror()
On Sun, Mar 26, 2023 at 07:06:33PM +0800, Qu Wenruo wrote:
> +	/* Caller should ensure the @bbio doesn't cross stripe boundary. */
> +	ASSERT(map_length >= length);
> +	if (btrfs_op(&bbio->bio) == BTRFS_MAP_WRITE && btrfs_is_zoned(fs_info)) {
> +		bbio->bio.bi_opf &= ~REQ_OP_WRITE;
> +		bbio->bio.bi_opf |= REQ_OP_ZONE_APPEND;
> +	}

Not crossing the stripe boundary is not enough; for zone append it
also must not cross the zone append limit, which (at least in theory)
can be arbitrarily small.

> +	if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
> +		int data_stripes = (bioc->map_type & BTRFS_BLOCK_GROUP_RAID5) ?
> +				   bioc->num_stripes - 1 : bioc->num_stripes - 2;

This calculation feels like it should be in a simple, well-documented
helper.

> +		smap.physical = bioc->stripes[i].physical +
> +				((logical - bioc->full_stripe_logical) &
> +				 BTRFS_STRIPE_LEN_MASK);

This calculation feels like another candidate for a self-contained,
well-documented helper.

> +		goto submit;
> +	}
> +	ASSERT(mirror_num <= bioc->num_stripes);
> +	smap.dev = bioc->stripes[mirror_num - 1].dev;
> +	smap.physical = bioc->stripes[mirror_num - 1].physical;
> +submit:

Why not an else instead of the goto here? That would read a lot easier.
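The two helpers suggested in the review could look roughly like this — a minimal userspace sketch assuming btrfs's 64K stripe length; the helper names and the constant definitions here are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's profile flags and the 64K
 * stripe geometry; the names are hypothetical. */
#define BLOCK_GROUP_RAID5   (1ULL << 7)
#define BLOCK_GROUP_RAID6   (1ULL << 8)
#define STRIPE_LEN_SHIFT    16
#define STRIPE_LEN_MASK     ((1ULL << STRIPE_LEN_SHIFT) - 1)

/*
 * Number of data-carrying stripes in a RAID56 full stripe:
 * RAID5 reserves one stripe for parity, RAID6 reserves two.
 */
static int nr_data_stripes(uint64_t map_type, int num_stripes)
{
	return (map_type & BLOCK_GROUP_RAID5) ? num_stripes - 1
					      : num_stripes - 2;
}

/*
 * Offset of @logical within the 64K stripe that contains it, i.e. the
 * bits below the stripe boundary, measured from the start of the full
 * stripe. Adding this to the chosen stripe's physical start yields the
 * physical address of @logical.
 */
static uint64_t offset_in_stripe(uint64_t logical, uint64_t full_stripe_logical)
{
	return (logical - full_stripe_logical) & STRIPE_LEN_MASK;
}
```

With such helpers, both open-coded expressions in the patch collapse into a single documented call site each.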
On 2023/3/27 11:44, Christoph Hellwig wrote:
> On Sun, Mar 26, 2023 at 07:06:33PM +0800, Qu Wenruo wrote:
>> +	/* Caller should ensure the @bbio doesn't cross stripe boundary. */
>> +	ASSERT(map_length >= length);
>> +	if (btrfs_op(&bbio->bio) == BTRFS_MAP_WRITE && btrfs_is_zoned(fs_info)) {
>> +		bbio->bio.bi_opf &= ~REQ_OP_WRITE;
>> +		bbio->bio.bi_opf |= REQ_OP_ZONE_APPEND;
>> +	}
>
> Not crossing the stripe boundary is not enough; for zone append it
> also must not cross the zone append limit, which (at least in theory)
> can be arbitrarily small.

Do you mean that we can have some real zone append limit which is even
smaller than 64K?

If so, then the write helper can be more complex than I thought...

Thanks,
Qu

>
>> +	if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
>> +		int data_stripes = (bioc->map_type & BTRFS_BLOCK_GROUP_RAID5) ?
>> +				   bioc->num_stripes - 1 : bioc->num_stripes - 2;
>
> This calculation feels like it should be in a simple, well-documented
> helper.
>
>> +		smap.physical = bioc->stripes[i].physical +
>> +				((logical - bioc->full_stripe_logical) &
>> +				 BTRFS_STRIPE_LEN_MASK);
>
> This calculation feels like another candidate for a self-contained,
> well-documented helper.
>
>> +		goto submit;
>> +	}
>> +	ASSERT(mirror_num <= bioc->num_stripes);
>> +	smap.dev = bioc->stripes[mirror_num - 1].dev;
>> +	smap.physical = bioc->stripes[mirror_num - 1].physical;
>> +submit:
>
> Why not an else instead of the goto here? That would read a lot easier.
On Mon, Mar 27, 2023 at 12:32:22PM +0800, Qu Wenruo wrote:
> > Not crossing the stripe boundary is not enough, for zone append it
> > also must not cross the zone append limit, which (at least in theory)
> > can be arbitrarily small.
>
> Do you mean that we can have some real zone append limit which is even
> smaller than 64K?
>
> If so, then the write helper can be more complex than I thought...

In theory it can be as small as a single LBA. It might make sense
to require a minimum of 64k for btrfs, and reject the writable mount
if it is not, as a device with just a 64k zone append limit or smaller
would be really horrible to use anyway.
On 2023/3/27 13:22, Christoph Hellwig wrote:
> On Mon, Mar 27, 2023 at 12:32:22PM +0800, Qu Wenruo wrote:
>>> Not crossing the stripe boundary is not enough, for zone append it
>>> also must not cross the zone append limit, which (at least in theory)
>>> can be arbitrarily small.
>>
>> Do you mean that we can have some real zone append limit which is even
>> smaller than 64K?
>>
>> If so, then the write helper can be more complex than I thought...
>
> In theory it can be as small as a single LBA. It might make sense
> to require a minimum of 64k for btrfs, and reject the writable mount
> if it is not, as a device with just a 64k zone append limit or smaller
> would be really horrible to use anyway.

The rejection method sounds very reasonable to me.

Although I'd prefer some feedback from Johannes on the real-world
values, and if possible let the zoned guys submit the patch.

I haven't yet built my VMs to run ZNS tests.

Another thing is, if we want to implement the rejection before the
reworked scrub, the threshold would be 128K, as that's the bio size
limit for the old scrub code.

Thanks,
Qu
On 27.03.23 09:52, Qu Wenruo wrote:
>
> On 2023/3/27 13:22, Christoph Hellwig wrote:
>> On Mon, Mar 27, 2023 at 12:32:22PM +0800, Qu Wenruo wrote:
>>>> Not crossing the stripe boundary is not enough, for zone append it
>>>> also must not cross the zone append limit, which (at least in theory)
>>>> can be arbitrarily small.
>>>
>>> Do you mean that we can have some real zone append limit which is even
>>> smaller than 64K?
>>>
>>> If so, then the write helper can be more complex than I thought...
>>
>> In theory it can be as small as a single LBA. It might make sense
>> to require a minimum of 64k for btrfs, and reject the writable mount
>> if it is not, as a device with just a 64k zone append limit or smaller
>> would be really horrible to use anyway.
>
> The rejection method sounds very reasonable to me.
>
> Although I'd prefer some feedback from Johannes on the real-world
> values, and if possible let the zoned guys submit the patch.

Sure, rejecting sub-64k ZA limits sounds reasonable to me as well. I
can craft a patch if you want.

> I haven't yet built my VMs to run ZNS tests.
>
> Another thing is, if we want to implement the rejection before the
> reworked scrub, the threshold would be 128K, as that's the bio size
> limit for the old scrub code.
>
> Thanks,
> Qu
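The mount-time rejection discussed above could be sketched as below — a userspace model only, not a real btrfs patch; the function name and constant are made up, and the 64K value is the stripe-length minimum from the thread (Qu notes it would be 128K if the check lands before the scrub rework):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimum zone append limit btrfs would accept for a
 * writable zoned mount: one 64K stripe, so every stripe-aligned scrub
 * write fits in a single zone append. */
#define MIN_ZONE_APPEND_BYTES (64U * 1024)

/*
 * Sketch of the proposed check: a zoned device advertising a maximum
 * zone append size below the minimum is refused for writable mounts.
 */
static bool zone_append_limit_ok(uint32_t max_zone_append_bytes)
{
	return max_zone_append_bytes >= MIN_ZONE_APPEND_BYTES;
}
```

A device at or above the threshold mounts normally; anything smaller (in theory as small as a single LBA) would be rejected rather than handled by splitting writes further.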
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index bdef346c542c..34902d58bb4b 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -754,6 +754,98 @@ void btrfs_submit_scrub_read(struct btrfs_bio *bbio, int mirror_num)
 	btrfs_bio_end_io(bbio, errno_to_blk_status(ret));
 }
 
+/*
+ * Scrub write special version. Some extra limits:
+ *
+ * - Only support write back for dev-replace and scrub read-repair.
+ *   This means, the write bio, even for RAID56, would only
+ *   be mapped to a single device.
+ *
+ * - @mirror_num must be >0.
+ *   To indicate which mirror to be written.
+ *   If it's RAID56, it must be 1 (data stripes).
+ *
+ * - The @bbio must not cross stripe boundary.
+ *
+ * - If @dev_replace is true, the resulting stripe must be mapped to the
+ *   replace source device.
+ *
+ * - No csum generation.
+ */
+void btrfs_submit_scrub_write(struct btrfs_bio *bbio, int mirror_num,
+			      bool dev_replace)
+{
+	struct btrfs_fs_info *fs_info = bbio->fs_info;
+	u64 logical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
+	u64 length = bbio->bio.bi_iter.bi_size;
+	u64 map_length = length;
+	struct btrfs_io_context *bioc = NULL;
+	struct btrfs_io_stripe smap;
+	int ret;
+
+	ASSERT(fs_info);
+	ASSERT(mirror_num > 0);
+	ASSERT(btrfs_op(&bbio->bio) == BTRFS_MAP_WRITE);
+	ASSERT(!bbio->inode);
+
+	btrfs_bio_counter_inc_blocked(fs_info);
+	ret = __btrfs_map_block(fs_info, btrfs_op(&bbio->bio), logical,
+				&map_length, &bioc, &smap, &mirror_num, 1);
+	if (ret)
+		goto fail;
+
+	/* Caller should ensure the @bbio doesn't cross stripe boundary. */
+	ASSERT(map_length >= length);
+	if (btrfs_op(&bbio->bio) == BTRFS_MAP_WRITE && btrfs_is_zoned(fs_info)) {
+		bbio->bio.bi_opf &= ~REQ_OP_WRITE;
+		bbio->bio.bi_opf |= REQ_OP_ZONE_APPEND;
+	}
+
+	if (!bioc)
+		goto submit;
+
+	/* Map the RAID56 multi-stripe writes to a single one. */
+	if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
+		int data_stripes = (bioc->map_type & BTRFS_BLOCK_GROUP_RAID5) ?
+				   bioc->num_stripes - 1 : bioc->num_stripes - 2;
+		int i;
+
+		/* This special write only works for data stripes. */
+		ASSERT(mirror_num == 1);
+		for (i = 0; i < data_stripes; i++) {
+			u64 stripe_start = bioc->full_stripe_logical +
+					   (i << BTRFS_STRIPE_LEN_SHIFT);
+
+			if (logical >= stripe_start &&
+			    logical < stripe_start + BTRFS_STRIPE_LEN)
+				break;
+		}
+		ASSERT(i < data_stripes);
+		smap.dev = bioc->stripes[i].dev;
+		smap.physical = bioc->stripes[i].physical +
+				((logical - bioc->full_stripe_logical) &
+				 BTRFS_STRIPE_LEN_MASK);
+		goto submit;
+	}
+	ASSERT(mirror_num <= bioc->num_stripes);
+	smap.dev = bioc->stripes[mirror_num - 1].dev;
+	smap.physical = bioc->stripes[mirror_num - 1].physical;
+submit:
+	ASSERT(smap.dev);
+	btrfs_put_bioc(bioc);
+	bioc = NULL;
+	if (dev_replace) {
+		ASSERT(smap.dev == fs_info->dev_replace.srcdev);
+		smap.dev = fs_info->dev_replace.tgtdev;
+	}
+	__btrfs_submit_bio(&bbio->bio, bioc, &smap, mirror_num);
+	return;
+
+fail:
+	btrfs_bio_counter_dec(fs_info);
+	btrfs_bio_end_io(bbio, errno_to_blk_status(ret));
+}
+
 void btrfs_submit_bio(struct btrfs_bio *bbio, int mirror_num)
 {
 	while (!btrfs_submit_chunk(bbio, mirror_num))
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index afbcf318fdda..ad5a6a558662 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -107,6 +107,8 @@ static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
 
 void btrfs_submit_bio(struct btrfs_bio *bbio, int mirror_num);
 void btrfs_submit_scrub_read(struct btrfs_bio *bbio, int mirror_num);
+void btrfs_submit_scrub_write(struct btrfs_bio *bbio, int mirror_num,
+			      bool dev_replace);
 int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 			    u64 length, u64 logical, struct page *page,
 			    unsigned int pg_offset, int mirror_num);
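The "else instead of goto" review comment can be illustrated with a standalone sketch of the mirror-selection step: the RAID56 path and the plain mirror path become the two arms of one if/else, so the goto and the submit label disappear. The struct and constant names below are simplified stand-ins for the kernel types, used only to make the sketch compile on its own:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel constants and structures;
 * field names follow the patch, everything else is hypothetical. */
#define BLOCK_GROUP_RAID5   (1ULL << 7)
#define BLOCK_GROUP_RAID6   (1ULL << 8)
#define RAID56_MASK         (BLOCK_GROUP_RAID5 | BLOCK_GROUP_RAID6)
#define STRIPE_LEN_SHIFT    16
#define STRIPE_LEN          (1ULL << STRIPE_LEN_SHIFT)
#define STRIPE_LEN_MASK     (STRIPE_LEN - 1)

struct io_stripe {
	int dev;
	uint64_t physical;
};

struct io_context {
	uint64_t map_type;
	int num_stripes;
	uint64_t full_stripe_logical;
	struct io_stripe stripes[8];
};

/*
 * Mirror selection restructured as if/else: pick the single stripe a
 * scrub write lands on, without any goto.
 */
static struct io_stripe pick_stripe(const struct io_context *bioc,
				    uint64_t logical, int mirror_num)
{
	struct io_stripe smap;

	if (bioc->map_type & RAID56_MASK) {
		int data_stripes = (bioc->map_type & BLOCK_GROUP_RAID5) ?
				   bioc->num_stripes - 1 : bioc->num_stripes - 2;
		int i;

		/* This special write only works for data stripes. */
		assert(mirror_num == 1);
		/* Find the data stripe covering @logical... */
		for (i = 0; i < data_stripes; i++) {
			uint64_t stripe_start = bioc->full_stripe_logical +
						((uint64_t)i << STRIPE_LEN_SHIFT);

			if (logical >= stripe_start &&
			    logical < stripe_start + STRIPE_LEN)
				break;
		}
		assert(i < data_stripes);
		/* ...then add the offset of @logical within that stripe. */
		smap = bioc->stripes[i];
		smap.physical += (logical - bioc->full_stripe_logical) &
				 STRIPE_LEN_MASK;
	} else {
		assert(mirror_num <= bioc->num_stripes);
		smap = bioc->stripes[mirror_num - 1];
	}
	return smap;
}
```

In the actual patch the fall-through to the submit label still exists for the `!bioc` single-device case, so the real function would keep one early branch; the sketch only shows how the two mapped cases read without the goto.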