diff mbox series

[v3,02/12] btrfs: introduce a new helper to submit bio for scrub

Message ID b61263cba690fd894e21d75442556ae2f150f095.1679278088.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror() | expand

Commit Message

Qu Wenruo March 20, 2023, 2:12 a.m. UTC
The new helper, btrfs_submit_scrub_read(), would be mostly a subset of
btrfs_submit_bio(), with the following limitations:

- Only supports read
- @mirror_num must be > 0
- No read-time repair nor checksum verification
- The @bbio must not cross stripe boundary

This would provide the basis for unified read repair for scrub, as we no
longer needs to handle RAID56 recovery all by scrub, and RAID56 data
stripes scrub can share the same code of read and repair.

The repair part would be the same as non-RAID56, as we only need to try
the next mirror.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/bio.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++----
 fs/btrfs/bio.h | 17 ++++++++++++++++-
 2 files changed, 64 insertions(+), 5 deletions(-)

Comments

Christoph Hellwig March 21, 2023, 12:02 p.m. UTC | #1
> @@ -39,6 +39,8 @@ void btrfs_bio_init(struct btrfs_bio *bbio, struct btrfs_inode *inode,
>  	bbio->end_io = end_io;
>  	bbio->private = private;
>  	atomic_set(&bbio->pending_ios, 1);
> +	if (inode)
> +		bbio->fs_info = inode->root->fs_info;
>  }

Please split modifications to the btrfs_bio infrastructure into
separate patches.  And we'll need a proper helper to allocate these
non-bio inodes that ensures fs_info is always set.

>  	struct btrfs_bio *bbio = btrfs_bio(bio);
>  	struct btrfs_device *dev = bio->bi_private;
> -	struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
> +	struct btrfs_fs_info *fs_info = bbio->inode ?
> +		bbio->inode->root->fs_info : bbio->fs_info;

.. and all places that just need the fs_info need to just use
bbio->fs_info instead of these complicated constructs.
Qu Wenruo March 24, 2023, 9:58 a.m. UTC | #2
On 2023/3/21 20:02, Christoph Hellwig wrote:
>> @@ -39,6 +39,8 @@ void btrfs_bio_init(struct btrfs_bio *bbio, struct btrfs_inode *inode,
>>   	bbio->end_io = end_io;
>>   	bbio->private = private;
>>   	atomic_set(&bbio->pending_ios, 1);
>> +	if (inode)
>> +		bbio->fs_info = inode->root->fs_info;
>>   }
> 
> Please split modifications to the btrfs_bio infrastructure into
> separate patches.  And we'll need a proper helper to allocate these
> non-bio inodes that ensures fs_info is always set.
> 
>>   	struct btrfs_bio *bbio = btrfs_bio(bio);
>>   	struct btrfs_device *dev = bio->bi_private;
>> -	struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
>> +	struct btrfs_fs_info *fs_info = bbio->inode ?
>> +		bbio->inode->root->fs_info : bbio->fs_info;
> 
> .. and all places that just need the fs_info need to just use
> bbio->fs_info instead of these complicated constructs.
> 

I have updated the branch, with a dedicated btrfs_bio::fs_info, 
btrfs_scrub_bio_alloc() and always populate that new fs_info.

Mind to have a pre-check?

https://github.com/adam900710/linux/commit/0f6a419f035787546eec6f51b0430a05a345d4c5

Although the next two patches are also slightly updated to take the 
advantage of that always populated fs_info:

Since the modification, I'm a little more dependent on that always 
populated fs_info now, thus not sure if going back to a union and 
conditionally grabbing that fs_info is a good idea.

Although I'm more interested why there is a super big gap between work 
and bio (32 bytes), while the compiler still refuses to put the pointer 
into that hole...

Thanks,
Qu
Christoph Hellwig March 25, 2023, 8:09 a.m. UTC | #3
On Fri, Mar 24, 2023 at 05:58:57PM +0800, Qu Wenruo wrote:
> I have updated the branch, with a dedicated btrfs_bio::fs_info,
> btrfs_scrub_bio_alloc() and always populate that new fs_info.
> 
> Mind to have a pre-check?
> 
> https://github.com/adam900710/linux/commit/0f6a419f035787546eec6f51b0430a05a345d4c5

I'd prefer to just pass a fs_info to btrfs_bio_init and btrfs_bio_alloc
instead of duplicating the allocator.  With my "simplify extent_buffer
reading and writing" series and another tiny change we'll only need the
inode for data inodes.  But given that we've not made too much
progress on getting that series merged I guess we'll have to add
the duplicate allocator for now and just revert it later.

I'd drop the ASSERT in btrfs_submit_bio - all allocators initialize
the field, no need for the extra check.

> Although the next two patches are also slightly updated to take the
> advantage of that always populated fs_info:

I think you wanted to add links here, but they are missing.

> Since the modification, I'm a little more dependent on that always populated
> fs_info now, thus not sure if going back to a union and conditionally
> grabbing that fs_info is a good idea.

I don't think the union is a good idea at all.  Maybe in the future
we can move the inode into a union, but right now the csum for data
read is the upper bound on the btrfs_bio, so I'm not sure we'll
ever get to a point where that would help.

> Although I'm more interested why there is a super big gap between work and
> bio (32 bytes), while the compiler still refuses to put the pointer into
> that hole...

I don't think there is one.  pahole just reports structs strangely
sometimes.
Qu Wenruo March 25, 2023, 8:21 a.m. UTC | #4
On 2023/3/25 16:09, Christoph Hellwig wrote:
> On Fri, Mar 24, 2023 at 05:58:57PM +0800, Qu Wenruo wrote:
>> I have updated the branch, with a dedicated btrfs_bio::fs_info,
>> btrfs_scrub_bio_alloc() and always populate that new fs_info.
>>
>> Mind to have a pre-check?
>>
>> https://github.com/adam900710/linux/commit/0f6a419f035787546eec6f51b0430a05a345d4c5
> 
> I'd prefer to just pass a fs_info to btrfs_bio_init and btrfs_bio_alloc
> instead of duplicating the allocator.  With my "simplify extent_buffer
> reading and writing" series and another tiny change we'll only need the
> inode for data inodes.  But given that we've not made too much
> progress on getting that series merged I guess we'll have to add
> the duplicate allocator for now and just revert it later.

No big deal, I can go with the extra fs_info parameter for 
btrfs_bio_init() and btrfs_bio_alloc().

The main reason I go with the duplicated allocate is to remove the need 
for nr_vecs, but that's pretty minor, thus not a critical one.

> 
> I'd drop the ASSERT in btrfs_submit_bio - all allocators initialize
> the field, no need for the extra check.

OK, sounds reasonable.

> 
>> Although the next two patches are also slightly updated to take the
>> advantage of that always populated fs_info:
> 
> I think you wanted to add links here, but they are missing.
> 
>> Since the modification, I'm a little more dependent on that always populated
>> fs_info now, thus not sure if going back to a union and conditionally
>> grabbing that fs_info is a good idea.
> 
> I don't think the union is a good idea at all.  Maybe in the future
> we can move the inode into a union, but right now the csum for data
> read is the upper bound on the btrfs_bio, so I'm not sure we'll
> ever get to a point where that would help.

Yeah, let's avoid union and let fs_info always populated.

Thanks,
Qu

> 
>> Although I'm more interested why there is a super big gap between work and
>> bio (32 bytes), while the compiler still refuses to put the pointer into
>> that hole...
> 
> I don't think there is one.  pahole just reports structs strangely
> sometimes.
Christoph Hellwig March 25, 2023, 8:31 a.m. UTC | #5
On Sat, Mar 25, 2023 at 04:21:41PM +0800, Qu Wenruo wrote:
> No big deal, I can go with the extra fs_info parameter for btrfs_bio_init()
> and btrfs_bio_alloc().

I'd be tempted to just initialize it in the callers given that it's
an optional field now.  I wonder if the same might also make sense
for ->file_offset.

> The main reason I go with the duplicated allocate is to remove the need for
> nr_vecs, but that's pretty minor, thus not a critical one.

Passing nr_vecs might actually make sense.  There is no need to always
allocated all of them for existing callers either, so we can do
additional optimizations based on that later.
Qu Wenruo March 25, 2023, 8:48 a.m. UTC | #6
On 2023/3/25 16:31, Christoph Hellwig wrote:
> On Sat, Mar 25, 2023 at 04:21:41PM +0800, Qu Wenruo wrote:
>> No big deal, I can go with the extra fs_info parameter for btrfs_bio_init()
>> and btrfs_bio_alloc().
> 
> I'd be tempted to just initialize it in the callers given that it's
> an optional field now.  I wonder if the same might also make sense
> for ->file_offset.

In that case, I'm not sure if adding more and more arguments while 
almost all of them can be optional is a good idea...

Shouldn't we only pass mandatory arguments, then let the caller to 
populate the optional ones?

Anyway, for scrub usage, fs_info can be initialized at the submission 
timing if neexed, thus I'm fine either way.

> 
>> The main reason I go with the duplicated allocate is to remove the need for
>> nr_vecs, but that's pretty minor, thus not a critical one.
> 
> Passing nr_vecs might actually make sense.  There is no need to always
> allocated all of them for existing callers either, so we can do
> additional optimizations based on that later.

Or we can just go with the github version and refine later?

Considering scrub would be the final major code migrating to use 
btrfs_bio, there shouldn't be any major use case change to btrfs_bio for 
a while.

Thanks,
Qu
Christoph Hellwig March 27, 2023, 3:36 a.m. UTC | #7
On Sat, Mar 25, 2023 at 04:48:40PM +0800, Qu Wenruo wrote:
> > I'd be tempted to just initialize it in the callers given that it's
> > an optional field now.  I wonder if the same might also make sense
> > for ->file_offset.
> 
> In that case, I'm not sure if adding more and more arguments while almost
> all of them can be optional is a good idea...
> 
> Shouldn't we only pass mandatory arguments, then let the caller to populate
> the optional ones?

Yes, that's what I tried to suggest above.  Right now inode and
file_offset are mandatory, but with your changes they stop being so,
so we should stop passing them to the allocation helpers.
diff mbox series

Patch

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index cf09c6271edb..b96f40160b08 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -39,6 +39,8 @@  void btrfs_bio_init(struct btrfs_bio *bbio, struct btrfs_inode *inode,
 	bbio->end_io = end_io;
 	bbio->private = private;
 	atomic_set(&bbio->pending_ios, 1);
+	if (inode)
+		bbio->fs_info = inode->root->fs_info;
 }
 
 /*
@@ -308,8 +310,8 @@  static void btrfs_end_bio_work(struct work_struct *work)
 {
 	struct btrfs_bio *bbio = container_of(work, struct btrfs_bio, end_io_work);
 
-	/* Metadata reads are checked and repaired by the submitter. */
-	if (bbio->bio.bi_opf & REQ_META)
+	/* Metadata or scrub reads are checked and repaired by the submitter. */
+	if (bbio->bio.bi_opf & REQ_META || !bbio->inode)
 		bbio->end_io(bbio);
 	else
 		btrfs_check_read_bio(bbio, bbio->bio.bi_private);
@@ -319,7 +321,8 @@  static void btrfs_simple_end_io(struct bio *bio)
 {
 	struct btrfs_bio *bbio = btrfs_bio(bio);
 	struct btrfs_device *dev = bio->bi_private;
-	struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
+	struct btrfs_fs_info *fs_info = bbio->inode ?
+		bbio->inode->root->fs_info : bbio->fs_info;
 
 	btrfs_bio_counter_dec(fs_info);
 
@@ -343,7 +346,8 @@  static void btrfs_raid56_end_io(struct bio *bio)
 
 	btrfs_bio_counter_dec(bioc->fs_info);
 	bbio->mirror_num = bioc->mirror_num;
-	if (bio_op(bio) == REQ_OP_READ && !(bbio->bio.bi_opf & REQ_META))
+	if (bio_op(bio) == REQ_OP_READ && bbio->inode &&
+	    !(bbio->bio.bi_opf & REQ_META))
 		btrfs_check_read_bio(bbio, NULL);
 	else
 		btrfs_orig_bbio_end_io(bbio);
@@ -689,6 +693,46 @@  static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
 	return true;
 }
 
+/*
+ * Scrub read special version, with extra limits:
+ *
+ * - Only support read for scrub usage
+ * - @mirror_num must be >0
+ * - No read-time repair nor checksum verification.
+ * - The @bbio must not cross stripe boundary.
+ */
+void btrfs_submit_scrub_read(struct btrfs_fs_info *fs_info,
+			     struct btrfs_bio *bbio, int mirror_num)
+{
+	struct btrfs_bio *orig_bbio = bbio;
+	u64 logical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
+	u64 length = bbio->bio.bi_iter.bi_size;
+	u64 map_length = length;
+	struct btrfs_io_context *bioc = NULL;
+	struct btrfs_io_stripe smap;
+	int ret;
+
+	ASSERT(mirror_num > 0);
+	ASSERT(btrfs_op(&bbio->bio) == BTRFS_MAP_READ);
+	ASSERT(!bbio->inode);
+
+	bbio->fs_info = fs_info;
+	btrfs_bio_counter_inc_blocked(fs_info);
+	ret = __btrfs_map_block(fs_info, btrfs_op(&bbio->bio), logical,
+				&map_length, &bioc, &smap, &mirror_num, 1);
+	if (ret)
+		goto fail;
+
+	/* Caller should ensure the @bbio doesn't cross stripe boundary. */
+	ASSERT(map_length >= length);
+	__btrfs_submit_bio(&bbio->bio, bioc, &smap, mirror_num);
+	return;
+
+fail:
+	btrfs_bio_counter_dec(fs_info);
+	btrfs_bio_end_io(orig_bbio, ret);
+}
+
 void btrfs_submit_bio(struct btrfs_bio *bbio, int mirror_num)
 {
 	while (!btrfs_submit_chunk(bbio, mirror_num))
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index dbf125f6fa33..073df13365e4 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -30,7 +30,13 @@  typedef void (*btrfs_bio_end_io_t)(struct btrfs_bio *bbio);
  * passed to btrfs_submit_bio for mapping to the physical devices.
  */
 struct btrfs_bio {
-	/* Inode and offset into it that this I/O operates on. */
+	/*
+	 * Inode and offset into it that this I/O operates on.
+	 *
+	 * @inode can be NULL for callers who don't want any advanced features
+	 * like read-time repair.
+	 * In that case, @fs_info must be properly initialized.
+	 */
 	struct btrfs_inode *inode;
 	u64 file_offset;
 
@@ -58,6 +64,13 @@  struct btrfs_bio {
 	atomic_t pending_ios;
 	struct work_struct end_io_work;
 
+	/*
+	 * For cases where callers only want to read/write from a logical
+	 * bytenr, in that case @inode can be NULL, and we need such
+	 * @fs_info pointer to grab the corresponding fs_info.
+	 */
+	struct btrfs_fs_info *fs_info;
+
 	/*
 	 * This member must come last, bio_alloc_bioset will allocate enough
 	 * bytes for entire btrfs_bio but relies on bio being last.
@@ -89,6 +102,8 @@  static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
 #define REQ_BTRFS_ONE_ORDERED			REQ_DRV
 
 void btrfs_submit_bio(struct btrfs_bio *bbio, int mirror_num);
+void btrfs_submit_scrub_read(struct btrfs_fs_info *fs_info,
+			     struct btrfs_bio *bbio, int mirror_num);
 int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 			    u64 length, u64 logical, struct page *page,
 			    unsigned int pg_offset, int mirror_num);