diff mbox series

[02/17] btrfs: save bio::bi_iter into btrfs_bio::iter before submitting

Message ID 20211201051756.53742-3-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: split bio at btrfs_map_bio() time | expand

Commit Message

Qu Wenruo Dec. 1, 2021, 5:17 a.m. UTC
Since block layer will advance bio::bi_iter, at endio time we can no
longer rely on bio::bi_iter for split bio.

But for the incoming btrfs_bio split at btrfs_map_bio() time, we have to
ensure endio function is only executed for the split range, not the
whole original bio.

Thus this patch will introduce a new helper, btrfs_bio_save_iter(), to
save bi_iter into btrfs_bio::iter.

The following call sites need this helper call:

- btrfs_submit_compressed_read()
  For compressed read. For compressed write it doesn't really care as
  they use ordered extent.

- raid56_parity_write()
- raid56_parity_recovery()
  For RAID56.

- submit_stripe_bio()
  For all other cases.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c |  3 +++
 fs/btrfs/raid56.c      |  2 ++
 fs/btrfs/volumes.c     | 14 ++++++++++++++
 fs/btrfs/volumes.h     | 18 ++++++++++++++++++
 4 files changed, 37 insertions(+)

Comments

Qu Wenruo Dec. 1, 2021, 11:52 a.m. UTC | #1
On 2021/12/1 13:17, Qu Wenruo wrote:
> Since block layer will advance bio::bi_iter, at endio time we can no
> longer rely on bio::bi_iter for split bio.
> 
> But for the incoming btrfs_bio split at btrfs_map_bio() time, we have to
> ensure endio function is only executed for the split range, not the
> whole original bio.
> 
> Thus this patch will introduce a new helper, btrfs_bio_save_iter(), to
> save bi_iter into btrfs_bio::iter.
> 
> The following call sites need this helper call:
> 
> - btrfs_submit_compressed_read()
>    For compressed read. For compressed write it doesn't really care as
>    they use ordered extent.
> 
> - raid56_parity_write()
> - raid56_parity_recovery()
>    For RAID56.
> 
> - submit_stripe_bio()
>    For all other cases.

These are not enough.

There are cases where we allocate a bio but without going through 
btrfs_map_bio(), and error out.

In that case, those bios don't have bbio::iter, and can cause errors in 
generic/475 related to data/metadata writeback failure.

Fixed in my github repo, by just adding more btrfs_bio_save_iter() calls 
in error paths.

Thanks,
Qu
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/compression.c |  3 +++
>   fs/btrfs/raid56.c      |  2 ++
>   fs/btrfs/volumes.c     | 14 ++++++++++++++
>   fs/btrfs/volumes.h     | 18 ++++++++++++++++++
>   4 files changed, 37 insertions(+)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index e776956d5bc9..cc8d13369f53 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -870,6 +870,9 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
>   	/* include any pages we added in add_ra-bio_pages */
>   	cb->len = bio->bi_iter.bi_size;
>   
> +	/* Save bi_iter so that end_bio_extent_readpage() won't freak out. */
> +	btrfs_bio_save_iter(btrfs_bio(bio));
> +
>   	while (cur_disk_byte < disk_bytenr + compressed_len) {
>   		u64 offset = cur_disk_byte - disk_bytenr;
>   		unsigned int index = offset >> PAGE_SHIFT;
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 0e239a4c3b26..13e726c88a81 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -1731,6 +1731,7 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc,
>   		return PTR_ERR(rbio);
>   	}
>   	bio_list_add(&rbio->bio_list, bio);
> +	btrfs_bio_save_iter(btrfs_bio(bio));
>   	rbio->bio_list_bytes = bio->bi_iter.bi_size;
>   	rbio->operation = BTRFS_RBIO_WRITE;
>   
> @@ -2135,6 +2136,7 @@ int raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
>   
>   	rbio->operation = BTRFS_RBIO_READ_REBUILD;
>   	bio_list_add(&rbio->bio_list, bio);
> +	btrfs_bio_save_iter(btrfs_bio(bio));
>   	rbio->bio_list_bytes = bio->bi_iter.bi_size;
>   
>   	rbio->faila = find_logical_bio_stripe(rbio, bio);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f38c230111be..b70037cc1a51 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6829,6 +6829,20 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>   		BUG();
>   	}
>   
> +	/*
> +	 * At endio time, bi_iter is no longer reliable, thus we have to save
> +	 * current bi_iter into btrfs_bio so that even for split bio we can
> +	 * iterate only the split part.
> +	 *
> +	 * And this has to be done before any bioc error, as endio functions
> +	 * will rely on bbio::iter.
> +	 *
> +	 * For bio create by btrfs_bio_slit() or btrfs_bio_clone*(), it's
> +	 * already set, but we can still have original bio which has its
> +	 * iter not initialized.
> +	 */
> +	btrfs_bio_save_iter(btrfs_bio(bio));
> +
>   	for (dev_nr = 0; dev_nr < total_devs; dev_nr++) {
>   		dev = bioc->stripes[dev_nr].dev;
>   		if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING,
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 3b8130680749..f9178d2c2fd6 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -334,6 +334,12 @@ struct btrfs_bio {
>   	struct btrfs_device *device;
>   	u8 *csum;
>   	u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
> +	/*
> +	 * Saved bio::bi_iter before submission.
> +	 *
> +	 * This allows us to interate the cloned/split bio properly, as at
> +	 * endio time bio::bi_iter is no longer reliable.
> +	 */
>   	struct bvec_iter iter;
>   
>   	/*
> @@ -356,6 +362,18 @@ static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio)
>   	}
>   }
>   
> +/*
> + * To save bbio::bio->bi_iter into bbio::iter so for callers who need the
> + * original bi_iter can access the original part of the bio.
> + * This is especially important for the incoming split btrfs_bio, which needs
> + * to call its endio for and only for the split range.
> + */
> +static inline void btrfs_bio_save_iter(struct btrfs_bio *bbio)
> +{
> +	if (!bbio->iter.bi_size)
> +		bbio->iter = bbio->bio.bi_iter;
> +}
> +
>   struct btrfs_io_stripe {
>   	struct btrfs_device *dev;
>   	u64 physical;
>
diff mbox series

Patch

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index e776956d5bc9..cc8d13369f53 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -870,6 +870,9 @@  blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	/* include any pages we added in add_ra-bio_pages */
 	cb->len = bio->bi_iter.bi_size;
 
+	/* Save bi_iter so that end_bio_extent_readpage() won't freak out. */
+	btrfs_bio_save_iter(btrfs_bio(bio));
+
 	while (cur_disk_byte < disk_bytenr + compressed_len) {
 		u64 offset = cur_disk_byte - disk_bytenr;
 		unsigned int index = offset >> PAGE_SHIFT;
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0e239a4c3b26..13e726c88a81 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1731,6 +1731,7 @@  int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc,
 		return PTR_ERR(rbio);
 	}
 	bio_list_add(&rbio->bio_list, bio);
+	btrfs_bio_save_iter(btrfs_bio(bio));
 	rbio->bio_list_bytes = bio->bi_iter.bi_size;
 	rbio->operation = BTRFS_RBIO_WRITE;
 
@@ -2135,6 +2136,7 @@  int raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
 
 	rbio->operation = BTRFS_RBIO_READ_REBUILD;
 	bio_list_add(&rbio->bio_list, bio);
+	btrfs_bio_save_iter(btrfs_bio(bio));
 	rbio->bio_list_bytes = bio->bi_iter.bi_size;
 
 	rbio->faila = find_logical_bio_stripe(rbio, bio);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f38c230111be..b70037cc1a51 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6829,6 +6829,20 @@  blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		BUG();
 	}
 
+	/*
+	 * At endio time, bi_iter is no longer reliable, thus we have to save
+	 * current bi_iter into btrfs_bio so that even for split bio we can
+	 * iterate only the split part.
+	 *
+	 * And this has to be done before any bioc error, as endio functions
+	 * will rely on bbio::iter.
+	 *
+	 * For bio create by btrfs_bio_slit() or btrfs_bio_clone*(), it's
+	 * already set, but we can still have original bio which has its
+	 * iter not initialized.
+	 */
+	btrfs_bio_save_iter(btrfs_bio(bio));
+
 	for (dev_nr = 0; dev_nr < total_devs; dev_nr++) {
 		dev = bioc->stripes[dev_nr].dev;
 		if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING,
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 3b8130680749..f9178d2c2fd6 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -334,6 +334,12 @@  struct btrfs_bio {
 	struct btrfs_device *device;
 	u8 *csum;
 	u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
+	/*
+	 * Saved bio::bi_iter before submission.
+	 *
+	 * This allows us to interate the cloned/split bio properly, as at
+	 * endio time bio::bi_iter is no longer reliable.
+	 */
 	struct bvec_iter iter;
 
 	/*
@@ -356,6 +362,18 @@  static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio)
 	}
 }
 
+/*
+ * To save bbio::bio->bi_iter into bbio::iter so for callers who need the
+ * original bi_iter can access the original part of the bio.
+ * This is especially important for the incoming split btrfs_bio, which needs
+ * to call its endio for and only for the split range.
+ */
+static inline void btrfs_bio_save_iter(struct btrfs_bio *bbio)
+{
+	if (!bbio->iter.bi_size)
+		bbio->iter = bbio->bio.bi_iter;
+}
+
 struct btrfs_io_stripe {
 	struct btrfs_device *dev;
 	u64 physical;