diff mbox

block: note about cloned bios and bio_for_each_segment_all

Message ID 20170714134051.22756-1-dsterba@suse.com (mailing list archive)
State New, archived
Headers show

Commit Message

David Sterba July 14, 2017, 1:40 p.m. UTC
We've switched to cloned bios in btrfs and hit a nasty bug leading to
corruptions, when cloned bios are iterated by bio_for_each_segment_all.

Report and fix:
https://patchwork.kernel.org/patch/9838535/

As a matter of precaution, we've added assertions to btrfs code to catch
the bad usage pattern:

https://patchwork.kernel.org/patch/9839267/

The cloned/bi_vcnt behaviour seems tobe implementation dependent and is
not documented, so this patch at least warns about this one particular
case but this might still be insufficient.

CC: linux-block@vger.kernel.org
Signed-off-by: David Sterba <dsterba@suse.com>
---
 include/linux/bio.h | 3 +++
 1 file changed, 3 insertions(+)

Comments

Ming Lei July 14, 2017, 1:47 p.m. UTC | #1
On Fri, Jul 14, 2017 at 9:40 PM, David Sterba <dsterba@suse.com> wrote:
> We've switched to cloned bios in btrfs and hit a nasty bug leading to
> corruptions, when cloned bios are iterated by bio_for_each_segment_all.

No, you simply can't use bio_for_each_segment_all on cloned bio, and the
reason is obviously.

>
> Report and fix:
> https://patchwork.kernel.org/patch/9838535/
>
> As a matter of precaution, we've added assertions to btrfs code to catch
> the bad usage pattern:
>
> https://patchwork.kernel.org/patch/9839267/
>
> The cloned/bi_vcnt behaviour seems tobe implementation dependent and is
> not documented, so this patch at least warns about this one particular
> case but this might still be insufficient.
>
> CC: linux-block@vger.kernel.org
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  include/linux/bio.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 7b1cf4ba0902..f1ac84edcf39 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -156,6 +156,9 @@ static inline void *bio_data(struct bio *bio)
>  /*
>   * drivers should _never_ use the all version - the bio may have been split
>   * before it got to the driver and the driver won't own all of it
> + *
> + * Note that cloned bios must not use this as their bi_vcnt may be invalid and
> + * this could lead to silent corruptions.
>   */
>  #define bio_for_each_segment_all(bvl, bio, i)                          \
>         for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
> --
> 2.13.0
>

Maybe we can add a warning here if it is a cloned bio.
David Sterba July 14, 2017, 3:03 p.m. UTC | #2
On Fri, Jul 14, 2017 at 09:47:30PM +0800, Ming Lei wrote:
> On Fri, Jul 14, 2017 at 9:40 PM, David Sterba <dsterba@suse.com> wrote:
> > We've switched to cloned bios in btrfs and hit a nasty bug leading to
> > corruptions, when cloned bios are iterated by bio_for_each_segment_all.
> 
> No, you simply can't use bio_for_each_segment_all on cloned bio, and the
> reason is obviously.

This was not obvious to us, speaking for the btrfs developers trying to
make more use of the of the bio API, so we had to find out the hard way.

The proposed WARN_ON, possibly more sanity checks or documentation would
help us not to trip over similar problems in the future. I try to take
great care when dealing with code changing bios on our side so I read
the headers, and partially the implementation, but still can miss
something.
Filipe Manana July 14, 2017, 5:56 p.m. UTC | #3
On 07/14/2017 04:03 PM, David Sterba wrote:
> On Fri, Jul 14, 2017 at 09:47:30PM +0800, Ming Lei wrote:
>> On Fri, Jul 14, 2017 at 9:40 PM, David Sterba <dsterba@suse.com> wrote:
>>> We've switched to cloned bios in btrfs and hit a nasty bug leading to
>>> corruptions, when cloned bios are iterated by bio_for_each_segment_all.
>>
>> No, you simply can't use bio_for_each_segment_all on cloned bio, and the
>> reason is obviously.
> 
> This was not obvious to us, speaking for the btrfs developers trying to
> make more use of the of the bio API, so we had to find out the hard way.

Yep, it might be obvious to those familiar with the block layer's
internals, but for those not so familiar, it's not. There's no mention
in bio_clone_fast() that the cloned bio's bi_vcnt shouldn't be used, and
after finding that, one has to check which bio APIs use it and which
don't. In this specific btrfs issue, it lead to silent write
corruptions, making it harder to find (as opposed to crashes or other
immediate failures).

> 
> The proposed WARN_ON, possibly more sanity checks or documentation would
> help us not to trip over similar problems in the future. I try to take
> great care when dealing with code changing bios on our side so I read
> the headers, and partially the implementation, but still can miss
> something.
>
Jens Axboe July 14, 2017, 6:19 p.m. UTC | #4
On 07/14/2017 11:56 AM, Filipe Manana wrote:
> 
> 
> On 07/14/2017 04:03 PM, David Sterba wrote:
>> On Fri, Jul 14, 2017 at 09:47:30PM +0800, Ming Lei wrote:
>>> On Fri, Jul 14, 2017 at 9:40 PM, David Sterba <dsterba@suse.com> wrote:
>>>> We've switched to cloned bios in btrfs and hit a nasty bug leading to
>>>> corruptions, when cloned bios are iterated by bio_for_each_segment_all.
>>>
>>> No, you simply can't use bio_for_each_segment_all on cloned bio, and the
>>> reason is obviously.
>>
>> This was not obvious to us, speaking for the btrfs developers trying to
>> make more use of the of the bio API, so we had to find out the hard way.
> 
> Yep, it might be obvious to those familiar with the block layer's
> internals, but for those not so familiar, it's not. There's no mention
> in bio_clone_fast() that the cloned bio's bi_vcnt shouldn't be used,
> and after finding that, one has to check which bio APIs use it and
> which don't. In this specific btrfs issue, it lead to silent write
> corruptions, making it harder to find (as opposed to crashes or other
> immediate failures).

It's hard to circulate info like that, but the WARN_ON() should have
been there from the get-go. I just need someone to test that patch
triggers for the problematic case, then I'd be happy to get it queued
up.
diff mbox

Patch

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7b1cf4ba0902..f1ac84edcf39 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -156,6 +156,9 @@  static inline void *bio_data(struct bio *bio)
 /*
  * drivers should _never_ use the all version - the bio may have been split
  * before it got to the driver and the driver won't own all of it
+ *
+ * Note that cloned bios must not use this as their bi_vcnt may be invalid and
+ * this could lead to silent corruptions.
  */
 #define bio_for_each_segment_all(bvl, bio, i)				\
 	for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)