Message ID | 20190116115308.8687-1-wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: Do mandatory tree block check before submitting bio | expand |
Hi, Qu, On Wed, Jan 16, 2019 at 07:53:08PM +0800, Qu Wenruo wrote: > There are at least 2 reports about memory bit flip sneaking into on-disk > data. > > Currently we only have a relaxed check triggered at > btrfs_mark_buffer_dirty() time, as it's not mandatory, only for > CONFIG_BTRFS_FS_CHECK_INTEGRITY enabled build. > > This patch will address the hole by triggering comprehensive check on > tree blocks before writing it back to disk. > > The timing is set to csum_tree_block() where @verify == 0. > At that timing, we're generation csum for tree blocks before submitting > the metadata bio, so we could avoid all the unnecessary calls at > btrfs_mark_buffer_dirty(), but still catch enough error. I agree wholeheartedly with the idea of this change. Just one question: How does this get reported to the user/administrator? As I understand it, a detectably corrupt metadata page will generate an I/O error from the filesystem before it's written to disk? How will this show up in kernel logs? Is it distinguishable in any way from a similar error that was generated on reading such a corrupt metadata node from the disk? Basically, I want to be able to distinguish this case (error detected when writing) from the existing case (error detected when reading) when someone shows up on IRC with a "broken filesystem". Hugo. > Reported-by: Leonard Lausen <leonard@lausen.nl> > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/disk-io.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 8da2f380d3c0..45bf6be9e751 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -313,6 +313,16 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info, > return -EUCLEAN; > } > } else { > + /* > + * Here we're calculating csum before writing it to disk, > + * do comprenhensive check here to catch memory corruption > + */ > + if (btrfs_header_level(buf)) > + err = btrfs_check_node(fs_info, buf); > + else > + err = btrfs_check_leaf_full(fs_info, buf); > + if (err < 0) > + return err; > write_extent_buffer(buf, result, 0, csum_size); > } >
On 2019/1/16 下午8:01, Hugo Mills wrote: > Hi, Qu, > > On Wed, Jan 16, 2019 at 07:53:08PM +0800, Qu Wenruo wrote: >> There are at least 2 reports about memory bit flip sneaking into on-disk >> data. >> >> Currently we only have a relaxed check triggered at >> btrfs_mark_buffer_dirty() time, as it's not mandatory, only for >> CONFIG_BTRFS_FS_CHECK_INTEGRITY enabled build. >> >> This patch will address the hole by triggering comprehensive check on >> tree blocks before writing it back to disk. >> >> The timing is set to csum_tree_block() where @verify == 0. >> At that timing, we're generation csum for tree blocks before submitting >> the metadata bio, so we could avoid all the unnecessary calls at >> btrfs_mark_buffer_dirty(), but still catch enough error. > > I agree wholeheartedly with the idea of this change. Just one > question: > > How does this get reported to the user/administrator? As I > understand it, a detectably corrupt metadata page will generate an I/O > error from the filesystem before it's written to disk? How will this > show up in kernel logs? Well, you caught me. I haven't try the error case, and in fact if it fails, it fails by triggering kernel BUG_ON(), thus you may not have a chance to see the error message from btrfs module. > Is it distinguishable in any way from a > similar error that was generated on reading such a corrupt metadata > node from the disk? Kind of distinguishable for this patch, when you hit kernel BUG_ON() at fs/btrfs/extent_io.c:4016 then it's definitely from this patch. :P > > Basically, I want to be able to distinguish this case (error > detected when writing) from the existing case (error detected when > reading) when someone shows up on IRC with a "broken filesystem". Definitely, I'll address this and the BUG_ON() in next version. Thanks, Qu > > Hugo. > >> Reported-by: Leonard Lausen <leonard@lausen.nl> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/disk-io.c | 10 ++++++++++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index 8da2f380d3c0..45bf6be9e751 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -313,6 +313,16 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info, >> return -EUCLEAN; >> } >> } else { >> + /* >> + * Here we're calculating csum before writing it to disk, >> + * do comprenhensive check here to catch memory corruption >> + */ >> + if (btrfs_header_level(buf)) >> + err = btrfs_check_node(fs_info, buf); >> + else >> + err = btrfs_check_leaf_full(fs_info, buf); >> + if (err < 0) >> + return err; >> write_extent_buffer(buf, result, 0, csum_size); >> } >> >
On Wed, Jan 16, 2019 at 08:26:35PM +0800, Qu Wenruo wrote: > > > On 2019/1/16 下午8:01, Hugo Mills wrote: > > Hi, Qu, > > > > On Wed, Jan 16, 2019 at 07:53:08PM +0800, Qu Wenruo wrote: > >> There are at least 2 reports about memory bit flip sneaking into on-disk > >> data. > >> > >> Currently we only have a relaxed check triggered at > >> btrfs_mark_buffer_dirty() time, as it's not mandatory, only for > >> CONFIG_BTRFS_FS_CHECK_INTEGRITY enabled build. > >> > >> This patch will address the hole by triggering comprehensive check on > >> tree blocks before writing it back to disk. > >> > >> The timing is set to csum_tree_block() where @verify == 0. > >> At that timing, we're generation csum for tree blocks before submitting > >> the metadata bio, so we could avoid all the unnecessary calls at > >> btrfs_mark_buffer_dirty(), but still catch enough error. > > > > I agree wholeheartedly with the idea of this change. Just one > > question: > > > > How does this get reported to the user/administrator? As I > > understand it, a detectably corrupt metadata page will generate an I/O > > error from the filesystem before it's written to disk? How will this > > show up in kernel logs? > > Well, you caught me. > > I haven't try the error case, and in fact if it fails, it fails by > triggering kernel BUG_ON(), thus you may not have a chance to see the > error message from btrfs module. > > > Is it distinguishable in any way from a > > similar error that was generated on reading such a corrupt metadata > > node from the disk? > > Kind of distinguishable for this patch, when you hit kernel BUG_ON() at > fs/btrfs/extent_io.c:4016 then it's definitely from this patch. :P Haha. :) > > Basically, I want to be able to distinguish this case (error > > detected when writing) from the existing case (error detected when > > reading) when someone shows up on IRC with a "broken filesystem". > > Definitely, I'll address this and the BUG_ON() in next version. The error-on-read case gives us a pretty good report on what's wrong and where. Typically something like "bad key order" or "wrong item size", plus the logical address of the metadata chunk so we can dump it with debug-tree -b. Having the same messages from this call-site to indicate the kind of error found, and also something to indicate that it was detected before hitting disk (i.e. which call-site for the checks triggered the error) would be, I think, the minimal information needed. If we could also have the human-readable dump of the full metadata page logged as well, that would be ideal -- we won't be able to use debug-tree to diagnose the issue afterwards, as it won't have reached the disk. Thanks, Hugo. > Thanks, > Qu > > > > > Hugo. > > > >> Reported-by: Leonard Lausen <leonard@lausen.nl> > >> Signed-off-by: Qu Wenruo <wqu@suse.com> > >> --- > >> fs/btrfs/disk-io.c | 10 ++++++++++ > >> 1 file changed, 10 insertions(+) > >> > >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > >> index 8da2f380d3c0..45bf6be9e751 100644 > >> --- a/fs/btrfs/disk-io.c > >> +++ b/fs/btrfs/disk-io.c > >> @@ -313,6 +313,16 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info, > >> return -EUCLEAN; > >> } > >> } else { > >> + /* > >> + * Here we're calculating csum before writing it to disk, > >> + * do comprenhensive check here to catch memory corruption > >> + */ > >> + if (btrfs_header_level(buf)) > >> + err = btrfs_check_node(fs_info, buf); > >> + else > >> + err = btrfs_check_leaf_full(fs_info, buf); > >> + if (err < 0) > >> + return err; > >> write_extent_buffer(buf, result, 0, csum_size); > >> } > >> > > >
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 8da2f380d3c0..45bf6be9e751 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -313,6 +313,16 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info, return -EUCLEAN; } } else { + /* + * Here we're calculating csum before writing it to disk, + * do comprenhensive check here to catch memory corruption + */ + if (btrfs_header_level(buf)) + err = btrfs_check_node(fs_info, buf); + else + err = btrfs_check_leaf_full(fs_info, buf); + if (err < 0) + return err; write_extent_buffer(buf, result, 0, csum_size); }
There are at least 2 reports about memory bit flip sneaking into on-disk data. Currently we only have a relaxed check triggered at btrfs_mark_buffer_dirty() time, as it's not mandatory, only for CONFIG_BTRFS_FS_CHECK_INTEGRITY enabled build. This patch will address the hole by triggering comprehensive check on tree blocks before writing it back to disk. The timing is set to csum_tree_block() where @verify == 0. At that timing, we're generation csum for tree blocks before submitting the metadata bio, so we could avoid all the unnecessary calls at btrfs_mark_buffer_dirty(), but still catch enough error. Reported-by: Leonard Lausen <leonard@lausen.nl> Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/disk-io.c | 10 ++++++++++ 1 file changed, 10 insertions(+)