Message ID | 20180417014719.3799-1-wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
> v3: > Update commit message to show the corruption in details. > Modify the kernel error message to show corruption is detected before > transaction commitment. Nice. Thanks. more below. > @@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device *device, > > btrfs_set_super_bytenr(sb, bytenr); > > + /* check the validation of the primary sb before writing */ > + if (i == 0) { > + ret = btrfs_check_super_valid(device->fs_info, sb); > + if (ret) { > + btrfs_err(device->fs_info, > +"superblock corruption detected before transaction commitment for device %llu", > + device->devid); > + return -EUCLEAN; > + } Why not move this entire check further below, after we have the ready crc and use btrfs_check_super_csum(), instead of btrfs_check_super_valid()? so that we verify only what is known to be corrupted that is .. btrfs_super_block { :: __le64 incompat_flags; __le16 csum_type; :: } And also can you dump contents of incompat_flags and csum_type at both fs_info->super_copy and fs_info->super_for_commit Because at each commit transaction we btrfs_commit_transaction() { :: memcpy(fs_info->super_for_commit, fs_info->super_copy, sizeof(*fs_info->super_copy)); :: ret = write_all_supers(fs_info, 0); } And also the sync log can write the btrfs_sync_log() { :: ret = write_all_supers(fs_info, 1); Finally locks between these two threads needs a review as well. Thanks, Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2018年04月17日 17:05, Anand Jain wrote: > >> v3: >> Update commit message to show the corruption in details. >> Modify the kernel error message to show corruption is detected before >> transaction commitment. > Nice. Thanks. more below. > >> @@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device >> *device, >> btrfs_set_super_bytenr(sb, bytenr); >> + /* check the validation of the primary sb before writing */ >> + if (i == 0) { >> + ret = btrfs_check_super_valid(device->fs_info, sb); >> + if (ret) { >> + btrfs_err(device->fs_info, >> +"superblock corruption detected before transaction commitment for >> device %llu", >> + device->devid); >> + return -EUCLEAN; >> + } > > Why not move this entire check further below, after we have the ready > crc and use btrfs_check_super_csum(), instead of > btrfs_check_super_valid()? so that we verify only what is known to be > corrupted that is .. The problem is, we don't know the cause yet, so we must check the whole superblock. For example, if the corruption is caused by some wild pointer of other kernel module, and we're just unlucky that one day it corrupts nodesize, then we can't detect it if we only check certain members. > > btrfs_super_block { > :: > __le64 incompat_flags; > __le16 csum_type; > :: > } > > And also can you dump contents of incompat_flags and csum_type at both > fs_info->super_copy > and > fs_info->super_for_commit Not really needed, as when corruption happens, it's super_copy corrupted, not something went wrong after we called memcpy() Thanks, Qu > > Because at each commit transaction we > > btrfs_commit_transaction() > { > :: > memcpy(fs_info->super_for_commit, fs_info->super_copy, > sizeof(*fs_info->super_copy)); > :: > ret = write_all_supers(fs_info, 0); > > } > > And also the sync log can write the > > btrfs_sync_log() > { > :: > ret = write_all_supers(fs_info, 1); > > > Finally locks between these two threads needs a review as well. > > Thanks, Anand > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/17/2018 05:58 PM, Qu Wenruo wrote: > > > On 2018年04月17日 17:05, Anand Jain wrote: >> >>> v3: >>> Update commit message to show the corruption in details. >>> Modify the kernel error message to show corruption is detected before >>> transaction commitment. >> Nice. Thanks. more below. >> >>> @@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device >>> *device, >>> btrfs_set_super_bytenr(sb, bytenr); >>> + /* check the validation of the primary sb before writing */ >>> + if (i == 0) { >>> + ret = btrfs_check_super_valid(device->fs_info, sb); >>> + if (ret) { >>> + btrfs_err(device->fs_info, >>> +"superblock corruption detected before transaction commitment for >>> device %llu", >>> + device->devid); >>> + return -EUCLEAN; >>> + } >> >> Why not move this entire check further below, after we have the ready >> crc and use btrfs_check_super_csum(), instead of >> btrfs_check_super_valid()? so that we verify only what is known to be >> corrupted that is .. > > The problem is, we don't know the cause yet, so we must check the whole > superblock. > > For example, if the corruption is caused by some wild pointer of other > kernel module, and we're just unlucky that one day it corrupts nodesize, > then we can't detect it if we only check certain members. Right I notice that. But without btrfs_check_super_csum(), it leaves out checking one of the member (csum_type) which is know to be corrupted at the two instances, so it can also include btrfs_check_super_csum(). There were two cases, both of them corrupted the same offset, its not just a coincidence that both of these reported corrupted the same offset. Also the incompatible features flags (169) are still valid in both the cases. It looks as if we wrote u32 to a u64. I notice that we provide the options to write the incompatible flags through mount option, sysfs and ioctl. >> btrfs_super_block { >> :: >> __le64 incompat_flags; >> __le16 csum_type; >> :: >> } >> >> And also can you dump contents of incompat_flags and csum_type at both >> fs_info->super_copy >> and >> fs_info->super_for_commit > > Not really needed, as when corruption happens, it's super_copy > corrupted, not something went wrong after we called memcpy() As shown below, we aren't memcpy()-ing in the btrfs_sync_log() thread, did you check if btrfs_sync_log() can not be the last person to write at umount? Thanks, Anand > Thanks, > Qu > >> >> Because at each commit transaction we >> >> btrfs_commit_transaction() >> { >> :: >> memcpy(fs_info->super_for_commit, fs_info->super_copy, >> sizeof(*fs_info->super_copy)); >> :: >> ret = write_all_supers(fs_info, 0); >> >> } >> >> And also the sync log can write the >> >> btrfs_sync_log() >> { >> :: >> ret = write_all_supers(fs_info, 1); >> >> >> Finally locks between these two threads needs a review as well. >> >> Thanks, Anand >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2018年04月17日 22:32, Anand Jain wrote: > > > On 04/17/2018 05:58 PM, Qu Wenruo wrote: >> >> >> On 2018年04月17日 17:05, Anand Jain wrote: >>> >>>> v3: >>>> Update commit message to show the corruption in details. >>>> Modify the kernel error message to show corruption is detected >>>> before >>>> transaction commitment. >>> Nice. Thanks. more below. >>> >>>> @@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device >>>> *device, >>>> btrfs_set_super_bytenr(sb, bytenr); >>>> + /* check the validation of the primary sb before writing */ >>>> + if (i == 0) { >>>> + ret = btrfs_check_super_valid(device->fs_info, sb); >>>> + if (ret) { >>>> + btrfs_err(device->fs_info, >>>> +"superblock corruption detected before transaction commitment for >>>> device %llu", >>>> + device->devid); >>>> + return -EUCLEAN; >>>> + } >>> >>> Why not move this entire check further below, after we have the ready >>> crc and use btrfs_check_super_csum(), instead of >>> btrfs_check_super_valid()? so that we verify only what is known to be >>> corrupted that is .. >> >> The problem is, we don't know the cause yet, so we must check the whole >> superblock. >> >> For example, if the corruption is caused by some wild pointer of other >> kernel module, and we're just unlucky that one day it corrupts nodesize, >> then we can't detect it if we only check certain members. > > Right I notice that. > > But without btrfs_check_super_csum(), it leaves out checking one of the > member (csum_type) which is know to be corrupted at the two instances, > so it can also include btrfs_check_super_csum(). > > There were two cases, both of them corrupted the same offset, its not > just a coincidence that both of these reported corrupted the same > offset. Yep, but since we're here to do extra verification, checking everything is never a bad idea. By this we don't need to bother checking other members when new corruption pops out. > > Also the incompatible features flags (169) are still valid in both the > cases. It looks as if we wrote u32 to a u64. I notice that we provide > the options to write the incompatible flags through mount option, sysfs > and ioctl. While I don't think that's the cause of these reported corruption. I'd prefer some under/over flow of memory which corrupted fs_info->super_copy somehow. It may be btrfs or it may not. It's pretty hard to determine with just 2 reports. Especially for ben's report, he is using latest vega graphics IIRC, who knows what could went wrong with latest amd drm codes. > > >>> btrfs_super_block { >>> :: >>> __le64 incompat_flags; >>> __le16 csum_type; >>> :: >>> } >>> >>> And also can you dump contents of incompat_flags and csum_type at both >>> fs_info->super_copy >>> and >>> fs_info->super_for_commit >> >> Not really needed, as when corruption happens, it's super_copy >> corrupted, not something went wrong after we called memcpy() > > As shown below, we aren't memcpy()-ing in the btrfs_sync_log() thread, > did you check if btrfs_sync_log() can not be the last person to write > at umount? I checked the dump super output, where log_tree output is all 0, means no log tree, hence not btrfs_sync_log() caused the problem. From Ben's: ------ chunk_root 5518540881920 chunk_root_level 1 log_root 0 log_root_transid 0 log_root_level 0 ------ And from Ken's ------ chunk_root 21004288 chunk_root_level 1 log_root 0 log_root_transid 0 log_root_level 0 ------ So at least for current only reports, it's not btrfs_sync_log() causing the problem. Thanks, Qu > > Thanks, Anand > >> Thanks, >> Qu >> >>> >>> Because at each commit transaction we >>> >>> btrfs_commit_transaction() >>> { >>> :: >>> memcpy(fs_info->super_for_commit, fs_info->super_copy, >>> sizeof(*fs_info->super_copy)); >>> :: >>> ret = write_all_supers(fs_info, 0); >>> >>> } >>> >>> And also the sync log can write the >>> >>> btrfs_sync_log() >>> { >>> :: >>> ret = write_all_supers(fs_info, 1); >>> >>> >>> Finally locks between these two threads needs a review as well. >>> >>> Thanks, Anand >>> -- >>> To unsubscribe from this list: send the line "unsubscribe >>> linux-btrfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>
On Tue, Apr 17, 2018 at 09:47:19AM +0800, Qu Wenruo wrote: > @@ -2680,7 +2681,7 @@ int open_ctree(struct super_block *sb, > > memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE); > > - ret = btrfs_check_super_valid(fs_info); > + ret = btrfs_check_super_valid(fs_info, fs_info->super_copy); > if (ret) { > btrfs_err(fs_info, "superblock contains fatal errors"); > err = -EINVAL; > @@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device *device, This is in write_dev_supers, so the superblock is checked number-of-devices times. The caller write_all_supers rewrites the device item so it matches the device it's going to write to. But, btrfs_check_super_valid does not validate the dev_item so all the validation does not bring much benefit, as it repeatedly checks the same data. So, what if the validation is done only once in write_all_supers? Lock the devices, validate, if it fails, report that and unlock devices and go readonly. There's a differnce to what you implemented: if the in-memory superblock corruption happens between writing to the devices, there are some left with the new superblock and some with the old. Although this sounds quite improbable, I think that doing the check in advance would save some trouble if that happens. The superblocks on all devices will match. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2018年04月19日 06:04, David Sterba wrote: > On Tue, Apr 17, 2018 at 09:47:19AM +0800, Qu Wenruo wrote: >> @@ -2680,7 +2681,7 @@ int open_ctree(struct super_block *sb, >> >> memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE); >> >> - ret = btrfs_check_super_valid(fs_info); >> + ret = btrfs_check_super_valid(fs_info, fs_info->super_copy); >> if (ret) { >> btrfs_err(fs_info, "superblock contains fatal errors"); >> err = -EINVAL; >> @@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device *device, > > This is in write_dev_supers, so the superblock is checked > number-of-devices times. The caller write_all_supers rewrites the device > item so it matches the device it's going to write to. But, > btrfs_check_super_valid does not validate the dev_item so all the > validation does not bring much benefit, as it repeatedly checks the same > data. > > So, what if the validation is done only once in write_all_supers? Lock > the devices, validate, if it fails, report that and unlock devices and > go readonly. Makes sense. I'll update btrfs_check_super_valid() to cooperate with that in next update. Thanks, Qu > > There's a differnce to what you implemented: if the in-memory superblock > corruption happens between writing to the devices, there are some left > with the new superblock and some with the old. > > Although this sounds quite improbable, I think that doing the check in > advance would save some trouble if that happens. The superblocks on all > devices will match. >
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 23803102aa0d..2d543ba2b7af 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -68,7 +68,8 @@ static const struct extent_io_ops btree_extent_io_ops; static void end_workqueue_fn(struct btrfs_work *work); static void free_fs_root(struct btrfs_root *root); -static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info); +static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info, + struct btrfs_super_block *sb); static void btrfs_destroy_ordered_extents(struct btrfs_root *root); static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans, struct btrfs_fs_info *fs_info); @@ -2680,7 +2681,7 @@ int open_ctree(struct super_block *sb, memcpy(fs_info->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE); - ret = btrfs_check_super_valid(fs_info); + ret = btrfs_check_super_valid(fs_info, fs_info->super_copy); if (ret) { btrfs_err(fs_info, "superblock contains fatal errors"); err = -EINVAL; @@ -3310,6 +3311,27 @@ static int write_dev_supers(struct btrfs_device *device, btrfs_set_super_bytenr(sb, bytenr); + /* check the validation of the primary sb before writing */ + if (i == 0) { + ret = btrfs_check_super_valid(device->fs_info, sb); + if (ret) { + btrfs_err(device->fs_info, +"superblock corruption detected before transaction commitment for device %llu", + device->devid); + return -EUCLEAN; + } + /* + * Unknown incompat flags can't be mounted, so newly + * developed flags means corruption + */ + if (btrfs_super_incompat_flags(sb) & + ~BTRFS_FEATURE_INCOMPAT_SUPP) { + btrfs_err(device->fs_info, +"superblock corruption detected before transaction commitment for device %llu", + device->devid); + return -EUCLEAN; + } + } crc = ~(u32)0; crc = btrfs_csum_data((const char *)sb + BTRFS_CSUM_SIZE, crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE); @@ -3985,9 +4007,9 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 parent_transid, int level, level, first_key); } -static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info) +static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info, + struct btrfs_super_block *sb) { - struct btrfs_super_block *sb = fs_info->super_copy; u64 nodesize = btrfs_super_nodesize(sb); u64 sectorsize = btrfs_super_sectorsize(sb); int ret = 0;
There are already 2 reports about strangely corrupted super blocks, where csum still matches but extra garbage gets slipped into super block. The corruption would looks like: ------ superblock: bytenr=65536, device=/dev/sdc1 --------------------------------------------------------- csum_type 41700 (INVALID) csum 0x3b252d3a [match] bytenr 65536 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] ... incompat_flags 0x5b22400000000169 ( MIXED_BACKREF | COMPRESS_LZO | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA | unknown flag: 0x5b22400000000000 ) ... ------ Or ------ superblock: bytenr=65536, device=/dev/mapper/x --------------------------------------------------------- csum_type 35355 (INVALID) csum_size 32 csum 0xf0dbeddd [match] bytenr 65536 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] ... incompat_flags 0x176d200000000169 ( MIXED_BACKREF | COMPRESS_LZO | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA | unknown flag: 0x176d200000000000 ) ------ Obviously, csum_type and incompat_flags get some garbage, but its csum still matches, which means kernel calculates the csum based on corrupted super block memory. And after manually fixing these values, the filesystem is completely healthy without any problem exposed by btrfs check. Although the cause is still unknown, at least detect it and prevent further corruption. Reported-by: Ken Swenson <flat@imo.uto.moe> Reported-by: Ben Parsons <9parsonsb@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com> --- changelog: v2: Fix false alerts by moving the check to write_dev_supers() as btrfs_check_super_valid() only handles the primary superblock. v3: Update commit message to show the corruption in details. Modify the kernel error message to show corruption is detected before transaction commitment. --- fs/btrfs/disk-io.c | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-)