Message ID | 502d2273052e95e19366d785ee85e542e86fe61e.1606938211.git.josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Series | Cleanup error handling in relocation |
On 2020/12/3 3:50 AM, Josef Bacik wrote:
> While doing error injection I would sometimes get a corrupt file system.
> This is because I was injecting errors at btrfs_search_slot, but would
> only do it one time per stack. This uncovered a problem in
> commit_fs_roots, where if we get an error we would just break. However
> we're in a nested loop, the main loop being a loop to find all the dirty
> fs roots, and then subsequent root updates would succeed clearing the
> error value.
>
> This isn't likely to happen in real scenarios, however we could
> potentially get a random ENOMEM once and then not again, and we'd end up
> with a corrupted file system. Fix this by moving the error checking
> around a bit to the nested loop, as this is the only place where
> something will fail, and return the error as soon as it occurs.
>
> With this patch my reproducer no longer corrupts the file system.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Yep, that err can be overwritten by the next loop, so definitely a problem.

Thanks,
Qu

> ---
>  fs/btrfs/transaction.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 8e0f7a1029c6..a614f7699ce4 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1319,7 +1319,6 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
>  	struct btrfs_root *gang[8];
>  	int i;
>  	int ret;
> -	int err = 0;
>
>  	spin_lock(&fs_info->fs_roots_radix_lock);
>  	while (1) {
> @@ -1331,6 +1330,8 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
>  			break;
>  		for (i = 0; i < ret; i++) {
>  			struct btrfs_root *root = gang[i];
> +			int err;
> +
>  			radix_tree_tag_clear(&fs_info->fs_roots_radix,
>  					(unsigned long)root->root_key.objectid,
>  					BTRFS_ROOT_TRANS_TAG);
> @@ -1353,14 +1354,14 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
>  			err = btrfs_update_root(trans, fs_info->tree_root,
>  						&root->root_key,
>  						&root->root_item);
> -			spin_lock(&fs_info->fs_roots_radix_lock);
>  			if (err)
> -				break;
> +				return err;
> +			spin_lock(&fs_info->fs_roots_radix_lock);
>  			btrfs_qgroup_free_meta_all_pertrans(root);
>  		}
>  	}
>  	spin_unlock(&fs_info->fs_roots_radix_lock);
> -	return err;
> +	return 0;
>  }
>
>  /*
>
On 02/12/2020 20:54, Josef Bacik wrote:
> While doing error injection I would sometimes get a corrupt file system.
> This is because I was injecting errors at btrfs_search_slot, but would
> only do it one time per stack. This uncovered a problem in
> commit_fs_roots, where if we get an error we would just break. However
> we're in a nested loop, the main loop being a loop to find all the dirty
> fs roots, and then subsequent root updates would succeed clearing the
> error value.
>
> This isn't likely to happen in real scenarios, however we could
> potentially get a random ENOMEM once and then not again, and we'd end up
> with a corrupted file system. Fix this by moving the error checking
> around a bit to the nested loop, as this is the only place where
> something will fail, and return the error as soon as it occurs.
>
> With this patch my reproducer no longer corrupts the file system.

Better to abort the transaction than to corrupt the FS,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8e0f7a1029c6..a614f7699ce4 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1319,7 +1319,6 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
 	struct btrfs_root *gang[8];
 	int i;
 	int ret;
-	int err = 0;
 
 	spin_lock(&fs_info->fs_roots_radix_lock);
 	while (1) {
@@ -1331,6 +1330,8 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
 			break;
 		for (i = 0; i < ret; i++) {
 			struct btrfs_root *root = gang[i];
+			int err;
+
 			radix_tree_tag_clear(&fs_info->fs_roots_radix,
 					(unsigned long)root->root_key.objectid,
 					BTRFS_ROOT_TRANS_TAG);
@@ -1353,14 +1354,14 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
 			err = btrfs_update_root(trans, fs_info->tree_root,
 						&root->root_key,
 						&root->root_item);
-			spin_lock(&fs_info->fs_roots_radix_lock);
 			if (err)
-				break;
+				return err;
+			spin_lock(&fs_info->fs_roots_radix_lock);
 			btrfs_qgroup_free_meta_all_pertrans(root);
 		}
 	}
 	spin_unlock(&fs_info->fs_roots_radix_lock);
-	return err;
+	return 0;
 }
 
 /*
While doing error injection I would sometimes get a corrupt file system.
This is because I was injecting errors at btrfs_search_slot, but would
only do it one time per stack. This uncovered a problem in
commit_fs_roots, where if we get an error we would just break. However
we're in a nested loop, the main loop being a loop to find all the dirty
fs roots, and then subsequent root updates would succeed clearing the
error value.

This isn't likely to happen in real scenarios, however we could
potentially get a random ENOMEM once and then not again, and we'd end up
with a corrupted file system. Fix this by moving the error checking
around a bit to the nested loop, as this is the only place where
something will fail, and return the error as soon as it occurs.

With this patch my reproducer no longer corrupts the file system.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
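
The failure mode described in the commit message can be reproduced outside the kernel. The following is a minimal userspace sketch, not the btrfs code: fail_once_update(), commit_buggy() and commit_fixed() are hypothetical names, and fail_once_update() stands in for a btrfs_update_root() call that hits a single injected -ENOMEM. The outer loop plays the role of the repeated gang lookups and the inner loop walks one batch of roots.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for btrfs_update_root(): fails exactly once with
 * -ENOMEM (like a one-shot injected error), then succeeds on every call.
 */
static int fail_once_update(int *injected)
{
	if (!*injected) {
		*injected = 1;
		return -ENOMEM;
	}
	return 0;
}

/*
 * Old shape: err lives outside both loops and break only leaves the inner
 * loop. The outer loop keeps going, and the next successful update
 * overwrites err with 0, so the caller never sees the failure.
 */
static int commit_buggy(int batches, int per_batch)
{
	int injected = 0;
	int err = 0;

	for (int b = 0; b < batches; b++) {
		for (int i = 0; i < per_batch; i++) {
			err = fail_once_update(&injected);
			if (err)
				break;	/* exits the inner loop only */
		}
	}
	return err;
}

/*
 * New shape: err is scoped to the inner loop body and the first failure
 * is returned immediately, so no later success can clear it.
 */
static int commit_fixed(int batches, int per_batch)
{
	int injected = 0;

	for (int b = 0; b < batches; b++) {
		for (int i = 0; i < per_batch; i++) {
			int err = fail_once_update(&injected);

			if (err)
				return err;
		}
	}
	return 0;
}
```

With two batches of two roots each, commit_buggy(2, 2) returns 0 even though the first update failed, while commit_fixed(2, 2) returns -ENOMEM. The real patch additionally moves the spin_lock() call below the error check, so the early return happens without fs_roots_radix_lock held; the sketch leaves locking out.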