Message ID | 20191213040915.3502922-12-naohiro.aota@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs: zoned block device support |
On 12/12/19 11:08 PM, Naohiro Aota wrote:
> If the btrfs volume has mirrored block groups, it unconditionally makes
> un-mirrored block groups read only. When we have mirrored block groups, but
> don't have writable block groups, this will drop all writable block groups.
> So, check if we have at least one writable mirrored block group before
> setting un-mirrored block groups read only.
>
> This change is necessary to handle e.g. xfstests btrfs/124 case.
>
> When we mount degraded RAID1 FS and write to it, and then re-mount with
> full device, the write pointers of corresponding zones of written block
> group differ. We mark such block group as "wp_broken" and make it read
> only. In this situation, we only have read only RAID1 block groups because
> of "wp_broken" and un-mirrored block groups are also marked read only,
> because we have RAID1 block groups. As a result, all the block groups are
> now read only, so that we cannot even start the rebalance to fix the
> situation.

I'm not sure I understand. In degraded mode we're writing to just one
mirror of a RAID1 block group, correct? And this messes up the WP for
the broken side, so it gets marked with wp_broken and thus RO. How
does this patch help? The block groups are still marked RAID1 right?
Or are new block groups allocated with SINGLE or RAID0? I'm confused.
Thanks,

Josef

On Tue, Dec 17, 2019 at 02:25:37PM -0500, Josef Bacik wrote:
>On 12/12/19 11:08 PM, Naohiro Aota wrote:
>>If the btrfs volume has mirrored block groups, it unconditionally makes
>>un-mirrored block groups read only. When we have mirrored block groups, but
>>don't have writable block groups, this will drop all writable block groups.
>>So, check if we have at least one writable mirrored block group before
>>setting un-mirrored block groups read only.
>>
>>This change is necessary to handle e.g. xfstests btrfs/124 case.
>>
>>When we mount degraded RAID1 FS and write to it, and then re-mount with
>>full device, the write pointers of corresponding zones of written block
>>group differ. We mark such block group as "wp_broken" and make it read
>>only. In this situation, we only have read only RAID1 block groups because
>>of "wp_broken" and un-mirrored block groups are also marked read only,
>>because we have RAID1 block groups. As a result, all the block groups are
>>now read only, so that we cannot even start the rebalance to fix the
>>situation.
>
>I'm not sure I understand. In degraded mode we're writing to just one
>mirror of a RAID1 block group, correct? And this messes up the WP for
>the broken side, so it gets marked with wp_broken and thus RO. How
>does this patch help? The block groups are still marked RAID1 right?
>Or are new block groups allocated with SINGLE or RAID0? I'm confused.
>Thanks,
>
>Josef

First of all, I found that some recent change (maybe commit
112974d4067b ("btrfs: volumes: Remove ENOSPC-prone
btrfs_can_relocate()")?) solved the issue, so we no longer need patches
11 and 12. So, I will drop these two in the next version.

So, I think you may already have no interest in the answer, but just
for a note... The situation was like this:

* before degrading
  - All block groups are RAID1, working fine.

* degraded mount
  - Block groups allocated before degrading are RAID1. Writes go
    into the RAID1 block groups and break the write pointer.
  - Newly allocated block groups are SINGLE, since we only have one
    available device.

* mount with both drives again
  - RAID1 block groups are marked RO because of the broken write pointer
  - SINGLE block groups are also marked RO because we have RAID1 block
    groups

and at this point, btrfs was somehow unable to allocate a new block
group or to start balancing.

On 12/18/19 2:35 AM, Naohiro Aota wrote:
> On Tue, Dec 17, 2019 at 02:25:37PM -0500, Josef Bacik wrote:
>> On 12/12/19 11:08 PM, Naohiro Aota wrote:
>>> If the btrfs volume has mirrored block groups, it unconditionally makes
>>> un-mirrored block groups read only. When we have mirrored block groups, but
>>> don't have writable block groups, this will drop all writable block groups.
>>> So, check if we have at least one writable mirrored block group before
>>> setting un-mirrored block groups read only.
>>>
>>> This change is necessary to handle e.g. xfstests btrfs/124 case.
>>>
>>> When we mount degraded RAID1 FS and write to it, and then re-mount with
>>> full device, the write pointers of corresponding zones of written block
>>> group differ. We mark such block group as "wp_broken" and make it read
>>> only. In this situation, we only have read only RAID1 block groups because
>>> of "wp_broken" and un-mirrored block groups are also marked read only,
>>> because we have RAID1 block groups. As a result, all the block groups are
>>> now read only, so that we cannot even start the rebalance to fix the
>>> situation.
>>
>> I'm not sure I understand. In degraded mode we're writing to just one mirror
>> of a RAID1 block group, correct? And this messes up the WP for the broken
>> side, so it gets marked with wp_broken and thus RO. How does this patch
>> help? The block groups are still marked RAID1 right? Or are new block groups
>> allocated with SINGLE or RAID0? I'm confused. Thanks,
>>
>> Josef
>
> First of all, I found that some recent change (maybe commit
> 112974d4067b ("btrfs: volumes: Remove ENOSPC-prone
> btrfs_can_relocate()")?) solved the issue, so we no longer need patches
> 11 and 12. So, I will drop these two in the next version.
>
> So, I think you may already have no interest in the answer, but just
> for a note... The situation was like this:
>
> * before degrading
>   - All block groups are RAID1, working fine.
>
> * degraded mount
>   - Block groups allocated before degrading are RAID1. Writes go
>     into the RAID1 block groups and break the write pointer.
>   - Newly allocated block groups are SINGLE, since we only have one
>     available device.
>
> * mount with both drives again
>   - RAID1 block groups are marked RO because of the broken write pointer
>   - SINGLE block groups are also marked RO because we have RAID1 block
>     groups
>
> and at this point, btrfs was somehow unable to allocate a new block
> group or to start balancing.

Oooh ok I see, I had it in my head we would still allocate RAID1 chunks, but
we allocate SINGLE, so that makes sense. Go ahead and drop those patches, and
thanks for the explanation.

Josef

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 5c04422f6f5a..b286359f3876 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1813,6 +1813,27 @@ static int read_one_block_group(struct btrfs_fs_info *info,
 	return ret;
 }
 
+/*
+ * have_mirrored_block_group - check if we have at least one writable
+ *                             mirrored Block Group
+ */
+static bool have_mirrored_block_group(struct btrfs_space_info *space_info)
+{
+	struct btrfs_block_group *block_group;
+	int i;
+
+	for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
+		if (i == BTRFS_RAID_RAID0 || i == BTRFS_RAID_SINGLE)
+			continue;
+		list_for_each_entry(block_group, &space_info->block_groups[i],
+				    list) {
+			if (!block_group->ro)
+				return true;
+		}
+	}
+	return false;
+}
+
 int btrfs_read_block_groups(struct btrfs_fs_info *info)
 {
 	struct btrfs_path *path;
@@ -1861,6 +1882,10 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 		      BTRFS_BLOCK_GROUP_RAID56_MASK |
 		      BTRFS_BLOCK_GROUP_DUP)))
 			continue;
+
+		if (!have_mirrored_block_group(space_info))
+			continue;
+
 		/*
 		 * Avoid allocating from un-mirrored block group if there are
 		 * mirrored block groups.

If the btrfs volume has mirrored block groups, it unconditionally makes
un-mirrored block groups read only. When we have mirrored block groups, but
don't have writable block groups, this will drop all writable block groups.
So, check if we have at least one writable mirrored block group before
setting un-mirrored block groups read only.

This change is necessary to handle e.g. xfstests btrfs/124 case.

When we mount degraded RAID1 FS and write to it, and then re-mount with
full device, the write pointers of corresponding zones of written block
group differ. We mark such block group as "wp_broken" and make it read
only. In this situation, we only have read only RAID1 block groups because
of "wp_broken" and un-mirrored block groups are also marked read only,
because we have RAID1 block groups. As a result, all the block groups are
now read only, so that we cannot even start the rebalance to fix the
situation.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
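
For readers following the thread, the scenario described above (read-only RAID1 block groups plus freshly allocated SINGLE ones) can be modelled in user space. The following is only a rough sketch, not kernel code: `struct block_group`, `enum profile`, `is_mirrored()` and `mark_unmirrored_ro()` are hypothetical stand-ins invented for illustration, and the `patched` flag merely switches between "any mirrored block group exists" and "a writable mirrored block group exists" as the condition for demoting un-mirrored groups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified profiles; not the kernel's BTRFS_RAID_* values. */
enum profile { PROFILE_SINGLE, PROFILE_RAID0, PROFILE_RAID1 };

/* Hypothetical stand-in for a block group: just a profile and an RO flag. */
struct block_group {
	enum profile profile;
	bool ro;
};

/* As in the patch, everything except SINGLE and RAID0 counts as mirrored. */
static bool is_mirrored(enum profile p)
{
	return p != PROFILE_SINGLE && p != PROFILE_RAID0;
}

/*
 * Return true if a mirrored block group exists. With writable_only set,
 * only mirrored groups that are not read only are counted.
 */
static bool have_mirrored(const struct block_group *bgs, size_t n,
			  bool writable_only)
{
	assert(bgs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!is_mirrored(bgs[i].profile))
			continue;
		if (!writable_only || !bgs[i].ro)
			return true;
	}
	return false;
}

/*
 * Model of the mount-time pass: mark un-mirrored groups read only when
 * the demotion condition holds, then return how many groups stay writable.
 * patched == false: demote whenever any mirrored group exists.
 * patched == true:  demote only if a writable mirrored group exists.
 */
static size_t mark_unmirrored_ro(struct block_group *bgs, size_t n,
				 bool patched)
{
	bool demote = have_mirrored(bgs, n, patched);
	size_t writable = 0;

	for (size_t i = 0; i < n; i++) {
		if (demote && !is_mirrored(bgs[i].profile))
			bgs[i].ro = true;
		if (!bgs[i].ro)
			writable++;
	}
	return writable;
}
```

Calling `mark_unmirrored_ro()` on one read-only RAID1 group and one writable SINGLE group leaves no writable group when `patched` is false, and keeps the SINGLE group writable when `patched` is true, which is the difference the commit message describes.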