Message ID | 20191212110132.11063-1-Damenly_Su@gmx.com (mailing list archive) |
---|---|
Headers | show |
Series | btrfs: metadata uuid fixes and enhancements | expand |
On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote: > From: Su Yue <Damenly_Su@gmx.com> > > This patchset fixes one reproducible bug and add two split-brain > cases ignored. > > The origin code thinks the final state of successful synced device is > always has INCOMPAT_METADATA_UUID feature. However, a device without > the feature flag can be the one pull into disk. This is what handled > in the patchset. Test images are added in btrfs-progs part. > > Patch[1] fixes a bug about wrong fsid copy. > Patch[2] is for the later patches. > Patch[3-5] add the forgotten cases. > Patch[6] just does simple code movement for grace. > > The set passes xfstests-dev without regressions. > > Su Yue (6): > btrfs: metadata_uuid: fix failed assertion due to unsuccessful device > scan > btrfs: metadata_uuid: move split-brain handling from fs_id() to new > function > btrfs: split-brain case for scanned changing device with > INCOMPAT_METADATA_UUID > btrfs: split-brain case for scanned changed device without > INCOMPAT_METADATA_UUID > btrfs: copy fsid and metadata_uuid for pulled disk without > INCOMPAT_METADATA_UUID > btrfs: metadata_uuid: move partly logic into find_fsid_inprogress() > > fs/btrfs/volumes.c | 193 +++++++++++++++++++++++++++++---------------- > 1 file changed, 125 insertions(+), 68 deletions(-) > I'm currently on holiday but the fsid change feature has a design document here: https://github.com/btrfs/btrfs-dev-docs/blob/master/fsid-change.txt it lists all the cases I have handled. If you think there are other please first describe them in prose following the parlance set out in the document to ease reasoning.
On 2019/12/13 4:03 PM, Nikolay Borisov wrote: > > > On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote: >> From: Su Yue <Damenly_Su@gmx.com> >> >> This patchset fixes one reproducible bug and add two split-brain >> cases ignored. >> >> The origin code thinks the final state of successful synced device is >> always has INCOMPAT_METADATA_UUID feature. However, a device without >> the feature flag can be the one pull into disk. This is what handled >> in the patchset. Test images are added in btrfs-progs part. >> >> Patch[1] fixes a bug about wrong fsid copy. >> Patch[2] is for the later patches. >> Patch[3-5] add the forgotten cases. >> Patch[6] just does simple code movement for grace. >> >> The set passes xfstests-dev without regressions. >> >> Su Yue (6): >> btrfs: metadata_uuid: fix failed assertion due to unsuccessful device >> scan >> btrfs: metadata_uuid: move split-brain handling from fs_id() to new >> function >> btrfs: split-brain case for scanned changing device with >> INCOMPAT_METADATA_UUID >> btrfs: split-brain case for scanned changed device without >> INCOMPAT_METADATA_UUID >> btrfs: copy fsid and metadata_uuid for pulled disk without >> INCOMPAT_METADATA_UUID >> btrfs: metadata_uuid: move partly logic into find_fsid_inprogress() >> >> fs/btrfs/volumes.c | 193 +++++++++++++++++++++++++++++---------------- >> 1 file changed, 125 insertions(+), 68 deletions(-) >> > > > I'm currently on holiday but the fsid change feature has a design > document here: > https://github.com/btrfs/btrfs-dev-docs/blob/master/fsid-change.txt > > it lists all the cases I have handled. If you think there are other > please first describe them in prose following the parlance set out in > the document to ease reasoning. > Thanks. I am going to do it. The following is just a rough version for people to understand. The pull request version will be more official like expressions in btrfs-dev-docs. > 4. Failure during transaction x + 1. When such a failure happens the > filesystem in question will be partitioned in two sets P and Q. Where P > would have the C flag and a new value for fsid written to it as well as the > old FSID value written in the ‘metadata_uuid’ field. In contrast Q would > have just the old fsid and the IP flag, also the superblock’s generation > number of disks in P will be higher than those in Q. Again two cases needs > to be handled: I won't argue the Q has the old fsid and IP flag. There is another state of P. 0) There are two devices with fsid A, without metadata_uuid. E.g d1[A, 0, 0] d2[A, 0, 0] (The first element is the FSID, the 2nd is METADATA_UUID, the 3rd is the incompat flag bits) 1) After running "btrfstune -M B": d1[B, A, METADATA_UUID] d2[B, A, METADATA_UUID] 2) During "btrfstune -M a", 2.1) After first btrfs_commit_transaction() of set_metadata_uuid() finished: d1[B, A, METADATA_UUID | FSID_CHANGING_V2] d2[B, A, METADATA_UUID | FSID_CHANGING_V2] 2.2) Then run into the branch btrfstune.c: line 141. if (new_uuid && uuid_changed && memcmp(disk_super->metadata_uuid, new_fsid, BTRFS_FSID_SIZE) == 0) { /* * Changing fsid to be the same as metadata uuid, so just * disable the flag */ memcpy(disk_super->fsid, &new_fsid, BTRFS_FSID_SIZE); incompat_flags &= ~BTRFS_FEATURE_INCOMPAT_METADATA_UUID; btrfs_set_super_incompat_flags(disk_super, incompat_flags); memset(disk_super->metadata_uuid, 0, BTRFS_FSID_SIZE); Now @disk_super is [A, 0, FSID_CHANGING_V2], without METADATA_UUID flag. 2.3) Then we go to the final btrfs_commit_transaction() of set_metadata_uuid(). But powerloss happened, and only d1 get synced: d1[A, 0, 0] ---> P d2[B, A, METADATA_UUID | FSID_CHANGING_V2] ---> Q So there are 2 cases you forgot to consider. - P is scanned first, then Q. - Q is scanned first, then P.
From: Su Yue <Damenly_Su@gmx.com> This patchset fixes one reproducible bug and add two split-brain cases ignored. The origin code thinks the final state of successful synced device is always has INCOMPAT_METADATA_UUID feature. However, a device without the feature flag can be the one pull into disk. This is what handled in the patchset. Test images are added in btrfs-progs part. Patch[1] fixes a bug about wrong fsid copy. Patch[2] is for the later patches. Patch[3-5] add the forgotten cases. Patch[6] just does simple code movement for grace. The set passes xfstests-dev without regressions. Su Yue (6): btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan btrfs: metadata_uuid: move split-brain handling from fs_id() to new function btrfs: split-brain case for scanned changing device with INCOMPAT_METADATA_UUID btrfs: split-brain case for scanned changed device without INCOMPAT_METADATA_UUID btrfs: copy fsid and metadata_uuid for pulled disk without INCOMPAT_METADATA_UUID btrfs: metadata_uuid: move partly logic into find_fsid_inprogress() fs/btrfs/volumes.c | 193 +++++++++++++++++++++++++++++---------------- 1 file changed, 125 insertions(+), 68 deletions(-)