Message ID | 50E9C267.3050302@inwind.it |
---|---|
State | New, archived |
On Sun, Jan 06, 2013 at 07:28:55PM +0100, Goffredo Baroncelli wrote:
> Currently wipefs doesn't clear all the superblocks of btrfs. Only the first
> one is cleared.
>
> Btrfs has three superblocks. The first one is placed at 64KB, the second
> one at 64MB, the third one at 256GB.

It can have as many as 4 superblock backup copies:

Superblock offset 0 is 65536 (0x10000, block=16/0x10)
Superblock offset 1 is 67108864 (0x4000000, block=16384/0x4000)
Superblock offset 2 is 274877906944 (0x4000000000, block=67108864/0x4000000)
Superblock offset 3 is 1125899906842624 (0x4000000000000, block=274877906944/0x4000000000)
Superblock offset 4 is 4611686018427387904 (0x4000000000000000, block=1125899906842624/0x4000000000000)

> If the first superblock is valid except that the "magic field" is zeroed,
> btrfs skips the check of the other superblocks.
> If the first superblock is fully invalid, btrfs checks the other
> superblocks.
>
> So zeroing the first superblock "magic field" at the beginning makes it
> seem that the filesystem is wiped. But when the first superblock is overwritten
> (e.g. by another filesystem), then the other two superblocks may be considered
> valid, and the filesystem may resurrect.

And for that purpose all the superblock copies should be taken into
account, regardless of the tricks that btrfs_mount applies.

david

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hi David,

On 01/07/2013 05:33 PM, David Sterba wrote:
> On Sun, Jan 06, 2013 at 07:28:55PM +0100, Goffredo Baroncelli wrote:
>> Currently wipefs doesn't clear all the superblocks of btrfs. Only the first
>> one is cleared.
>>
>> Btrfs has three superblocks. The first one is placed at 64KB, the second
>> one at 64MB, the third one at 256GB.
>
> It can have as many as 4 superblock backup copies:
>
> Superblock offset 0 is 65536 (0x10000, block=16/0x10)
> Superblock offset 1 is 67108864 (0x4000000, block=16384/0x4000)
> Superblock offset 2 is 274877906944 (0x4000000000, block=67108864/0x4000000)
> Superblock offset 3 is 1125899906842624 (0x4000000000000, block=274877906944/0x4000000000)
> Superblock offset 4 is 4611686018427387904 (0x4000000000000000, block=1125899906842624/0x4000000000000)

Are you sure?

Regarding the btrfs-progs suite, I looked at btrfs_read_dev_super():

[..]
	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
		bytenr = btrfs_sb_offset(i);
		ret = pread64(fd, &buf, sizeof(buf), bytenr);

where BTRFS_SUPER_MIRROR_MAX is 3.

Regarding the kernel code, I looked at several functions which call
btrfs_sb_offset(); everywhere the upper limit on the superblock number
is BTRFS_SUPER_MIRROR_MAX, which is still 3.
Moreover I performed the following test:

$ ls -lh 7tb-filesystem.img
-rw-r--r-- 1 ghigo ghigo 7.1E Jan 7 18:49 7eb-filesystem.img
$ /sbin/mkfs.btrfs 7eb-filesystem.img

$ cat extract-sign.py
# Print the 8 bytes found at the magic offset (superblock + 0x40) of each
# candidate btrfs superblock copy.
BTRFS_SUPER_MIRROR_SHIFT = 12
BTRFS_SUPER_INFO_OFFSET = 64 * 1024

def btrfs_sb_offset(mirror):
    start = 16 * 1024
    if mirror:
        return start << (BTRFS_SUPER_MIRROR_SHIFT * mirror)
    return BTRFS_SUPER_INFO_OFFSET

with open("7eb-filesystem.img", "rb") as f:
    for i in range(5):
        pos = btrfs_sb_offset(i) + 64
        f.seek(pos)
        sign = f.read(8)
        print("Superblock #%d - %20d - '%s'" % (i, pos, sign.decode("latin-1")))

$ python extract-sign.py
Superblock #0 -                65600 - '_BHRfS_M'
Superblock #1 -             67108928 - '_BHRfS_M'
Superblock #2 -         274877907008 - '_BHRfS_M'
Superblock #3 -     1125899906842688 - ''
Superblock #4 -  4611686018427387968 - ''

To me it seems that in a 7TB filesystem there are only 3 superblocks.

>
>> If the first superblock is valid except that the "magic field" is zeroed,
>> btrfs skips the check of the other superblocks.
>> If the first superblock is fully invalid, btrfs checks the other
>> superblocks.
>>
>> So zeroing the first superblock "magic field" at the beginning makes it
>> seem that the filesystem is wiped. But when the first superblock is overwritten
>> (e.g. by another filesystem), then the other two superblocks may be considered
>> valid, and the filesystem may resurrect.
>
> And for that purpose all the superblock copies should be taken into
> account, regardless of the tricks that btrfs_mount applies.
>
>
> david
>
On Mon, Jan 07, 2013 at 07:20:16PM +0100, Goffredo Baroncelli wrote:
> Hi David,
>
> On 01/07/2013 05:33 PM, David Sterba wrote:
> > On Sun, Jan 06, 2013 at 07:28:55PM +0100, Goffredo Baroncelli wrote:
> >> Currently wipefs doesn't clear all the superblocks of btrfs. Only the first
> >> one is cleared.
> >>
> >> Btrfs has three superblocks. The first one is placed at 64KB, the second
> >> one at 64MB, the third one at 256GB.
> >
> > It can have as many as 4 superblock backup copies:
> >
> > Superblock offset 0 is 65536 (0x10000, block=16/0x10)
> > Superblock offset 1 is 67108864 (0x4000000, block=16384/0x4000)
> > Superblock offset 2 is 274877906944 (0x4000000000, block=67108864/0x4000000)
> > Superblock offset 3 is 1125899906842624 (0x4000000000000, block=274877906944/0x4000000000)
> > Superblock offset 4 is 4611686018427387904 (0x4000000000000000, block=1125899906842624/0x4000000000000)
>
> Are you sure?
>
> Regarding the btrfs-progs suite, I looked at btrfs_read_dev_super():
>
> [..]
> 	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
> 		bytenr = btrfs_sb_offset(i);
> 		ret = pread64(fd, &buf, sizeof(buf), bytenr);
>
> where BTRFS_SUPER_MIRROR_MAX is 3.
>
> Regarding the kernel code, I looked at several functions which call
> btrfs_sb_offset(); everywhere the upper limit on the superblock number
> is BTRFS_SUPER_MIRROR_MAX, which is still 3.
>
> Moreover I performed the following test:
>
> $ ls -lh 7tb-filesystem.img
> -rw-r--r-- 1 ghigo ghigo 7.1E Jan 7 18:49 7eb-filesystem.img
> $ /sbin/mkfs.btrfs 7eb-filesystem.img
>
> $ cat extract-sign.py
> # Print the 8 bytes found at the magic offset (superblock + 0x40) of each
> # candidate btrfs superblock copy.
> BTRFS_SUPER_MIRROR_SHIFT = 12
> BTRFS_SUPER_INFO_OFFSET = 64 * 1024
>
> def btrfs_sb_offset(mirror):
>     start = 16 * 1024
>     if mirror:
>         return start << (BTRFS_SUPER_MIRROR_SHIFT * mirror)
>     return BTRFS_SUPER_INFO_OFFSET
>
> with open("7eb-filesystem.img", "rb") as f:
>     for i in range(5):
>         pos = btrfs_sb_offset(i) + 64
>         f.seek(pos)
>         sign = f.read(8)
>         print("Superblock #%d - %20d - '%s'" % (i, pos, sign.decode("latin-1")))
>
> $ python extract-sign.py
> Superblock #0 -                65600 - '_BHRfS_M'

64 KiB

> Superblock #1 -             67108928 - '_BHRfS_M'

256 MiB

> Superblock #2 -         274877907008 - '_BHRfS_M'

1 TiB

> Superblock #3 -     1125899906842688 - ''

4 PiB

> Superblock #4 -  4611686018427387968 - ''

16 EiB

> To me it seems that in a 7TB filesystem there are only 3 superblocks.

That would be as expected. How many on a 5 PiB filesystem, though?
Or a 20 EiB one?

Hugo.
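To make Hugo's question concrete, here is a minimal Python sketch that reuses the `btrfs_sb_offset()` formula from Goffredo's `extract-sign.py` and the `BTRFS_SUPER_MIRROR_MAX = 3` limit he quoted from btrfs-progs. The rule that a copy only counts when the whole 4 KiB superblock fits on the device, and the helper name `superblock_copies()`, are assumptions of this sketch rather than code taken from btrfs.

```python
from typing import List

# Constants as used in extract-sign.py; MIRROR_MAX is the value quoted from
# btrfs-progs. SUPER_INFO_SIZE (4 KiB per copy) is assumed for the fit check.
BTRFS_SUPER_INFO_OFFSET = 64 * 1024
BTRFS_SUPER_INFO_SIZE = 4 * 1024
BTRFS_SUPER_MIRROR_MAX = 3
BTRFS_SUPER_MIRROR_SHIFT = 12

TiB = 1 << 40
PiB = 1 << 50


def btrfs_sb_offset(mirror: int) -> int:
    """Byte offset of superblock copy `mirror` (0 is the primary)."""
    if mirror:
        return (16 * 1024) << (BTRFS_SUPER_MIRROR_SHIFT * mirror)
    return BTRFS_SUPER_INFO_OFFSET


def superblock_copies(device_size: int) -> List[int]:
    """Hypothetical helper: offsets of the copies that fit on the device."""
    return [
        btrfs_sb_offset(i)
        for i in range(BTRFS_SUPER_MIRROR_MAX)
        if btrfs_sb_offset(i) + BTRFS_SUPER_INFO_SIZE <= device_size
    ]


for label, size in [("100 GiB", 100 << 30), ("7 TiB", 7 * TiB), ("5 PiB", 5 * PiB)]:
    print(label, superblock_copies(size))
```

Under these assumptions a 100 GiB device carries two copies, while 7 TiB and 5 PiB both stop at three: past 256 GiB the limit comes from `BTRFS_SUPER_MIRROR_MAX`, not from the device size.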
Hi Hugo,

On 01/07/2013 07:24 PM, Hugo Mills wrote:
[...]
>>         print("Superblock #%d - %20d - '%s'" % (i, pos, sign.decode("latin-1")))
>>
>> $ python extract-sign.py
>> Superblock #0 -                65600 - '_BHRfS_M'
>
> 64 KiB

OK (above).

>
>> Superblock #1 -             67108928 - '_BHRfS_M'
>
> 256 MiB

.... above is 64M, below is 256M!

>
>> Superblock #2 -         274877907008 - '_BHRfS_M'
>
> 1 TiB
>
>> Superblock #3 -     1125899906842688 - ''
>
> 4 PiB
>
>> Superblock #4 -  4611686018427387968 - ''
>
> 16 EiB

Not reachable in Linux. There is a VFS limit of 8EB.

>
>
>> To me it seems that in a 7TB filesystem there are only 3 superblocks.
>
> That would be as expected. How many on a 5 PiB filesystem, though?
> Or a 20 EiB one?

I wrote 7TB, but I meant 8EB. I first ran the test with 7TB and wrote
the email, but then I retested with 8EB and corrected the email,
forgetting some "7t" instead of "8e"...

Anyway, if BTRFS allowed more than three superblocks, in the 7TB case
the superblock at 1TB would have appeared.

>
> Hugo.
>
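The unit annotations above can be checked mechanically. This small sketch (the function name `human_binary` is illustrative) subtracts the 64-byte (0x40) magic offset from each printed position and expresses the result in the largest binary unit that divides it exactly. It gives 64 KiB, 64 MiB, 256 GiB, 1 PiB and 4 EiB, so from copy #1 onward the annotations are a factor of four too large; 8 EiB is included for comparison with the VFS limit Goffredo mentions.

```python
def human_binary(n: int) -> str:
    """Format a byte count using the largest binary unit that divides it exactly."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while i < len(units) - 1 and n >= 1024 and n % 1024 == 0:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


# Positions printed by extract-sign.py; each is superblock offset + 0x40.
positions = [65600, 67108928, 274877907008, 1125899906842688, 4611686018427387968]
for pos in positions:
    print(pos, "->", human_binary(pos - 0x40))

print("VFS limit ->", human_binary(2**63))
```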
On 06.01.2013 19:28, Goffredo Baroncelli wrote:
>
> You can pull the change from:
>     http://cassiopea.homelinux.net/git/util-linux.git
> from the branch
>     btrfs-wipefs
>
After applying this patch, findfs and blkid didn't find filesystems
smaller than 256GB, probably because there is no third superblock.

Günter
On Sun, Jan 06, 2013 at 07:28:55PM +0100, Goffredo Baroncelli wrote:
> +const struct blkid_idinfo btrfs_idinfo1 =
> +{
> +	.name = "btrfs [bak #1]",
> +	.usage = BLKID_USAGE_FILESYSTEM,
> +	.probefunc = probe_btrfs,
> +	.minsz = 64 * 1024 * 1024 + 4 * 1024,
> +	.magics =
> +	{
> +		{ .magic = "_BHRfS_M",
> +		  .len = 8,
> +		  .kboff = 64 * 1024,
> +		  .sboff = 0x40 },
> +		{ NULL }
> +	}
> +};
> +
> +const struct blkid_idinfo btrfs_idinfo2 =
> +{
> +	.name = "btrfs [bak #2]",
> +	.usage = BLKID_USAGE_FILESYSTEM,
> +	.probefunc = probe_btrfs,
> +	.minsz = 256ULL * 1024 * 1024 * 1024 + 4 * 1024,
> +	.magics =
> +	{
> +		{ .magic = "_BHRfS_M",
> +		  .len = 8,
> +		  .kboff = 256 * 1024 * 1024,
> +		  .sboff = 0x40 },
> +		{ NULL }
> +	}
> +};

You can specify more than one magic string for the same filesystem;
.magics = { } is an array:

	.magics =
	{
		/* backup #1 */
		{ .magic = "_BHRfS_M",
		  .len = 8,
		  .kboff = 64 * 1024,
		  .sboff = 0x40 },
		/* backup #2 */
		{ .magic = "_BHRfS_M",
		  .len = 8,
		  .kboff = 256 * 1024 * 1024,
		  .sboff = 0x40 },
		...
	}

see for example libblkid/src/superblocks/reiserfs.c

    Karel
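Karel's suggestion can be modelled in a few lines of Python. The sketch below is not libblkid's C API: `Magic`, `probe()` and the fake sparse device are hypothetical stand-ins. It relies on libblkid's convention that `.kboff` is the superblock offset in KiB and `.sboff` the byte offset of the magic inside it, assumes a primary entry at 64 KiB next to the two backups from the patch, and returns the first entry that matches, which is why a btrfs whose primary magic was wiped would still be recognised through backup #1.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Magic:
    """Model of one .magics entry (illustrative, not the real C struct)."""
    magic: bytes
    kboff: int  # superblock offset in KiB, like libblkid's .kboff
    sboff: int  # magic offset inside the superblock in bytes, like .sboff

    def byte_offset(self) -> int:
        return self.kboff * 1024 + self.sboff


# One hypothetical btrfs entry listing the primary copy and both backups.
BTRFS_MAGICS: List[Magic] = [
    Magic(b"_BHRfS_M", 64, 0x40),                 # primary, 64 KiB
    Magic(b"_BHRfS_M", 64 * 1024, 0x40),          # backup #1, 64 MiB
    Magic(b"_BHRfS_M", 256 * 1024 * 1024, 0x40),  # backup #2, 256 GiB
]


def probe(read_at: Callable[[int, int], bytes], magics: List[Magic]) -> Optional[int]:
    """Return the byte offset of the first entry whose magic matches, else None."""
    for m in magics:
        off = m.byte_offset()
        if read_at(off, len(m.magic)) == m.magic:
            return off
    return None


# Sparse fake device: only backup #1 still carries the magic string.
fake = {BTRFS_MAGICS[1].byte_offset(): b"_BHRfS_M"}


def read_at(off: int, n: int) -> bytes:
    return fake.get(off, bytes(n))


print(probe(read_at, BTRFS_MAGICS))
```

Here `probe()` reports offset 67108928 (64 MiB + 0x40).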
On Mon, Jan 07, 2013 at 07:20:16PM +0100, Goffredo Baroncelli wrote:
> On 01/07/2013 05:33 PM, David Sterba wrote:
> > It can have as many as 4 superblock backup copies:
> >
> > Superblock offset 0 is 65536 (0x10000, block=16/0x10)
> > Superblock offset 1 is 67108864 (0x4000000, block=16384/0x4000)
> > Superblock offset 2 is 274877906944 (0x4000000000, block=67108864/0x4000000)
> > Superblock offset 3 is 1125899906842624 (0x4000000000000, block=274877906944/0x4000000000)
> > Superblock offset 4 is 4611686018427387904 (0x4000000000000000, block=1125899906842624/0x4000000000000)
>
> Are you sure?
>
> Regarding the btrfs-progs suite, I looked at btrfs_read_dev_super():
>
> [..]
> 	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
> 		bytenr = btrfs_sb_offset(i);
> 		ret = pread64(fd, &buf, sizeof(buf), bytenr);
>
> where BTRFS_SUPER_MIRROR_MAX is 3.

My bad, sorry, I was using the values from an old script that computed
the values with a wrong upper limit.

david
On Sun, Jan 06, 2013 at 07:28:55PM +0100, Goffredo Baroncelli wrote:
> If the first superblock is valid except that the "magic field" is zeroed,
> btrfs skips the check of the other superblocks.
> If the first superblock is fully invalid, btrfs checks the other
> superblocks.

Hmm... why isn't an inconsistent (or missing) superblock reported as a
problem? If I understand correctly, the filesystem is still mountable,
right?

> So zeroing the first superblock "magic field" at the beginning makes it
> seem that the filesystem is wiped.

Well, this is exactly the idea behind wipefs(8): just wipe the minimal
number of bytes from the device to make the filesystem invisible to
libblkid (udev, ...). This concept is relatively safe: if you make a
mistake, then you can restore the magic string; your data should not
be affected by wipefs(8).

    Karel
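The "wipe the minimal number of bytes, restore if it was a mistake" idea Karel describes can be sketched on an in-memory buffer. This is only an illustration: the `bytearray` stands in for a device, and `wipe_magic()`, `restore_magic()` and `has_btrfs_magic()` are hypothetical helpers, not wipefs internals. Only the 8 magic bytes of the primary superblock (64 KiB + 0x40) are touched, so the rest of the buffer, and therefore the data, is unchanged.

```python
PRIMARY_MAGIC_OFFSET = 64 * 1024 + 0x40  # primary btrfs superblock + magic offset
BTRFS_MAGIC = b"_BHRfS_M"


def wipe_magic(dev: bytearray, offset: int, length: int) -> bytes:
    """Zero `length` bytes at `offset` and return the old bytes for a restore."""
    saved = bytes(dev[offset:offset + length])
    dev[offset:offset + length] = bytes(length)
    return saved


def restore_magic(dev: bytearray, offset: int, saved: bytes) -> None:
    """Write previously saved bytes back at `offset`."""
    dev[offset:offset + len(saved)] = saved


def has_btrfs_magic(dev: bytearray) -> bool:
    """True if the primary superblock carries the btrfs magic string."""
    end = PRIMARY_MAGIC_OFFSET + len(BTRFS_MAGIC)
    return dev[PRIMARY_MAGIC_OFFSET:end] == BTRFS_MAGIC


dev = bytearray(128 * 1024)  # stand-in for a small device
dev[PRIMARY_MAGIC_OFFSET:PRIMARY_MAGIC_OFFSET + len(BTRFS_MAGIC)] = BTRFS_MAGIC

saved = wipe_magic(dev, PRIMARY_MAGIC_OFFSET, len(BTRFS_MAGIC))
print(has_btrfs_magic(dev))  # False: the filesystem is now invisible
restore_magic(dev, PRIMARY_MAGIC_OFFSET, saved)
print(has_btrfs_magic(dev))  # True: undoing the wipe only needs 8 bytes
```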
Hi Karel,

On 01/08/2013 07:01 PM, Karel Zak wrote:
> On Sun, Jan 06, 2013 at 07:28:55PM +0100, Goffredo Baroncelli wrote:
>> If the first superblock is valid except that the "magic field" is zeroed,
>> btrfs skips the check of the other superblocks.
>> If the first superblock is fully invalid, btrfs checks the other
>> superblocks.
>
> Hmm... why isn't an inconsistent (or missing) superblock reported as a
> problem? If I understand correctly, the filesystem is still mountable,
> right?

It should be; however, my tests were unsuccessful :-(... Chris?

>
>> So zeroing the first superblock "magic field" at the beginning makes it
>> seem that the filesystem is wiped.
>
> Well, this is exactly the idea behind wipefs(8): just wipe the minimal
> number of bytes from the device to make the filesystem invisible to
> libblkid (udev, ...). This concept is relatively safe: if you make a
> mistake, then you can restore the magic string; your data should not
> be affected by wipefs(8).

I fully agree. However, wipefs should zero all three superblocks.

>
> Karel
>
On Jan 8, 2013, at 1:09 PM, Goffredo Baroncelli <kreijack@tiscalinet.it> wrote:

> Hi Karel,
>
> On 01/08/2013 07:01 PM, Karel Zak wrote:
>> On Sun, Jan 06, 2013 at 07:28:55PM +0100, Goffredo Baroncelli wrote:
>>> If the first superblock is valid except that the "magic field" is zeroed,
>>> btrfs skips the check of the other superblocks.
>>> If the first superblock is fully invalid, btrfs checks the other
>>> superblocks.
>>
>> Hmm... why isn't an inconsistent (or missing) superblock reported as a
>> problem? If I understand correctly, the filesystem is still mountable,
>> right?
>
> It should be; however, my tests were unsuccessful :-(... Chris?

I haven't tried mounting after wipefs alone, only after wipefs and a
subsequent mkfs.ext4; in that case the file system mounts without error
as ext4. If I use mount -t btrfs I get an error. I haven't tried
mounting with -t btrfs -o recovery.

Chris Murphy
Hi Günter,

On 01/08/2013 04:48 PM, Günter Gersdorf wrote:
> On 06.01.2013 19:28, Goffredo Baroncelli wrote:
>>
>> You can pull the change from:
>>     http://cassiopea.homelinux.net/git/util-linux.git
>> from the branch
>>     btrfs-wipefs
>>
> After applying this patch, findfs and blkid didn't find filesystems
> smaller than 256GB, probably because there is no third superblock.

On my system the stock "findfs" and "blkid" have the same problem,
i.e. even without my patch these utilities are unable to detect a btrfs
filesystem. Could you confirm that?

>
> Günter
Hi Karel,

>
> You can specify more than one magic string for the same filesystem;
> .magics = { } is an array.

Thanks for your suggestion. However, it doesn't seem applicable to me. I
tried to change the code, and the result seems inconsistent to me.

With this change:

1) if I do "wipefs <device>", I get the offset of the first superblock
   (good enough)
2) if I do "wipefs -a <device>", I clean up *all three* superblocks
   (very good)
3) if I do "wipefs -o <offset> <device>", I clean up only the superblock
   located at <offset> (very bad)

A user who doesn't know btrfs well enough could try 1) and 3) and think
that the disk is cleaned up, while the 2nd and the 3rd superblocks still
exist.

> see for example libblkid/src/superblocks/reiserfs.c

I think that this is a different case: the reiser superblocks are
*alternatives*; in the btrfs case, instead, *all three superblocks*
exist at the same time.

> Karel

Ciao
Goffredo
On Wed, Jan 09, 2013 at 06:48:28PM +0100, Goffredo Baroncelli wrote:
> Hi Karel,
>
> >
> > You can specify more than one magic string for the same filesystem;
> > .magics = { } is an array.
>
> Thanks for your suggestion. However, it doesn't seem applicable to me. I
> tried to change the code, and the result seems inconsistent to me.
>
> With this change:
>
> 1) if I do "wipefs <device>", I get the offset of the first superblock
>    (good enough)
> 2) if I do "wipefs -a <device>", I clean up *all three* superblocks
>    (very good)
> 3) if I do "wipefs -o <offset> <device>", I clean up only the superblock
>    located at <offset> (very bad)

This is the expected behavior described in the wipefs man page:

   Note that some filesystems or some partition tables store more magic
   strings on the devices. The wipefs lists the first offset where a magic
   string has been detected. The device is not scanned for additional
   magic strings for the same filesystem. It's possible that after wipefs
   -o <offset> will be the same filesystem or partition table visible by
   another magic string on another offset.

> A user who doesn't know btrfs well enough could try 1) and 3) and think
> that the disk is cleaned up, while the 2nd and the 3rd superblocks still
> exist.

Well, users (and installers) usually use wipefs -a or wipefs -t <fsname>.

> > see for example libblkid/src/superblocks/reiserfs.c
>
> I think that this is a different case: the reiser superblocks are

It was an example of how to specify the magic strings in the code.

> *alternatives*; in the btrfs case, instead, *all three superblocks*
> exist at the same time.

It's pretty common to have a backup superblock (e.g. GPT) or more than
one way to detect the filesystem (e.g. FAT).

Please send me the patch with the magic strings :-)

I really don't want to add dummy filesystems to the library (like you
did in the first version of the patch); it's a very bad idea with many
side effects.

    Karel
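The difference between cases 2) and 3) can be shown with a toy model. In this sketch the device is a hypothetical sparse `dict` mapping byte offsets to the 8 bytes stored there; `wipe_offset()` plays the role of `wipefs -o <offset>`, and `wipe_all()` models `wipefs -a` under the assumption that it erases every magic string listed for the filesystem. Neither function is wipefs code. After wiping only the primary offset, both backup magics remain visible, which is exactly the situation the man page excerpt warns about.

```python
from typing import Dict, Iterable, List

BTRFS_MAGIC = b"_BHRfS_M"
# Byte offsets of the magic in the three btrfs superblocks (superblock + 0x40).
MAGIC_OFFSETS = [64 * 1024 + 0x40, (64 << 20) + 0x40, (256 << 30) + 0x40]


def visible_at(dev: Dict[int, bytes], offsets: Iterable[int]) -> List[int]:
    """Offsets where the btrfs magic is still present."""
    return [off for off in offsets if dev.get(off) == BTRFS_MAGIC]


def wipe_offset(dev: Dict[int, bytes], off: int) -> None:
    """Model of `wipefs -o <offset>`: erase the magic at one offset only."""
    if off in dev:
        dev[off] = bytes(len(dev[off]))


def wipe_all(dev: Dict[int, bytes], offsets: Iterable[int]) -> None:
    """Model of `wipefs -a`, assuming every listed magic is erased."""
    for off in offsets:
        wipe_offset(dev, off)


dev = {off: BTRFS_MAGIC for off in MAGIC_OFFSETS}
wipe_offset(dev, MAGIC_OFFSETS[0])
print(visible_at(dev, MAGIC_OFFSETS))  # both backups still carry the magic
wipe_all(dev, MAGIC_OFFSETS)
print(visible_at(dev, MAGIC_OFFSETS))  # nothing left to detect
```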
diff --git a/libblkid/src/superblocks/btrfs.c b/libblkid/src/superblocks/btrfs.c
index 039be42..d1331e6 100644
--- a/libblkid/src/superblocks/btrfs.c
+++ b/libblkid/src/superblocks/btrfs.c
@@ -91,3 +91,35 @@ const struct blkid_idinfo btrfs_idinfo =
 	}
 };
 
+const struct blkid_idinfo btrfs_idinfo1 =
+{
+	.name = "btrfs [bak #1]",
+	.usage = BLKID_USAGE_FILESYSTEM,
+	.probefunc = probe_btrfs,
+	.minsz = 64 * 1024 * 1024 + 4 * 1024,
+	.magics =
+	{
+		{ .magic = "_BHRfS_M",
+		  .len = 8,
+		  .kboff = 64 * 1024,
+		  .sboff = 0x40 },
+		{ NULL }
+	}
+};
+
+const struct blkid_idinfo btrfs_idinfo2 =
+{
+	.name = "btrfs [bak #2]",
+	.usage = BLKID_USAGE_FILESYSTEM,
+	.probefunc = probe_btrfs,
+	.minsz = 256ULL * 1024 * 1024 * 1024 + 4 * 1024,
+	.magics =
+	{
+		{ .magic = "_BHRfS_M",
+		  .len = 8,
+		  .kboff = 256 * 1024 * 1024,
+		  .sboff = 0x40 },
+		{ NULL }
+	}
+};
+
diff --git a/libblkid/src/superblocks/superblocks.c b/libblkid/src/superblocks/superblocks.c
index 2929a5f..1e88b63 100644
--- a/libblkid/src/superblocks/superblocks.c
+++ b/libblkid/src/superblocks/superblocks.c
@@ -135,6 +135,8 @@ static const struct blkid_idinfo *idinfos[] =
 	&squashfs_idinfo,
 	&netware_idinfo,
 	&btrfs_idinfo,
+	&btrfs_idinfo1,
+	&btrfs_idinfo2,
 	&ubifs_idinfo,
 	&bfs_idinfo,
 	&vmfs_fs_idinfo,
diff --git a/libblkid/src/superblocks/superblocks.h b/libblkid/src/superblocks/superblocks.h
index 08f1438..974ff8e 100644
--- a/libblkid/src/superblocks/superblocks.h
+++ b/libblkid/src/superblocks/superblocks.h
@@ -59,6 +59,8 @@ extern const struct blkid_idinfo netware_idinfo;
 extern const struct blkid_idinfo sysv_idinfo;
 extern const struct blkid_idinfo xenix_idinfo;
 extern const struct blkid_idinfo btrfs_idinfo;
+extern const struct blkid_idinfo btrfs_idinfo1;
+extern const struct blkid_idinfo btrfs_idinfo2;
 extern const struct blkid_idinfo ubifs_idinfo;
 extern const struct blkid_idinfo zfs_idinfo;
 extern const struct blkid_idinfo bfs_idinfo;
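The `.kboff` and `.minsz` values in this patch can be cross-checked against the `btrfs_sb_offset()` formula used earlier in the thread. The sketch assumes `.kboff` is a kilobyte offset, as in libblkid, and that `.minsz` is intended to be the copy's offset plus one 4 KiB superblock, which matches the `+ 4 * 1024` term. It also shows that the 256 GiB minimum exceeds `INT32_MAX`. In C, that expression therefore has to be evaluated in 64-bit arithmetic, for example by giving the first operand a `ULL` suffix. With plain `int` operands the multiplication overflows.

```python
INT32_MAX = 2**31 - 1
SB_SIZE = 4 * 1024  # the "+ 4 * 1024" term in the patch's .minsz


def btrfs_sb_offset(mirror: int) -> int:
    """Same formula as extract-sign.py: 64 KiB, then 16 KiB << (12 * mirror)."""
    return (16 * 1024) << (12 * mirror) if mirror else 64 * 1024


# (.kboff, .minsz) as written for btrfs_idinfo1 and btrfs_idinfo2.
patch_values = {
    1: (64 * 1024, 64 * 1024 * 1024 + 4 * 1024),
    2: (256 * 1024 * 1024, 256 * 1024 * 1024 * 1024 + 4 * 1024),
}

for mirror, (kboff, minsz) in patch_values.items():
    assert kboff * 1024 == btrfs_sb_offset(mirror)     # .kboff counts KiB
    assert minsz == btrfs_sb_offset(mirror) + SB_SIZE  # whole copy must fit
    width = "fits in int32" if minsz <= INT32_MAX else "needs 64-bit arithmetic"
    print(mirror, minsz, width)
```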