Message ID | 20170615220002.GB4530@birch.djwong.org (mailing list archive) |
---|---|
State | Not Applicable, archived |

[cc xfs]

On Tue, Jul 25, 2017 at 06:20:54PM +0200, Luis R. Rodriguez wrote:
> On Thu, Jun 15, 2017 at 03:00:02PM -0700, Darrick J. Wong wrote:
> > Document the metadump file format.
>
> Thanks for all this! I have started wondering all this and was
> curious if there are perhaps more docs about the format or more
> practical docs which can help one go read the dumps and help
> analyze through examples.
>
> > --- /dev/null
> > +++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
> > +== Dump Obfuscation
> > +
> > +Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
> > +space and naming information to avoid leaking sensitive information into
> > +the metadump file. +xfs_metadump+ does not copy user data blocks.
> > +
> > +The obfuscation policy is as follows:
> > +
> > +* File and extended attribute names are both considered "names".
> > +* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
> > +* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
>
> Any reason for this?

/me doesn't know. Maybe it's too hard to generate a new name with the
same hash?

> > +* Names shorter than 5 characters are not obscured at all.
>
> This does not seem like a good idea, do we have a record of why this was done
> historically?
>
> > +* Names that cross a block boundary are not obscured at all.
>
> Likewise.

iirc we basically copy things a block at a time, which makes it harder
to deal with multi-fsblock dirblocks (???)

I don't really know, let's see if the list remembers. :)

--D

>
> Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Tue, Jul 25, 2017 at 9:36 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
>> > +++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
>> > +== Dump Obfuscation
>> > +
>> > +Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
>> > +space and naming information to avoid leaking sensitive information into
>> > +the metadump file. +xfs_metadump+ does not copy user data blocks.
>> > +
>> > +The obfuscation policy is as follows:
>> > +
>> > +* File and extended attribute names are both considered "names".
>> > +* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
>> > +* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
>>
>> Any reason for this?
>
> /me doesn't know. Maybe it's too hard to generate a new name with the
> same hash?

I just thought of a good reason: shorter names means less information to be
able to create a unique hash and avoid clashes.

>> > +* Names shorter than 5 characters are not obscured at all.
>>
>> This does not seem like a good idea, do we have a record of why this was done
>> historically?

And likewise for smaller number of characters there could be clashes. In fact
the latest md5 clash was for 64 bytes. But since we *really* don't want to
create a reverse map at all why not just use random characters? Sure we'd eat
up entropy fast on a system but it would seem to be a fair requirement to ask
for good entropy for this.

>> > +* Names that cross a block boundary are not obscured at all.
>>
>> Likewise.
>
> iirc we basically copy things a block at a time, which makes it harder
> to deal with multi-fsblock dirblocks (???)
>
> I don't really know, let's see if the list remembers. :)

  Luis

On Tue, Jul 25, 2017 at 09:36:12AM -0700, Darrick J. Wong wrote:
> [cc xfs]
>
> On Tue, Jul 25, 2017 at 06:20:54PM +0200, Luis R. Rodriguez wrote:
> > On Thu, Jun 15, 2017 at 03:00:02PM -0700, Darrick J. Wong wrote:
> > > Document the metadump file format.
> >
> > Thanks for all this! I have started wondering all this and was
> > curious if there are perhaps more docs about the format or more
> > practical docs which can help one go read the dumps and help
> > analyze through examples.
> >
> > > --- /dev/null
> > > +++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
> > > +== Dump Obfuscation
> > > +
> > > +Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
> > > +space and naming information to avoid leaking sensitive information into
> > > +the metadump file. +xfs_metadump+ does not copy user data blocks.
> > > +
> > > +The obfuscation policy is as follows:
> > > +
> > > +* File and extended attribute names are both considered "names".
> > > +* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
> > > +* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
> >
> > Any reason for this?
>
> /me doesn't know. Maybe it's too hard to generate a new name with the
> same hash?
>
> > > +* Names shorter than 5 characters are not obscured at all.
> >
> > This does not seem like a good idea, do we have a record of why this was done
> > historically?

It was done because of mathematics. This is all "IIRC" off the top
of my head....

The hash we calculate is 4 bytes long, so we can't calculate a hash
collision from less than 5 bytes of input. i.e. 4 bytes + 1 byte to
cause collision is the shortest we can obfuscate, but one byte of
"name overwrite" isn't enough to find collisions if all 4 of the
first 4 bytes are randomly chosen. Hence for names 5-8 bytes in
length we are limited to 1 byte of correction for each of the first
4 bytes that is randomised.

Hence it's not until filenames are longer than 8 bytes that we can
generate a truly random filename that causes a hash collision.

> > > +* Names that cross a block boundary are not obscured at all.
> >
> > Likewise.
>
> iirc we basically copy things a block at a time, which makes it harder
> to deal with multi-fsblock dirblocks (???)

Discontiguous multi-block directories should be handled
transparently for xfs_db via libxfs buffers now. Maybe metadump
doesn't use these for dumping directory blocks?

Also, We shouldn't be splitting names and values across da block
boundaries - we leave free space in the block and allocate a new one
if it doesn't fit....

Cheers,

Dave.
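
The arithmetic above can be made concrete with a small sketch. It assumes the name hash folds each byte in with a 7-bit rotate and an XOR, which is how the XFS directory/attr name hash is generally described; it is not the xfs_metadump code, and the function names `name_hash` and `fix_tail` are made up for illustration. Five trailing bytes shifted by 7 bits each span 35 bits, enough to steer all 32 bits of the hash, so everything before those five bytes can be chosen freely. The sketch treats names as raw bytes and does not avoid bytes that are illegal in file names (NUL and '/'), which a real tool would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left by s bits (s must be in 1..31). */
static uint32_t rol32(uint32_t v, unsigned int s)
{
	return (v << s) | (v >> (32 - s));
}

/* Assumed hash model: rotate by 7 bits, then XOR in the next byte. */
uint32_t name_hash(const uint8_t *name, size_t len)
{
	uint32_t h = 0;

	for (size_t i = 0; i < len; i++)
		h = rol32(h, 7) ^ name[i];
	return h;
}

/*
 * Overwrite the last five bytes of new_name so that the whole name
 * hashes to target.  The first len - 5 bytes are left exactly as the
 * caller chose them (for example, random characters).
 */
void fix_tail(uint8_t *new_name, size_t len, uint32_t target)
{
	assert(len >= 5);

	/*
	 * Five more rounds rotate the prefix hash by 35 bits, i.e. by 3
	 * bits modulo 32.  t is what the five tail bytes must contribute.
	 */
	uint32_t t = rol32(name_hash(new_name, len - 5), 3) ^ target;
	uint8_t *tail = new_name + (len - 5);

	tail[0] = (t >> 28) & 0x0f;	/* lands on hash bits 28..31 */
	tail[1] = (t >> 21) & 0x7f;	/* bits 21..27 */
	tail[2] = (t >> 14) & 0x7f;	/* bits 14..20 */
	tail[3] = (t >> 7) & 0x7f;	/* bits 7..13 */
	tail[4] = t & 0x7f;		/* bits 0..6 */
}
```

With a 4-byte name there is no room for the fifth byte, so under this model the original hash can only be matched by coincidence, which is consistent with names shorter than 5 characters being left alone.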

diff --git a/design/XFS_Filesystem_Structure/docinfo.xml b/design/XFS_Filesystem_Structure/docinfo.xml
index 5cdcf6c..e13d705 100644
--- a/design/XFS_Filesystem_Structure/docinfo.xml
+++ b/design/XFS_Filesystem_Structure/docinfo.xml
@@ -155,4 +155,18 @@
 				</simplelist>
 			</revdescription>
 		</revision>
+		<revision>
+			<revnumber>3.14159</revnumber>
+			<date>June 2017</date>
+			<author>
+				<firstname>Darrick</firstname>
+				<surname>Wong</surname>
+				<email>darrick.wong@oracle.com</email>
+			</author>
+			<revdescription>
+				<simplelist>
+					<member>Add the metadump file format.</member>
+				</simplelist>
+			</revdescription>
+		</revision>
 	</revhistory>
diff --git a/design/XFS_Filesystem_Structure/magic.asciidoc b/design/XFS_Filesystem_Structure/magic.asciidoc
index 77bed6d..7e62783 100644
--- a/design/XFS_Filesystem_Structure/magic.asciidoc
+++ b/design/XFS_Filesystem_Structure/magic.asciidoc
@@ -47,6 +47,7 @@ relevant chapters. Magic numbers tend to have consistent locations:
 | +XFS_RMAP_CRC_MAGIC+ | 0x524d4233 | RMB3 | xref:Reverse_Mapping_Btree[Reverse Mapping B+tree], v5 only
 | +XFS_RTRMAP_CRC_MAGIC+ | 0x4d415052 | MAPR | xref:Real_time_Reverse_Mapping_Btree[Real-Time Reverse Mapping B+tree], v5 only
 | +XFS_REFC_CRC_MAGIC+ | 0x52334643 | R3FC | xref:Reference_Count_Btree[Reference Count B+tree], v5 only
+| +XFS_MD_MAGIC+ | 0x5846534d | XFSM | xref:Metadata_Dumps[Metadata Dumps]
 |=====
 
 The magic numbers for log items are at offset zero in each log item, but items
diff --git a/design/XFS_Filesystem_Structure/metadump.asciidoc b/design/XFS_Filesystem_Structure/metadump.asciidoc
new file mode 100644
index 0000000..2bddb77
--- /dev/null
+++ b/design/XFS_Filesystem_Structure/metadump.asciidoc
@@ -0,0 +1,62 @@
+[[Metadata_Dumps]]
+= Metadata Dumps
+
+The +xfs_metadump+ and +xfs_mdrestore+ tools are used to create a sparse
+snapshot of a live file system and to restore that snapshot onto a block
+device for debugging purposes. Only the metadata are captured in the
+snapshot, and the metadata blocks may be obscured for privacy reasons.
+
+A metadump file starts with a +xfs_metablock+ that records the addresses of
+the blocks that follow. Following that are the metadata blocks captured
+from the filesystem. The first block following the first superblock
+must be the superblock from AG 0. If the metadump has more blocks than
+can be pointed to by the +xfs_metablock.mb_daddr+ area, the sequence
+of +xfs_metablock+ followed by metadata blocks is repeated.
+
+.Metadata Dump Format
+
+[source, c]
+----
+struct xfs_metablock {
+	__be32		mb_magic;
+	__be16		mb_count;
+	uint8_t		mb_blocklog;
+	uint8_t		mb_reserved;
+	__be64		mb_daddr[];
+};
+----
+
+*mb_magic*::
+The magic number, ``XFSM'' (0x5846534d).
+
+*mb_count*::
+Number of blocks indexed by this record. This value must not exceed +(1
+<< mb_blocklog) - sizeof(struct xfs_metablock)+.
+
+*mb_blocklog*::
+The log size of a metadump block. The size of a metadump block is 512
+bytes, so this value should be 9.
+
+*mb_reserved*::
+Reserved. Should be zero.
+
+*mb_daddr*::
+An array of disk addresses. Each of the +mb_count+ blocks (of size +(1
+<< mb_blocklog+) following the +xfs_metablock+ should be written back to
+the address pointed to by the corresponding +mb_daddr+ entry.
+
+== Dump Obfuscation
+
+Unless explicitly disabled, the +xfs_metadump+ tool obfuscates empty block
+space and naming information to avoid leaking sensitive information into
+the metadump file. +xfs_metadump+ does not copy user data blocks.
+
+The obfuscation policy is as follows:
+
+* File and extended attribute names are both considered "names".
+* Names longer than 8 characters are totally rewritten with a name that matches the hash of the old name.
+* Names between 5 and 8 characters are partially rewritten to match the hash of the old name.
+* Names shorter than 5 characters are not obscured at all.
+* Names that cross a block boundary are not obscured at all.
+* Extended attribute values are zeroed.
+* Empty parts of metadata blocks are zeroed.
diff --git a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
index 7916fbe..5540fa0 100644
--- a/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
+++ b/design/XFS_Filesystem_Structure/xfs_filesystem_structure.asciidoc
@@ -96,3 +96,12 @@ include::extended_attributes.asciidoc[]
 include::symbolic_links.asciidoc[]
 
 :leveloffset: 0
+
+Auxiliary Data Structures
+=========================
+
+:leveloffset: 1
+
+include::metadump.asciidoc[]
+
+:leveloffset: 0

Document the metadump file format.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 design/XFS_Filesystem_Structure/docinfo.xml        |   14 +++++
 design/XFS_Filesystem_Structure/magic.asciidoc     |    1
 design/XFS_Filesystem_Structure/metadump.asciidoc  |   62 ++++++++++++++++++++
 .../xfs_filesystem_structure.asciidoc              |    9 +++
 4 files changed, 86 insertions(+)
 create mode 100644 design/XFS_Filesystem_Structure/metadump.asciidoc
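
For readers who want to poke at a dump by hand, the +xfs_metablock+ layout described in the patch can be decoded along the following lines. This is a minimal sketch, not code from xfs_metadump or xfs_mdrestore: the names `md_index`, `md_parse_index` and `md_daddr` are hypothetical, the header is assumed to occupy exactly 8 bytes before `mb_daddr[]`, and the bound on `mb_count` used here (the whole address array must fit inside the index block) is an assumption that is stricter than the limit stated in the patch text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MD_MAGIC	0x5846534dU	/* "XFSM" */
#define MD_HDR_BYTES	8	/* mb_magic + mb_count + mb_blocklog + mb_reserved */

struct md_index {
	uint16_t count;		/* mb_count, in host byte order */
	uint8_t  blocklog;	/* mb_blocklog */
};

/* Read an nbytes-wide big-endian integer starting at p. */
static uint64_t get_be(const uint8_t *p, size_t nbytes)
{
	uint64_t v = 0;

	for (size_t i = 0; i < nbytes; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Decode one index block; returns 0 on success, -1 if it looks invalid. */
int md_parse_index(const uint8_t *buf, size_t len, struct md_index *out)
{
	assert(buf != NULL && out != NULL);

	if (len < MD_HDR_BYTES || get_be(buf, 4) != MD_MAGIC)
		return -1;

	out->count = (uint16_t)get_be(buf + 4, 2);
	out->blocklog = buf[6];
	if (out->blocklog > 30)		/* keep the shift below well defined */
		return -1;

	size_t block_bytes = (size_t)1 << out->blocklog;

	/* Assumption: the address array has to fit inside the index block. */
	if (len < block_bytes ||
	    MD_HDR_BYTES + (size_t)out->count * 8 > block_bytes)
		return -1;
	return 0;
}

/* Disk address recorded for the i-th block that follows the index block. */
uint64_t md_daddr(const uint8_t *buf, size_t i)
{
	return get_be(buf + MD_HDR_BYTES + i * 8, 8);
}
```

A reader would call `md_parse_index()` on the first 512 bytes of the file, consume `count` blocks of `1 << blocklog` bytes each, and repeat until end of file, matching the "sequence is repeated" wording in the patch.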