
[v12,06/12] reftable: define version 2 of the spec to accommodate SHA256

Message ID 093fa74a3d0e7721093cceb338e8efc9c0c95b1c.1588845586.git.gitgitgadget@gmail.com
State New, archived
Series Reftable support git-core

Commit Message

Johannes Schindelin via GitGitGadget May 7, 2020, 9:59 a.m. UTC
From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Documentation/technical/reftable.txt | 50 ++++++++++++++++------------
 1 file changed, 28 insertions(+), 22 deletions(-)

Comments

Junio C Hamano May 8, 2020, 7:59 p.m. UTC | #1
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  Documentation/technical/reftable.txt | 50 ++++++++++++++++------------
>  1 file changed, 28 insertions(+), 22 deletions(-)
>
> diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
> index 9fa4657d9ff..ee3f36ea851 100644
> --- a/Documentation/technical/reftable.txt
> +++ b/Documentation/technical/reftable.txt
> @@ -193,8 +193,8 @@ and non-aligned files.
>  Very small files (e.g. 1 only ref block) may omit `padding` and the ref

Hmph, I am seeing nbsp before '1' and am wondering where it came from.

>  index to reduce total file size.
>  
> -Header
> -^^^^^^
> +Header (version 1)
> +^^^^^^^^^^^^^^^^^^
>  
>  A 24-byte header appears at the beginning of the file:
>  
> @@ -215,6 +215,24 @@ used in a stack for link:#Update-transactions[transactions], these
>  fields can order the files such that the prior file’s
>  `max_update_index + 1` is the next file’s `min_update_index`.

Am I correct to assume that we do not plan to support a repository
with mixed set of refs, some referring to a commit with its SHA-1
object name while others using SHA-256 object name?

> +Header (version 2)
> +^^^^^^^^^^^^^^^^^^
> +
> +A 28-byte header appears at the beginning of the file:
> +
> +....
> +'REFT'
> +uint8( version_number = 1 )

Shouldn't this be 2 instead, as v1 lacked the Hash-id field?

> +uint24( block_size )
> +uint64( min_update_index )
> +uint64( max_update_index )
> +uint32( hash_id )
> +....
> +
> +The header is identical to `version_number=1`, with the 4-byte hash ID
> +("sha1" for SHA1 and "s256" for SHA-256) append to the header.

Am I correct to assume that SHA-1 repositories are encouraged to use
version 2 when the code becomes available?

>  First ref block
>  ^^^^^^^^^^^^^^^
>  
> @@ -671,14 +689,8 @@ Footer
>  After the last block of the file, a file footer is written. It begins
>  like the file header, but is extended with additional data.
>  
> -A 68-byte footer appears at the end:
> -
>  ....
> -    'REFT'
> -    uint8( version_number = 1 )
> -    uint24( block_size )
> -    uint64( min_update_index )
> -    uint64( max_update_index )
> +    HEADER
>  
>      uint64( ref_index_position )
>      uint64( (obj_position << 5) | obj_id_len )
> @@ -701,12 +713,16 @@ obj blocks.
>  * `obj_index_position`: byte position for the start of the obj index.
>  * `log_index_position`: byte position for the start of the log index.
>  
> +The size of the footer is 68 bytes for version 1, and 72 bytes for
> +version 2.
> +
>  Reading the footer
>  ++++++++++++++++++
>  
> -Readers must seek to `file_length - 68` to access the footer. A trusted
> -external source (such as `stat(2)`) is necessary to obtain
> -`file_length`. When reading the footer, readers must verify:
> +Readers must first read the file start to determine the version
> +number. Then they seek to `file_length - FOOTER_LENGTH` to access the
> +footer. A trusted external source (such as `stat(2)`) is necessary to
> +obtain `file_length`. When reading the footer, readers must verify:

In any case, the size of this patch is pleasant to see---it must be
a sign that the previous step was done well not to hardcode the
"hash size is 20 bytes" assumption all over the place and instead
used "this field holds N+m bytes where N is the size of the hash
described in the REFT header" consistently.

Nicely done.

Patch

diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index 9fa4657d9ff..ee3f36ea851 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -193,8 +193,8 @@ and non-aligned files.
 Very small files (e.g. 1 only ref block) may omit `padding` and the ref
 index to reduce total file size.
 
-Header
-^^^^^^
+Header (version 1)
+^^^^^^^^^^^^^^^^^^
 
 A 24-byte header appears at the beginning of the file:
 
@@ -215,6 +215,24 @@ used in a stack for link:#Update-transactions[transactions], these
 fields can order the files such that the prior file’s
 `max_update_index + 1` is the next file’s `min_update_index`.
 
+Header (version 2)
+^^^^^^^^^^^^^^^^^^
+
+A 28-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 2 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+uint32( hash_id )
+....
+
+The header is identical to the `version_number=1` header, with the
+4-byte hash ID ("sha1" for SHA-1 and "s256" for SHA-256) appended.
+
+
 First ref block
 ^^^^^^^^^^^^^^^
 
@@ -671,14 +689,8 @@ Footer
 After the last block of the file, a file footer is written. It begins
 like the file header, but is extended with additional data.
 
-A 68-byte footer appears at the end:
-
 ....
-    'REFT'
-    uint8( version_number = 1 )
-    uint24( block_size )
-    uint64( min_update_index )
-    uint64( max_update_index )
+    HEADER
 
     uint64( ref_index_position )
     uint64( (obj_position << 5) | obj_id_len )
@@ -701,12 +713,16 @@ obj blocks.
 * `obj_index_position`: byte position for the start of the obj index.
 * `log_index_position`: byte position for the start of the log index.
 
+The size of the footer is 68 bytes for version 1, and 72 bytes for
+version 2.
+
 Reading the footer
 ++++++++++++++++++
 
-Readers must seek to `file_length - 68` to access the footer. A trusted
-external source (such as `stat(2)`) is necessary to obtain
-`file_length`. When reading the footer, readers must verify:
+Readers must first read the file header to determine the version number,
+then seek to `file_length - FOOTER_LENGTH` (68 bytes for version 1, 72 for
+version 2) to access the footer. A trusted external source (such as
+`stat(2)`) must supply `file_length`. When reading the footer, readers must verify:
 
 * 4-byte magic is correct
 * 1-byte version number is recognized
@@ -1055,13 +1071,3 @@ impossible.
 
 A common format that can be supported by all major Git implementations
 (git-core, JGit, libgit2) is strongly preferred.
-
-Future
-~~~~~~
-
-Longer hashes
-^^^^^^^^^^^^^
-
-Version will bump (e.g. 2) to indicate `value` uses a different object
-id length other than 20. The length could be stored in an expanded file
-header, or hardcoded as part of the version.
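The following C sketch makes the version-dependent header and footer sizes concrete. It parses either header layout and derives the footer length. This is a minimal illustration, not the git-core reader, and it rests on these assumptions:

* Integers are big-endian (network byte order).
* The version-2 header carries `version_number = 2`, as suggested in the review.
* Version-1 files use 20-byte SHA-1 ids.
* The constant 44 is simply 68 - 24 (equivalently 72 - 28), the fixed footer fields that follow the copied header.

The names `parse_header`, `footer_length`, `struct reftable_header` and the `HASH_ID_*` macros are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: the 4-byte hash IDs "sha1" and "s256" read as big-endian uint32. */
#define HASH_ID_SHA1   0x73686131u /* 's' 'h' 'a' '1' */
#define HASH_ID_SHA256 0x73323536u /* 's' '2' '5' '6' */

struct reftable_header {
	uint8_t version;
	uint32_t block_size;       /* uint24 on disk */
	uint64_t min_update_index;
	uint64_t max_update_index;
	uint32_t hash_id;          /* implied SHA-1 for version 1 (assumption) */
	size_t header_len;         /* 24 for version 1, 28 for version 2 */
};

/* Decode an n-byte big-endian unsigned integer. */
static uint64_t get_be(const uint8_t *p, int n)
{
	uint64_t v = 0;
	for (int i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Returns 0 on success, -1 if buf does not start with a recognized header. */
static int parse_header(const uint8_t *buf, size_t len, struct reftable_header *h)
{
	if (len < 24 || memcmp(buf, "REFT", 4) != 0)
		return -1;
	h->version = buf[4];
	h->block_size = (uint32_t)get_be(buf + 5, 3);
	h->min_update_index = get_be(buf + 8, 8);
	h->max_update_index = get_be(buf + 16, 8);

	if (h->version == 1) {
		h->hash_id = HASH_ID_SHA1;
		h->header_len = 24;
	} else if (h->version == 2) {
		if (len < 28)
			return -1;
		h->hash_id = (uint32_t)get_be(buf + 24, 4);
		if (h->hash_id != HASH_ID_SHA1 && h->hash_id != HASH_ID_SHA256)
			return -1;
		h->header_len = 28;
	} else {
		return -1; /* unknown version */
	}
	return 0;
}

/* The footer repeats the header, followed by 44 bytes of fixed fields. */
static size_t footer_length(const struct reftable_header *h)
{
	return h->header_len + 44;
}
```

A reader would call `parse_header` on the first 28 bytes of the file (or fewer, for a short version-1 file). It would then seek to `file_length - footer_length(&h)` to read the footer. Because the footer begins with a copy of the header, the reader can cross-check the two.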