[0/3] SHA-256: Update commit-graph and multi-pack-index formats

Message ID pull.703.git.1597428440.gitgitgadget@gmail.com
State New, archived

Commit Message

Johannes Schindelin via GitGitGadget Aug. 14, 2020, 6:07 p.m. UTC
As discussed [1], there is some concern around binary file formats requiring
the context of the repository config in order to infer hash lengths. Two
formats that were designed with the hash transition in mind (commit-graph
and multi-pack-index) have bytes available to indicate the hash algorithm
used. Let's actually update these formats to be more self-contained with the
two hash algorithms being available.

[1] 
https://lore.kernel.org/git/CAN0heSp024=Kyy7gdQ2VSetk_5iVhj_qdT8CMVPcry_AwWrhHQ@mail.gmail.com/
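
As a rough illustration of what "self-contained" means here, the sketch below reads the hash version byte straight out of a file, with no repository config involved. It assumes the header layout from the format documents (4-byte signature, 1-byte file version, then the 1-byte hash version at offset 5, which holds for both commit-graph and multi-pack-index). The helper name `hash_version_byte` is made up for this example and is not part of Git.

```shell
# Hypothetical helper (not part of Git): print the hash-version byte of a
# commit-graph or multi-pack-index file as a decimal number.
# Assumed layout: bytes 0-3 signature, byte 4 file version, byte 5 hash version.
hash_version_byte () {
	# -An: no offsets, -tu1: unsigned 1-byte decimals,
	# -j5: skip five bytes, -N1: read a single byte
	od -An -tu1 -j5 -N1 "$1" | tr -d '[:space:]'
}
```

With this series applied, running it on `.git/objects/info/commit-graph` should print 1 in a SHA-1 repository and 2 in a SHA-256 repository.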

This merges cleanly with tb/bloom-improvements, but both that branch and
this patch series have merge conflicts with the corrected commit date patch
series [2].

[2] 
https://lore.kernel.org/git/pull.676.v2.git.1596941624.gitgitgadget@gmail.com/

In particular, the following conflict can be resolved in the "obvious" way:

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< HEAD
    header: 43475048 1 $OID_VERSION 3 $NUM_BASE
================================
    header: 43475048 1 1 4 $NUM_BASE
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> abhishek/corrected_commit_date

Instead use:

    header: 43475048 1 $OID_VERSION 4 $NUM_BASE
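
To make the resolved line concrete: $OID_VERSION stands for the hash version byte (1 for SHA-1, 2 for SHA-256, matching the values the tests in this series expect), and the 4 is the chunk count taken from the other series. A minimal sketch of building the expected line follows. Both function names are hypothetical and only serve the illustration; the real test scripts may compute the value differently.

```shell
# Hypothetical helper: map a hash algorithm name to its version byte.
oid_version_for () {
	case "$1" in
	sha1) echo 1 ;;
	sha256) echo 2 ;;
	*) echo "unknown hash algorithm: $1" >&2; return 1 ;;
	esac
}

# Hypothetical helper: print the header line that "test-tool read-graph"
# is expected to show.
# $1 = hash name, $2 = number of chunks, $3 = number of base graphs
expected_header () {
	version=$(oid_version_for "$1") || return 1
	printf 'header: 43475048 1 %s %s %s\n' "$version" "$2" "$3"
}
```

For example, `expected_header sha256 4 1` prints `header: 43475048 1 2 4 1`.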

But, it also needs the following fix to actually work with this series:


If this is the way we want to go with the formats, then I'll assist
coordinating these textual and semantic merge conflicts.

Thanks, -Stolee

Derrick Stolee (3):
  t/README: document GIT_TEST_DEFAULT_HASH
  commit-graph: use the hash version byte
  multi-pack-index: use hash version byte

 .../technical/commit-graph-format.txt         |  9 +++-
 Documentation/technical/pack-format.txt       |  7 ++-
 commit-graph.c                                |  6 ++-
 midx.c                                        | 32 +++++++++++---
 t/README                                      |  3 ++
 t/helper/test-read-midx.c                     |  8 +++-
 t/t4216-log-bloom.sh                          |  8 +++-
 t/t5318-commit-graph.sh                       | 37 +++++++++++++++-
 t/t5319-multi-pack-index.sh                   | 43 +++++++++++++++++--
 t/t5324-split-commit-graph.sh                 |  8 +++-
 10 files changed, 142 insertions(+), 19 deletions(-)


base-commit: 878e727637ec5815ccb3301eb994a54df95b21b8
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-703%2Fderrickstolee%2Fcommit-graph-256-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-703/derrickstolee/commit-graph-256-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/703

Comments

Junio C Hamano Aug. 14, 2020, 7:25 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> As discussed [1], there is some concern around binary file formats requiring
> the context of the repository config in order to infer hash lengths. Two
> formats that were designed with the hash transition in mind (commit-graph
> and multi-pack-index) have bytes available to indicate the hash algorithm
> used. Let's actually update these formats to be more self-contained with the
> two hash algorithms being available.
> ...
> If this is the way we want to go with the formats, then I'll assist
> coordinating these textual and semantic merge conflicts.

I agree that the files should be self-identifying, but have these
changes been tested without the sha256 hash?

Derrick Stolee Aug. 14, 2020, 8:34 p.m. UTC | #2
On 8/14/2020 3:25 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> As discussed [1], there is some concern around binary file formats requiring
>> the context of the repository config in order to infer hash lengths. Two
>> formats that were designed with the hash transition in mind (commit-graph
>> and multi-pack-index) have bytes available to indicate the hash algorithm
>> used. Let's actually update these formats to be more self-contained with the
>> two hash algorithms being available.
>> ...
>> If this is the way we want to go with the formats, then I'll assist
>> coordinating these textual and semantic merge conflicts.
> 
> I agree that the files should be self-identifying, but have these
> changes been tested without the sha256 hash?

All of the test scripts pass with and without GIT_TEST_DEFAULT_HASH=sha256,
and this test in t5318 (and a similar one in t5319) is explicit about
testing both options:

+test_expect_success 'warn on improper hash version' '
+	git init --object-format=sha1 sha1 &&
+	(
+		cd sha1 &&
+		test_commit 1 &&
+		git commit-graph write --reachable &&
+		mv .git/objects/info/commit-graph ../cg-sha1
+	) &&
+	git init --object-format=sha256 sha256 &&
+	(
+		cd sha256 &&
+		test_commit 1 &&
+		git commit-graph write --reachable &&
+		mv .git/objects/info/commit-graph ../cg-sha256
+	) &&
+	(
+		cd sha1 &&
+		mv ../cg-sha256 .git/objects/info/commit-graph &&
+		git log -1 2>err &&
+		test_i18ngrep "commit-graph hash version 2 does not match version 1" err
+	) &&
+	(
+		cd sha256 &&
+		mv ../cg-sha1 .git/objects/info/commit-graph &&
+		git log -1 2>err &&
+		test_i18ngrep "commit-graph hash version 1 does not match version 2" err
+	)
+'
+

Since this tests exactly that the "hash version" byte is the same in
a SHA-1 repo, this checks that the new version of Git writes backwards-
compatible data in SHA-1 repos.
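
For readers who want to see the reader-side check in isolation, here is a shell sketch of the comparison the test above exercises. It is not Git's C implementation: the function name `check_graph_hash_version` is hypothetical, the byte offset (5) is assumed from the format document, and the nonzero return simply stands in for "ignore this file". Only the message wording is taken from the test.

```shell
# Hypothetical stand-in for the reader-side check; not Git's actual code.
# $1 = path to a commit-graph file
# $2 = hash version the repository expects (1 = SHA-1, 2 = SHA-256)
check_graph_hash_version () {
	# Read the single byte at offset 5 as an unsigned decimal.
	found=$(od -An -tu1 -j5 -N1 "$1" | tr -d '[:space:]')
	if test "$found" != "$2"
	then
		echo "warning: commit-graph hash version $found does not match version $2" >&2
		return 1
	fi
}
```

Calling it on a SHA-256 commit-graph with an expected version of 1 prints the warning to stderr and returns nonzero; with an expected version of 2 it returns zero silently.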

Or are you hinting at a more subtle test scenario that I missed?

Thanks,
-Stolee

Junio C Hamano Aug. 14, 2020, 9:41 p.m. UTC | #3
Derrick Stolee <stolee@gmail.com> writes:

> On 8/14/2020 3:25 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> 
>>> As discussed [1], there is some concern around binary file formats requiring
>>> the context of the repository config in order to infer hash lengths. Two
>>> formats that were designed with the hash transition in mind (commit-graph
>>> and multi-pack-index) have bytes available to indicate the hash algorithm
>>> used. Let's actually update these formats to be more self-contained with the
>>> two hash algorithms being available.
>>> ...
>>> If this is the way we want to go with the formats, then I'll assist
>>> coordinating these textual and semantic merge conflicts.
>> 
>> I agree that the files should be self-identifying, but have these
>> changes been tested without the sha256 hash?
>
> All of the test scripts pass with and without GIT_TEST_DEFAULT_HASH=sha256,
> and this test in t5318 (and a similar one in t5319) is explicit about
> testing both options:
>
> +test_expect_success 'warn on improper hash version' '
> +	git init --object-format=sha1 sha1 &&
> +	(
> +		cd sha1 &&
> +		test_commit 1 &&
> +		git commit-graph write --reachable &&
> +		mv .git/objects/info/commit-graph ../cg-sha1
> +	) &&
> +	git init --object-format=sha256 sha256 &&
> +	(
> +		cd sha256 &&
> +		test_commit 1 &&
> +		git commit-graph write --reachable &&
> +		mv .git/objects/info/commit-graph ../cg-sha256
> +	) &&
> +	(
> +		cd sha1 &&
> +		mv ../cg-sha256 .git/objects/info/commit-graph &&
> +		git log -1 2>err &&
> +		test_i18ngrep "commit-graph hash version 2 does not match version 1" err
> +	) &&
> +	(
> +		cd sha256 &&
> +		mv ../cg-sha1 .git/objects/info/commit-graph &&
> +		git log -1 2>err &&
> +		test_i18ngrep "commit-graph hash version 1 does not match version 2" err
> +	)
> +'
> +
>
> Since this tests exactly that the "hash version" byte is the same in
> a SHA-1 repo, this checks that the new version of Git writes backwards-
> compatible data in SHA-1 repos.
>
> Or are you hinting at a more subtle test scenario that I missed?

No, I was just wondering how ready we are, as the four tests looked
too easy ;-)

Patch

diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
index 211ec625d2..09f133792c 100755
--- a/t/t5324-split-commit-graph.sh
+++ b/t/t5324-split-commit-graph.sh
@@ -464,7 +464,7 @@  test_expect_success 'setup repo for mixed generation commit-graph-chain' '
        GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable --split=no-merge &&
        test-tool read-graph >output &&
        cat >expect <<-EOF &&
-       header: 43475048 1 1 4 1
+       header: 43475048 1 $OID_VERSION 4 1
        num_commits: 2
        chunks: oid_fanout oid_lookup commit_metadata
        EOF
@@ -482,7 +482,7 @@  test_expect_success 'does not write generation data chunk if not present on exis
        git commit-graph write --reachable --split=no-merge &&
        test-tool read-graph >output &&
        cat >expect <<-EOF &&
-       header: 43475048 1 1 4 2
+       header: 43475048 1 $OID_VERSION 4 2
        num_commits: 3
        chunks: oid_fanout oid_lookup commit_metadata
        EOF