diff mbox series

[v2,04/30] repository: add a compatibility hash algorithm

Message ID 20231002024034.2611-4-ebiederm@gmail.com (mailing list archive)
State New, archived
Series initial support for multiple hash functions

Commit Message

Eric W. Biederman Oct. 2, 2023, 2:40 a.m. UTC
From: "Eric W. Biederman" <ebiederm@xmission.com>

We currently have support for using a full stage 4 SHA-256
implementation.  However, we'd like to support interoperability with
SHA-1 repositories as well.  The transition plan anticipates a
compatibility hash algorithm configuration option that we can use to
implement support for this.  Let's add an element to the repository
structure that indicates the compatibility hash algorithm so we can use
it when we need to consider interoperability between algorithms.

Add a helper function repo_set_compat_hash_algo that takes a
compatibility hash algorithm and sets "repo->compat_hash_algo".  If
GIT_HASH_UNKNOWN is passed as the compatibility hash algorithm
"repo->compat_hash_algo" is set to NULL.

For now, the code results in "repo->compat_hash_algo" always being set
to NULL, but that will change once a configuration option is added.

Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 repository.c | 8 ++++++++
 repository.h | 4 ++++
 setup.c      | 3 +++
 3 files changed, 15 insertions(+)

Comments

Linus Arver Feb. 13, 2024, 10:02 a.m. UTC | #1
"Eric W. Biederman" <ebiederm@gmail.com> writes:

> From: "Eric W. Biederman" <ebiederm@xmission.com>
>
> We currently have support for using a full stage 4 SHA-256
> implementation.  However, we'd like to support interoperability with
> SHA-1 repositories as well.  The transition plan anticipates a
> compatibility hash algorithm configuration option that we can use to
> implement support for this.

Perhaps add

    See section "Object names on the command line" in
    git/Documentation/technical/hash-function-transition.txt .

? That section does not use the language "compatibility hash algorithm"
though, and I think "hash compatibility option" is easier to say.

Hmm, or are you talking about "compatObjectFormat" discussed in that doc?

> Let's add an element to the repository
> structure that indicates the compatibility hash algorithm so we can use
> it when we need to consider interoperability between algorithms.

How about just

    Add a hash compatibility option to the repository structure to
    consider interoperability between hash algorithms.

?

Aside: already we are seeing multiple keywords "compatibility",
"transition", "interoperability" to all mean roughly similar things. I
hope we can settle on just one (ideally) in the codebase by the end of
this series.

> Add a helper function repo_set_compat_hash_algo that takes a
> compatibility hash algorithm and sets "repo->compat_hash_algo".  If
> GIT_HASH_UNKNOWN is passed as the compatibility hash algorithm
> "repo->compat_hash_algo" is set to NULL.
>
> For now, the code results in "repo->compat_hash_algo" always being set
> to NULL, but that will change once a configuration option is added.

It's not clear to me whether you are talking about a config option to
describe the different stages of transition around algorithms, or a hash
algorithm itself (SHA1, SHA256, UNKNOWN).
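
The two readings could be connected: a config value would presumably name
an algorithm, which is then mapped to an algorithm id. Below is a minimal
sketch of such a mapping, assuming the option carries a name like "sha1"
or "sha256". compat_algo_from_config() and the algo_names table are
hypothetical names made up for this example, and the enum only mimics
Git's GIT_HASH_* constants.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for Git's GIT_HASH_* constants; assumed for illustration. */
enum { GIT_HASH_UNKNOWN, GIT_HASH_SHA1, GIT_HASH_SHA256, GIT_HASH_NALGOS };

/* Hypothetical name table; slot 0 (unknown) has no name. */
static const char *const algo_names[GIT_HASH_NALGOS] = {
	NULL, "sha1", "sha256",
};

/*
 * Hypothetical helper: map a config value to an algorithm id.
 * A missing or unrecognized value yields GIT_HASH_UNKNOWN.
 */
int compat_algo_from_config(const char *value)
{
	int i;

	if (!value)
		return GIT_HASH_UNKNOWN;
	for (i = GIT_HASH_SHA1; i < GIT_HASH_NALGOS; i++) {
		/* Every real algorithm is expected to have a name. */
		assert(algo_names[i] != NULL);
		if (!strcmp(value, algo_names[i]))
			return i;
	}
	return GIT_HASH_UNKNOWN;
}
```

Under this reading, the result would be what gets passed to
repo_set_compat_hash_algo(), with GIT_HASH_UNKNOWN standing for "option
not set".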

> Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  repository.c | 8 ++++++++
>  repository.h | 4 ++++
>  setup.c      | 3 +++
>  3 files changed, 15 insertions(+)
>
> diff --git a/repository.c b/repository.c
> index a7679ceeaa45..80252b79e93e 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -104,6 +104,13 @@ void repo_set_hash_algo(struct repository *repo, int hash_algo)
>  	repo->hash_algo = &hash_algos[hash_algo];
>  }
>  
> +void repo_set_compat_hash_algo(struct repository *repo, int algo)
> +{
> +	if (hash_algo_by_ptr(repo->hash_algo) == algo)
> +		BUG("hash_algo and compat_hash_algo match");
> +	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
> +}

Ah, OK. So we are talking about an algorithm itself. Looking at this
code, it seems like a compat_hash_algo is something like "the hash
algorithm I want my repository to start using, but which it has not
adopted yet". Such a description would have been useful in the commit
message.

Nit: I think 

    BUG("compat_hash_algo may not be the same as hash_algo");

is more natural because the error message should explain what is wrong
with the behavior rather than merely reflect the triggering condition.
And the "star of the show" here is the new compat_hash_algo member, so
it makes sense to emphasize it as the only subject of the sentence
instead of grouping it together with hash_algo (giving them equal
importance).
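
To make the nit concrete, here is a minimal, self-contained sketch of the
setter with the reworded message. The enum values, the hash_algos table,
the BUG() macro and the bounds assert() are simplified stand-ins written
for this example, not Git's real definitions. Only the body of
repo_set_compat_hash_algo() follows the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for Git's hash.h; assumed for illustration only. */
enum { GIT_HASH_UNKNOWN, GIT_HASH_SHA1, GIT_HASH_SHA256, GIT_HASH_NALGOS };

struct git_hash_algo { const char *name; };

const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
	{ NULL }, { "sha1" }, { "sha256" },
};

struct repository {
	const struct git_hash_algo *hash_algo;
	const struct git_hash_algo *compat_hash_algo;
};

/* Simplified BUG(): report the message and abort. */
#define BUG(msg) do { fprintf(stderr, "BUG: %s\n", (msg)); abort(); } while (0)

/* Recover the table index from a pointer into hash_algos. */
int hash_algo_by_ptr(const struct git_hash_algo *p)
{
	return (int)(p - hash_algos);
}

void repo_set_hash_algo(struct repository *repo, int algo)
{
	repo->hash_algo = &hash_algos[algo];
}

void repo_set_compat_hash_algo(struct repository *repo, int algo)
{
	/* Sketch-only bounds guard; not in the patch. */
	assert(algo >= 0 && algo < GIT_HASH_NALGOS);
	if (hash_algo_by_ptr(repo->hash_algo) == algo)
		BUG("compat_hash_algo may not be the same as hash_algo");
	/* GIT_HASH_UNKNOWN is 0, so "no compat algorithm" becomes NULL. */
	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
}
```

In this sketch, for a SHA-256 repository, passing GIT_HASH_SHA1 points
compat_hash_algo at the SHA-1 entry, passing GIT_HASH_UNKNOWN leaves it
NULL, and passing GIT_HASH_SHA256 hits the BUG().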

> +
>  /*
>   * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
>   * Return 0 upon success and a non-zero value upon failure.
> @@ -184,6 +191,7 @@ int repo_init(struct repository *repo,
>  		goto error;
>  
>  	repo_set_hash_algo(repo, format.hash_algo);
> +	repo_set_compat_hash_algo(repo, GIT_HASH_UNKNOWN);
>  	repo->repository_format_worktree_config = format.worktree_config;
>  
>  	/* take ownership of format.partial_clone */
> diff --git a/repository.h b/repository.h
> index 5f18486f6465..bf3fc601cc53 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -160,6 +160,9 @@ struct repository {
>  	/* Repository's current hash algorithm, as serialized on disk. */
>  	const struct git_hash_algo *hash_algo;
>  
> +	/* Repository's compatibility hash algorithm. */

Perhaps add "May not be the same as hash_algo." ?

> +	const struct git_hash_algo *compat_hash_algo;
> +
>  	/* A unique-id for tracing purposes. */
>  	int trace2_repo_id;
>  
> @@ -199,6 +202,7 @@ void repo_set_gitdir(struct repository *repo, const char *root,
>  		     const struct set_gitdir_args *extra_args);
>  void repo_set_worktree(struct repository *repo, const char *path);
>  void repo_set_hash_algo(struct repository *repo, int algo);
> +void repo_set_compat_hash_algo(struct repository *repo, int compat_algo);
>  void initialize_the_repository(void);
>  RESULT_MUST_BE_USED
>  int repo_init(struct repository *r, const char *gitdir, const char *worktree);
> diff --git a/setup.c b/setup.c
> index 18927a847b86..aa8bf5da5226 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1564,6 +1564,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
>  		}
>  		if (startup_info->have_repository) {
>  			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
> +			repo_set_compat_hash_algo(the_repository,
> +						  GIT_HASH_UNKNOWN);
>  			the_repository->repository_format_worktree_config =
>  				repo_fmt.worktree_config;
>  			/* take ownership of repo_fmt.partial_clone */
> @@ -1657,6 +1659,7 @@ void check_repository_format(struct repository_format *fmt)
>  	check_repository_format_gently(get_git_dir(), fmt, NULL);
>  	startup_info->have_repository = 1;
>  	repo_set_hash_algo(the_repository, fmt->hash_algo);
> +	repo_set_compat_hash_algo(the_repository, GIT_HASH_UNKNOWN);
>  	the_repository->repository_format_worktree_config =
>  		fmt->worktree_config;
>  	the_repository->repository_format_partial_clone =
> -- 
> 2.41.0

Patrick Steinhardt Feb. 15, 2024, 11:22 a.m. UTC | #2
On Sun, Oct 01, 2023 at 09:40:08PM -0500, Eric W. Biederman wrote:
> From: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> We currently have support for using a full stage 4 SHA-256
> implementation.

What is a "full stage 4 SHA-256 implementation"? I was assuming that you
referred to "Documentation/technical/hash-function-transition.txt", but
it does not mention stages either.

> However, we'd like to support interoperability with
> SHA-1 repositories as well.  The transition plan anticipates a
> compatibility hash algorithm configuration option that we can use to
> implement support for this.  Let's add an element to the repository
> structure that indicates the compatibility hash algorithm so we can use
> it when we need to consider interoperability between algorithms.
> 
> Add a helper function repo_set_compat_hash_algo that takes a
> compatibility hash algorithm and sets "repo->compat_hash_algo".  If
> GIT_HASH_UNKNOWN is passed as the compatibility hash algorithm
> "repo->compat_hash_algo" is set to NULL.
> 
> For now, the code results in "repo->compat_hash_algo" always being set
> to NULL, but that will change once a configuration option is added.
> 
> Inspired-by: brian m. carlson <sandals@crustytoothpaste.net>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  repository.c | 8 ++++++++
>  repository.h | 4 ++++
>  setup.c      | 3 +++
>  3 files changed, 15 insertions(+)
> 
> diff --git a/repository.c b/repository.c
> index a7679ceeaa45..80252b79e93e 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -104,6 +104,13 @@ void repo_set_hash_algo(struct repository *repo, int hash_algo)
>  	repo->hash_algo = &hash_algos[hash_algo];
>  }
>  
> +void repo_set_compat_hash_algo(struct repository *repo, int algo)
> +{
> +	if (hash_algo_by_ptr(repo->hash_algo) == algo)
> +		BUG("hash_algo and compat_hash_algo match");
> +	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
> +}
> +
>  /*
>   * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
>   * Return 0 upon success and a non-zero value upon failure.
> @@ -184,6 +191,7 @@ int repo_init(struct repository *repo,
>  		goto error;
>  
>  	repo_set_hash_algo(repo, format.hash_algo);
> +	repo_set_compat_hash_algo(repo, GIT_HASH_UNKNOWN);
>  	repo->repository_format_worktree_config = format.worktree_config;
>  
>  	/* take ownership of format.partial_clone */
> diff --git a/repository.h b/repository.h
> index 5f18486f6465..bf3fc601cc53 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -160,6 +160,9 @@ struct repository {
>  	/* Repository's current hash algorithm, as serialized on disk. */
>  	const struct git_hash_algo *hash_algo;
>  
> +	/* Repository's compatibility hash algorithm. */
> +	const struct git_hash_algo *compat_hash_algo;
> +
>  	/* A unique-id for tracing purposes. */
>  	int trace2_repo_id;
>  
> @@ -199,6 +202,7 @@ void repo_set_gitdir(struct repository *repo, const char *root,
>  		     const struct set_gitdir_args *extra_args);
>  void repo_set_worktree(struct repository *repo, const char *path);
>  void repo_set_hash_algo(struct repository *repo, int algo);
> +void repo_set_compat_hash_algo(struct repository *repo, int compat_algo);
>  void initialize_the_repository(void);
>  RESULT_MUST_BE_USED
>  int repo_init(struct repository *r, const char *gitdir, const char *worktree);
> diff --git a/setup.c b/setup.c
> index 18927a847b86..aa8bf5da5226 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1564,6 +1564,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
>  		}
>  		if (startup_info->have_repository) {
>  			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
> +			repo_set_compat_hash_algo(the_repository,
> +						  GIT_HASH_UNKNOWN);
>  			the_repository->repository_format_worktree_config =
>  				repo_fmt.worktree_config;
>  			/* take ownership of repo_fmt.partial_clone */
> @@ -1657,6 +1659,7 @@ void check_repository_format(struct repository_format *fmt)
>  	check_repository_format_gently(get_git_dir(), fmt, NULL);
>  	startup_info->have_repository = 1;
>  	repo_set_hash_algo(the_repository, fmt->hash_algo);
> +	repo_set_compat_hash_algo(the_repository, GIT_HASH_UNKNOWN);
>  	the_repository->repository_format_worktree_config =
>  		fmt->worktree_config;
>  	the_repository->repository_format_partial_clone =

There's also `init_db()`, where we call `repo_set_hash_algo()`. Would we
have to call `repo_set_compat_hash_algo()` there, too? There are some
other call sites in the code that handles remotes and clones, but I
don't think those are relevant right now.

Patrick
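
One way to keep call sites such as `init_db()` from setting one algorithm
and forgetting the other would be a single setter for both. The sketch
below is purely illustrative: `repo_set_hash_algos()` is a hypothetical
helper that is not part of this patch, the types are simplified stand-ins
for Git's, and a plain assert() takes the place of BUG().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for Git's hash.h; assumed for illustration only. */
enum { GIT_HASH_UNKNOWN, GIT_HASH_SHA1, GIT_HASH_SHA256, GIT_HASH_NALGOS };

struct git_hash_algo { const char *name; };

const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
	{ NULL }, { "sha1" }, { "sha256" },
};

struct repository {
	const struct git_hash_algo *hash_algo;
	const struct git_hash_algo *compat_hash_algo;
};

/*
 * Hypothetical helper: set the main and the compatibility algorithm
 * together, so no initialization path can update only one of them.
 */
void repo_set_hash_algos(struct repository *repo, int algo, int compat_algo)
{
	/* Stand-in for BUG(): the two algorithms must differ. */
	assert(algo != compat_algo);
	repo->hash_algo = &hash_algos[algo];
	/* GIT_HASH_UNKNOWN (0) means "no compatibility algorithm". */
	repo->compat_hash_algo = compat_algo ? &hash_algos[compat_algo] : NULL;
}
```

Whether a combined setter is preferable to adding one more explicit
`repo_set_compat_hash_algo()` call is a design choice for the series
author. The sketch only shows that the invariant can live in one place.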
Patch

diff --git a/repository.c b/repository.c
index a7679ceeaa45..80252b79e93e 100644
--- a/repository.c
+++ b/repository.c
@@ -104,6 +104,13 @@  void repo_set_hash_algo(struct repository *repo, int hash_algo)
 	repo->hash_algo = &hash_algos[hash_algo];
 }
 
+void repo_set_compat_hash_algo(struct repository *repo, int algo)
+{
+	if (hash_algo_by_ptr(repo->hash_algo) == algo)
+		BUG("hash_algo and compat_hash_algo match");
+	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
+}
+
 /*
  * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
  * Return 0 upon success and a non-zero value upon failure.
@@ -184,6 +191,7 @@  int repo_init(struct repository *repo,
 		goto error;
 
 	repo_set_hash_algo(repo, format.hash_algo);
+	repo_set_compat_hash_algo(repo, GIT_HASH_UNKNOWN);
 	repo->repository_format_worktree_config = format.worktree_config;
 
 	/* take ownership of format.partial_clone */
diff --git a/repository.h b/repository.h
index 5f18486f6465..bf3fc601cc53 100644
--- a/repository.h
+++ b/repository.h
@@ -160,6 +160,9 @@  struct repository {
 	/* Repository's current hash algorithm, as serialized on disk. */
 	const struct git_hash_algo *hash_algo;
 
+	/* Repository's compatibility hash algorithm. */
+	const struct git_hash_algo *compat_hash_algo;
+
 	/* A unique-id for tracing purposes. */
 	int trace2_repo_id;
 
@@ -199,6 +202,7 @@  void repo_set_gitdir(struct repository *repo, const char *root,
 		     const struct set_gitdir_args *extra_args);
 void repo_set_worktree(struct repository *repo, const char *path);
 void repo_set_hash_algo(struct repository *repo, int algo);
+void repo_set_compat_hash_algo(struct repository *repo, int compat_algo);
 void initialize_the_repository(void);
 RESULT_MUST_BE_USED
 int repo_init(struct repository *r, const char *gitdir, const char *worktree);
diff --git a/setup.c b/setup.c
index 18927a847b86..aa8bf5da5226 100644
--- a/setup.c
+++ b/setup.c
@@ -1564,6 +1564,8 @@  const char *setup_git_directory_gently(int *nongit_ok)
 		}
 		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			repo_set_compat_hash_algo(the_repository,
+						  GIT_HASH_UNKNOWN);
 			the_repository->repository_format_worktree_config =
 				repo_fmt.worktree_config;
 			/* take ownership of repo_fmt.partial_clone */
@@ -1657,6 +1659,7 @@  void check_repository_format(struct repository_format *fmt)
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
 	repo_set_hash_algo(the_repository, fmt->hash_algo);
+	repo_set_compat_hash_algo(the_repository, GIT_HASH_UNKNOWN);
 	the_repository->repository_format_worktree_config =
 		fmt->worktree_config;
 	the_repository->repository_format_partial_clone =