diff mbox series

[v2,08/20] hash: require hash algorithm in `empty_tree_oid_hex()`

Message ID 4858cca25fe9e57c984fc3181fe8498d0b7222b0.1718259125.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series Introduce `USE_THE_REPOSITORY_VARIABLE` macro | expand

Commit Message

Patrick Steinhardt June 13, 2024, 6:14 a.m. UTC
The `empty_tree_oid_hex()` function use `the_repository` to derive the
hash function that shall be used. Require callers to pass in the hash
algorithm to get rid of this implicit dependency.

While at it, remove the unused `empty_blob_oid_hex()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 add-interactive.c      |  2 +-
 add-patch.c            |  2 +-
 builtin/merge.c        |  3 ++-
 builtin/receive-pack.c |  2 +-
 hash-ll.h              |  3 +--
 object-file.c          | 10 ++--------
 sequencer.c            |  2 +-
 submodule.c            |  6 +++---
 wt-status.c            |  4 ++--
 9 files changed, 14 insertions(+), 20 deletions(-)

Comments

Phillip Wood June 13, 2024, 10:01 a.m. UTC | #1
On 13/06/2024 07:14, Patrick Steinhardt wrote:
> The `empty_tree_oid_hex()` function use `the_repository` to derive the
> hash function that shall be used. Require callers to pass in the hash
> algorithm to get rid of this implicit dependency.

Many of these call sites already have a repository instance available so 
don't need to use "the_repository". I haven't checked but with these 
changes it might be possible to remove some of these files from the next 
patch.

I've only really looked at this patch in this series as I was just 
checking for changes to the sequencer code. As for the series as a whole 
I think adding USE_THE_REPOSITORY_VARIABLE is a good direction.

> While at it, remove the unused `empty_blob_oid_hex()` function.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>   add-interactive.c      |  2 +-
>   add-patch.c            |  2 +-
>   builtin/merge.c        |  3 ++-
>   builtin/receive-pack.c |  2 +-
>   hash-ll.h              |  3 +--
>   object-file.c          | 10 ++--------
>   sequencer.c            |  2 +-
>   submodule.c            |  6 +++---
>   wt-status.c            |  4 ++--
>   9 files changed, 14 insertions(+), 20 deletions(-)
> 
> diff --git a/add-interactive.c b/add-interactive.c
> index b5d6cd689a..a0961096cd 100644
> --- a/add-interactive.c
> +++ b/add-interactive.c
> @@ -557,7 +557,7 @@ static int get_modified_files(struct repository *r,
>   		s.skip_unseen = filter && i;
>   
>   		opt.def = is_initial ?
> -			empty_tree_oid_hex() : oid_to_hex(&head_oid);
> +			empty_tree_oid_hex(the_repository->hash_algo) : oid_to_hex(&head_oid);

The hunk fragment shows that we already have a struct repository 
instance in this function which we should use instead of "the_repository"

> diff --git a/add-patch.c b/add-patch.c
> index 814de57c4a..86181770f2 100644
> --- a/add-patch.c
> +++ b/add-patch.c
> @@ -420,7 +420,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>   			    /* could be on an unborn branch */
>   			    !strcmp("HEAD", s->revision) &&
>   			    repo_get_oid(the_repository, "HEAD", &oid) ?
> -			    empty_tree_oid_hex() : s->revision);
> +			    empty_tree_oid_hex(the_repository->hash_algo) : s->revision);

It's not obvious from this hunk but there is a repository instance in 
"s->s->r" which we should use instead of "the_repository"

> diff --git a/sequencer.c b/sequencer.c
> index 68d62a12ff..823691e379 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2263,7 +2263,7 @@ static int do_pick_commit(struct repository *r,
>   			unborn = 1;
>   		} else if (unborn)
>   			oidcpy(&head, the_hash_algo->empty_tree);
> -		if (index_differs_from(r, unborn ? empty_tree_oid_hex() : "HEAD",
> +		if (index_differs_from(r, unborn ? empty_tree_oid_hex(the_repository->hash_algo) : "HEAD",

The hunk fragment shows that we already have a struct repository 
instance in "r" which we should use instead of "the_repository" here.

> diff --git a/wt-status.c b/wt-status.c
> index ff4be071ca..5051f5e599 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -641,7 +641,7 @@ static void wt_status_collect_changes_index(struct wt_status *s)
>   
>   	repo_init_revisions(s->repo, &rev, NULL);
>   	memset(&opt, 0, sizeof(opt));
> -	opt.def = s->is_initial ? empty_tree_oid_hex() : s->reference;
> +	opt.def = s->is_initial ? empty_tree_oid_hex(the_repository->hash_algo) : s->reference;

The context line above shows us that we have a repository instance 
available so we should use "s->repo" instead of "the_repository"

> @@ -1136,7 +1136,7 @@ static void wt_longstatus_print_verbose(struct wt_status *s)
>   	rev.diffopt.ita_invisible_in_index = 1;
>   
>   	memset(&opt, 0, sizeof(opt));
> -	opt.def = s->is_initial ? empty_tree_oid_hex() : s->reference;
> +	opt.def = s->is_initial ? empty_tree_oid_hex(the_repository->hash_algo) : s->reference;

We should use "s->repo" here as well

Best Wishes

Phillip
Junio C Hamano June 13, 2024, 3:39 p.m. UTC | #2
Phillip Wood <phillip.wood123@gmail.com> writes:

>> diff --git a/add-interactive.c b/add-interactive.c
>> index b5d6cd689a..a0961096cd 100644
>> --- a/add-interactive.c
>> +++ b/add-interactive.c
>> @@ -557,7 +557,7 @@ static int get_modified_files(struct repository *r,
>>   		s.skip_unseen = filter && i;
>>     		opt.def = is_initial ?
>> -			empty_tree_oid_hex() : oid_to_hex(&head_oid);
>> +			empty_tree_oid_hex(the_repository->hash_algo) : oid_to_hex(&head_oid);
>
> The hunk fragment shows that we already have a struct repository
> instance in this function which we should use instead of
> "the_repository"

As an internal helper function in add-interactive.c, all of whose
callers deal with "struct add_i_state *", it probably should not
even take "struct repository *" as a parameter.  The state knows
what repository we are working with.

>> diff --git a/add-patch.c b/add-patch.c
>> index 814de57c4a..86181770f2 100644
>> --- a/add-patch.c
>> +++ b/add-patch.c
>> @@ -420,7 +420,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>>   			    /* could be on an unborn branch */
>>   			    !strcmp("HEAD", s->revision) &&
>>   			    repo_get_oid(the_repository, "HEAD", &oid) ?
>> -			    empty_tree_oid_hex() : s->revision);
>> +			    empty_tree_oid_hex(the_repository->hash_algo) : s->revision);
>
> It's not obvious from this hunk but there is a repository instance in
> "s->s->r" which we should use instead of "the_repository"

I agree it is the same issue.

Just like a previous effort, making a "faithful" conversion from the
original that used the_repository implicitly by explicitly passing
the_repository in one patch, and then making semantics corrections
of the original (if we were ever working on a repository in s->r
that is different from the_repository, the existing code is already
buggy) in a separate patch, is a reasonable approach to limit the
cognitive load while reviewing the first step, I would say.

> diff --git a/sequencer.c b/sequencer.c
>> index 68d62a12ff..823691e379 100644
>> --- a/sequencer.c
>> +++ b/sequencer.c
>> @@ -2263,7 +2263,7 @@ static int do_pick_commit(struct repository *r,
>>   			unborn = 1;
>>   		} else if (unborn)
>>   			oidcpy(&head, the_hash_algo->empty_tree);
>> -		if (index_differs_from(r, unborn ? empty_tree_oid_hex() : "HEAD",
>> +		if (index_differs_from(r, unborn ? empty_tree_oid_hex(the_repository->hash_algo) : "HEAD",
>
> The hunk fragment shows that we already have a struct repository
> instance in "r" which we should use instead of "the_repository" here.

Yes, but the same "it is better to make a faithful conversion first,
corrections separately in a later step" would apply.

As the sequencer machinery is inherently stateful, I wonder if we
should pass around not "repository" but a sequencer state object
that may have a pointer to a repository in use.  But that of course
belongs to the latter, i.e., "making corrections", theme.
Patrick Steinhardt June 14, 2024, 5:23 a.m. UTC | #3
On Thu, Jun 13, 2024 at 08:39:18AM -0700, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> > diff --git a/sequencer.c b/sequencer.c
> >> index 68d62a12ff..823691e379 100644
> >> --- a/sequencer.c
> >> +++ b/sequencer.c
> >> @@ -2263,7 +2263,7 @@ static int do_pick_commit(struct repository *r,
> >>   			unborn = 1;
> >>   		} else if (unborn)
> >>   			oidcpy(&head, the_hash_algo->empty_tree);
> >> -		if (index_differs_from(r, unborn ? empty_tree_oid_hex() : "HEAD",
> >> +		if (index_differs_from(r, unborn ? empty_tree_oid_hex(the_repository->hash_algo) : "HEAD",
> >
> > The hunk fragment shows that we already have a struct repository
> > instance in "r" which we should use instead of "the_repository" here.
> 
> Yes, but the same "it is better to make a faithful conversion first,
> corrections separately in a later step" would apply.

Yeah, this is what I'm aiming for. In large patch series like this I
think it increases the risk of regression quite significantly if we also
try to deviate from the preceding code and do the "right" thing, even if
it is seemingly obvious. So I rather do a faithful conversion that does
not change the behaviour and leave conversion away from `the_repository`
to later steps after this series.

I can amend the commit message to say so.

Patrick
diff mbox series

Patch

diff --git a/add-interactive.c b/add-interactive.c
index b5d6cd689a..a0961096cd 100644
--- a/add-interactive.c
+++ b/add-interactive.c
@@ -557,7 +557,7 @@  static int get_modified_files(struct repository *r,
 		s.skip_unseen = filter && i;
 
 		opt.def = is_initial ?
-			empty_tree_oid_hex() : oid_to_hex(&head_oid);
+			empty_tree_oid_hex(the_repository->hash_algo) : oid_to_hex(&head_oid);
 
 		repo_init_revisions(r, &rev, NULL);
 		setup_revisions(0, NULL, &rev, &opt);
diff --git a/add-patch.c b/add-patch.c
index 814de57c4a..86181770f2 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -420,7 +420,7 @@  static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 			    /* could be on an unborn branch */
 			    !strcmp("HEAD", s->revision) &&
 			    repo_get_oid(the_repository, "HEAD", &oid) ?
-			    empty_tree_oid_hex() : s->revision);
+			    empty_tree_oid_hex(the_repository->hash_algo) : s->revision);
 	}
 	color_arg_index = args.nr;
 	/* Use `--no-color` explicitly, just in case `diff.color = always`. */
diff --git a/builtin/merge.c b/builtin/merge.c
index abe66311c7..bb94b7df21 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -330,7 +330,8 @@  static void read_empty(const struct object_id *oid)
 {
 	struct child_process cmd = CHILD_PROCESS_INIT;
 
-	strvec_pushl(&cmd.args, "read-tree", "-m", "-u", empty_tree_oid_hex(),
+	strvec_pushl(&cmd.args, "read-tree", "-m", "-u",
+		     empty_tree_oid_hex(the_repository->hash_algo),
 		     oid_to_hex(oid), NULL);
 	cmd.git_cmd = 1;
 
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index aa5ba27d17..41d5fb8e60 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1371,7 +1371,7 @@  static const char *push_to_deploy(unsigned char *sha1,
 	strvec_pushl(&child.args, "diff-index", "--quiet", "--cached",
 		     "--ignore-submodules",
 		     /* diff-index with either HEAD or an empty tree */
-		     head_has_history() ? "HEAD" : empty_tree_oid_hex(),
+		     head_has_history() ? "HEAD" : empty_tree_oid_hex(the_repository->hash_algo),
 		     "--", NULL);
 	strvec_pushv(&child.env, env->v);
 	child.no_stdin = 1;
diff --git a/hash-ll.h b/hash-ll.h
index 1000a9af22..3161c778b9 100644
--- a/hash-ll.h
+++ b/hash-ll.h
@@ -347,8 +347,7 @@  static inline int is_null_oid(const struct object_id *oid)
 	return !memcmp(oid->hash, null_hash, GIT_MAX_RAWSZ);
 }
 
-const char *empty_tree_oid_hex(void);
-const char *empty_blob_oid_hex(void);
+const char *empty_tree_oid_hex(const struct git_hash_algo *algop);
 
 static inline int is_empty_blob_oid(const struct object_id *oid,
 				    const struct git_hash_algo *algop)
diff --git a/object-file.c b/object-file.c
index bb97f8a809..72318c8dd4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -227,16 +227,10 @@  const struct object_id *null_oid(void)
 	return the_hash_algo->null_oid;
 }
 
-const char *empty_tree_oid_hex(void)
+const char *empty_tree_oid_hex(const struct git_hash_algo *algop)
 {
 	static char buf[GIT_MAX_HEXSZ + 1];
-	return oid_to_hex_r(buf, the_hash_algo->empty_tree);
-}
-
-const char *empty_blob_oid_hex(void)
-{
-	static char buf[GIT_MAX_HEXSZ + 1];
-	return oid_to_hex_r(buf, the_hash_algo->empty_blob);
+	return oid_to_hex_r(buf, algop->empty_tree);
 }
 
 int hash_algo_by_name(const char *name)
diff --git a/sequencer.c b/sequencer.c
index 68d62a12ff..823691e379 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2263,7 +2263,7 @@  static int do_pick_commit(struct repository *r,
 			unborn = 1;
 		} else if (unborn)
 			oidcpy(&head, the_hash_algo->empty_tree);
-		if (index_differs_from(r, unborn ? empty_tree_oid_hex() : "HEAD",
+		if (index_differs_from(r, unborn ? empty_tree_oid_hex(the_repository->hash_algo) : "HEAD",
 				       NULL, 0))
 			return error_dirty_index(r, opts);
 	}
diff --git a/submodule.c b/submodule.c
index 759cf1e1cd..caf3aa5600 100644
--- a/submodule.c
+++ b/submodule.c
@@ -2119,7 +2119,7 @@  static void submodule_reset_index(const char *path, const char *super_prefix)
 	strvec_pushf(&cp.args, "--super-prefix=%s%s/",
 		     (super_prefix ? super_prefix : ""), path);
 
-	strvec_push(&cp.args, empty_tree_oid_hex());
+	strvec_push(&cp.args, empty_tree_oid_hex(the_repository->hash_algo));
 
 	if (run_command(&cp))
 		die(_("could not reset submodule index"));
@@ -2229,9 +2229,9 @@  int submodule_move_head(const char *path, const char *super_prefix,
 		strvec_push(&cp.args, "-m");
 
 	if (!(flags & SUBMODULE_MOVE_HEAD_FORCE))
-		strvec_push(&cp.args, old_head ? old_head : empty_tree_oid_hex());
+		strvec_push(&cp.args, old_head ? old_head : empty_tree_oid_hex(the_repository->hash_algo));
 
-	strvec_push(&cp.args, new_head ? new_head : empty_tree_oid_hex());
+	strvec_push(&cp.args, new_head ? new_head : empty_tree_oid_hex(the_repository->hash_algo));
 
 	if (run_command(&cp)) {
 		ret = error(_("Submodule '%s' could not be updated."), path);
diff --git a/wt-status.c b/wt-status.c
index ff4be071ca..5051f5e599 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -641,7 +641,7 @@  static void wt_status_collect_changes_index(struct wt_status *s)
 
 	repo_init_revisions(s->repo, &rev, NULL);
 	memset(&opt, 0, sizeof(opt));
-	opt.def = s->is_initial ? empty_tree_oid_hex() : s->reference;
+	opt.def = s->is_initial ? empty_tree_oid_hex(the_repository->hash_algo) : s->reference;
 	setup_revisions(0, NULL, &rev, &opt);
 
 	rev.diffopt.flags.override_submodule_config = 1;
@@ -1136,7 +1136,7 @@  static void wt_longstatus_print_verbose(struct wt_status *s)
 	rev.diffopt.ita_invisible_in_index = 1;
 
 	memset(&opt, 0, sizeof(opt));
-	opt.def = s->is_initial ? empty_tree_oid_hex() : s->reference;
+	opt.def = s->is_initial ? empty_tree_oid_hex(the_repository->hash_algo) : s->reference;
 	setup_revisions(0, NULL, &rev, &opt);
 
 	rev.diffopt.output_format |= DIFF_FORMAT_PATCH;