diff mbox series

[v3,1/9] tree: do not use the_repository for tree traversal methods.

Message ID 79959a54eb4c1a0812b1f4643530069a63e549f4.1665973401.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series archive: Add --recurse-submodules to git-archive command | expand

Commit Message

Heather Lapointe Oct. 17, 2022, 2:23 a.m. UTC
From: Alphadelta14 <alpha@alphaservcomputing.solutions>

Expect that tree walking may switch repository contexts for cases
such as submodules.
Added compatibility macros for existing cases.

Annotate an existing issue where repo is wrong when traversing.

Signed-off-by: Heather Lapointe <alpha@alphaservcomputing.solutions>
---
 tree.c | 15 +++++++++------
 tree.h | 14 ++++++++++----
 2 files changed, 19 insertions(+), 10 deletions(-)

Comments

Junio C Hamano Oct. 17, 2022, 1:26 p.m. UTC | #1
"Alphadelta14 via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Alphadelta14 <alpha@alphaservcomputing.solutions>

I'll fix this line to match all the other patches in the series
before applying.

> Expect that tree walking may switch repository contexts for cases
> such as submodules.
> Added compatibility macros for existing cases.
>
> Annotate an existing issue where repo is wrong when traversing.
>
> Signed-off-by: Heather Lapointe <alpha@alphaservcomputing.solutions>
> ---


> @@ -58,7 +58,11 @@ int read_tree_at(struct repository *r,
>  				    oid_to_hex(&entry.oid),
>  				    base->buf, entry.path);
>  
> -			if (parse_commit(commit))
> +			// FIXME: This is the wrong repo instance (it refers to the superproject)
> +			// it will always fail as is (will fix in later patch)
> +			// This current codepath isn't executed by any existing callbacks
> +			// so it wouldn't show up as an issue at this time.

	/*
	 * We write our multi-line comments
	 * this way.
	 */

My suspicion is that the if/else if/ cascade for GITLINK assumes
that the caller earlier did add_submodule_odb() to make sure any
object it needs should be available via the_repository->objects
object store.  If your caller (presumably "archive that is trying to
learn the --recurse-submodules option") hasn't learned to do so yet
at this step, it is understandable if it fails.

> +			if (repo_parse_commit(r, commit))
>  				die("Invalid commit %s in submodule path %s%s",
>  				    oid_to_hex(&entry.oid),
>  				    base->buf, entry.path);

> +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
> +#define parse_tree(tree) repo_parse_tree(the_repository, tree)
> +#define parse_tree_gently(tree, quiet_on_missing) repo_parse_tree_gently(the_repository, tree, quiet_on_missing)
> +#define parse_tree_indirect(oid) repo_parse_tree_indirect(the_repository, oid)
> +#endif

Good.
Glen Choo Oct. 26, 2022, 10:33 p.m. UTC | #2
"Alphadelta14 via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/tree.h b/tree.h
> index 6efff003e21..cc6402e4738 100644
> --- a/tree.h
> +++ b/tree.h
> @@ -18,15 +18,21 @@ struct tree *lookup_tree(struct repository *r, const struct object_id *oid);
>  
>  int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size);
>  
> -int parse_tree_gently(struct tree *tree, int quiet_on_missing);
> -static inline int parse_tree(struct tree *tree)
> +int repo_parse_tree_gently(struct repository *r, struct tree *tree, int quiet_on_missing);
> +static inline int repo_parse_tree(struct repository *r, struct tree *tree)
>  {
> -	return parse_tree_gently(tree, 0);
> +	return repo_parse_tree_gently(r, tree, 0);
>  }
> +
> +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
> +#define parse_tree(tree) repo_parse_tree(the_repository, tree)
> +#define parse_tree_gently(tree, quiet_on_missing) repo_parse_tree_gently(the_repository, tree, quiet_on_missing)
> +#define parse_tree_indirect(oid) repo_parse_tree_indirect(the_repository, oid)
> +#endif

Typically, when we have repo_* and non-repo_* variants, we use a "static
inline" function, e.g. from refs.h:

  int repo_dwim_ref(struct repository *r, const char *str, int len,
        struct object_id *oid, char **ref, int nonfatal_dangling_mark);

  static inline int dwim_ref(const char *str, int len, struct object_id *oid,
          char **ref, int nonfatal_dangling_mark)
  {
    return repo_dwim_ref(the_repository, str, len, oid, ref,
            nonfatal_dangling_mark);
  }

I think we should do the same here, instead of using "#ifndef
NO_THE_REPOSITORY_COMPATIBILITY_MACROS".

From I can gather from "git log -S
NO_THE_REPOSITORY_COMPATIBILITY_MACROS", that macro was introduced
in e675765235 (diff.c: remove implicit dependency on the_index,
2018-09-21) and all instances of that macro were introduced around that
time. At the time, there was an effort to get rid of the_repository and
the_index almost everywhere (except builtins), and the macro would
ensure that we did this successfully.

We did such a good job with the_index that we flipped the default from
NO_THE_INDEX_COMPATIBILITY_MACROS to USE_THE_INDEX_COMPATIBILITY_MACROS
(f8adbec9fe (cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch,
2019-01-24)) but it looks like we never got there with the_repository.
I couldn't find any instances of "#define
NO_THE_REPOSITORY_COMPATIBILITY_MACROS", so I think we should just use
"static inline" instead.

Alternatively, one could get rid of the non-repo_* variant and adjust
all existing callers to use "struct repository", but that's a lot of
churn and may conflict with other in-flight series, so that's probably
left for another time.

>  void free_tree_buffer(struct tree *tree);
>  
>  /* Parses and returns the tree in the given ent, chasing tags and commits. */
> -struct tree *parse_tree_indirect(const struct object_id *oid);
> +struct tree *repo_parse_tree_indirect(struct repository *r, const struct object_id *oid);
>  
>  int cmp_cache_name_compare(const void *a_, const void *b_);
>  
> -- 
> gitgitgadget
Jonathan Tan Oct. 27, 2022, 6:09 p.m. UTC | #3
First of all, let me echo what Glen said [1], that this series is  
overall well laid out and makes sense. 

Other reviewers have commented on style issues, but I'll hold off on 
making my comments on those and also possible improvements on commit 
messages until I can say "besides style and commit messages, I think 
that this series is good to merge in". 

[1] https://lore.kernel.org/git/kl6lr0yuqlk0.fsf@chooglen-macbookpro.roam.corp.google.com/

"Alphadelta14 via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +			// This current codepath isn't executed by any existing callbacks
> +			// so it wouldn't show up as an issue at this time.

I was a bit confused by this comment, so I looked at the surrounding  
code. I think it could be better rephrased as: 

  All existing callbacks at the time of writing cause this part of the  
  code to be skipped when S_ISGITLINK(entry.mode) is true, so this 
  wrong behavior does not call any issues.
Junio C Hamano Oct. 27, 2022, 6:50 p.m. UTC | #4
Jonathan Tan <jonathantanmy@google.com> writes:

> First of all, let me echo what Glen said [1], that this series is  
> overall well laid out and makes sense. 
>
> Other reviewers have commented on style issues, but I'll hold off on 
> making my comments on those and also possible improvements on commit 
> messages until I can say "besides style and commit messages, I think 
> that this series is good to merge in". 
>
> [1] https://lore.kernel.org/git/kl6lr0yuqlk0.fsf@chooglen-macbookpro.roam.corp.google.com/
>
> "Alphadelta14 via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> +			// This current codepath isn't executed by any existing callbacks
>> +			// so it wouldn't show up as an issue at this time.
>
> I was a bit confused by this comment, so I looked at the surrounding  
> code. I think it could be better rephrased as: 
>
>   All existing callbacks at the time of writing cause this part of the  
>   code to be skipped when S_ISGITLINK(entry.mode) is true, so this 
>   wrong behavior does not call any issues. 
>  

As I already said, I do not think this is "wrong behaviour" to begin
with.  The current code requires that you'd use add_submodule_odb()
to make the objects in them accessible and if your program fails to
do so, as a very natural consequence, you'd not see objects pointed
by the gitlink.

Changing that assumption is OK as long as existing callers that
depend on the current semantics are not broken by such a change, but
I do not think "wrong behaviour does not call any issues" is a
correct analysis of the problem.
diff mbox series

Patch

diff --git a/tree.c b/tree.c
index 410e3b477e5..13f9173d45e 100644
--- a/tree.c
+++ b/tree.c
@@ -22,7 +22,7 @@  int read_tree_at(struct repository *r,
 	int len, oldlen = base->len;
 	enum interesting retval = entry_not_interesting;
 
-	if (parse_tree(tree))
+	if (repo_parse_tree(r, tree))
 		return -1;
 
 	init_tree_desc(&desc, tree->buffer, tree->size);
@@ -58,7 +58,11 @@  int read_tree_at(struct repository *r,
 				    oid_to_hex(&entry.oid),
 				    base->buf, entry.path);
 
-			if (parse_commit(commit))
+			// FIXME: This is the wrong repo instance (it refers to the superproject)
+			// it will always fail as is (will fix in later patch)
+			// This current codepath isn't executed by any existing callbacks
+			// so it wouldn't show up as an issue at this time.
+			if (repo_parse_commit(r, commit))
 				die("Invalid commit %s in submodule path %s%s",
 				    oid_to_hex(&entry.oid),
 				    base->buf, entry.path);
@@ -121,7 +125,7 @@  int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
 	return 0;
 }
 
-int parse_tree_gently(struct tree *item, int quiet_on_missing)
+int repo_parse_tree_gently(struct repository *r, struct tree *item, int quiet_on_missing)
 {
 	 enum object_type type;
 	 void *buffer;
@@ -129,7 +133,7 @@  int parse_tree_gently(struct tree *item, int quiet_on_missing)
 
 	if (item->object.parsed)
 		return 0;
-	buffer = read_object_file(&item->object.oid, &type, &size);
+	buffer = repo_read_object_file(r, &item->object.oid, &type, &size);
 	if (!buffer)
 		return quiet_on_missing ? -1 :
 			error("Could not read %s",
@@ -149,9 +153,8 @@  void free_tree_buffer(struct tree *tree)
 	tree->object.parsed = 0;
 }
 
-struct tree *parse_tree_indirect(const struct object_id *oid)
+struct tree *repo_parse_tree_indirect(struct repository *r, const struct object_id *oid)
 {
-	struct repository *r = the_repository;
 	struct object *obj = parse_object(r, oid);
 	return (struct tree *)repo_peel_to_type(r, NULL, 0, obj, OBJ_TREE);
 }
diff --git a/tree.h b/tree.h
index 6efff003e21..cc6402e4738 100644
--- a/tree.h
+++ b/tree.h
@@ -18,15 +18,21 @@  struct tree *lookup_tree(struct repository *r, const struct object_id *oid);
 
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size);
 
-int parse_tree_gently(struct tree *tree, int quiet_on_missing);
-static inline int parse_tree(struct tree *tree)
+int repo_parse_tree_gently(struct repository *r, struct tree *tree, int quiet_on_missing);
+static inline int repo_parse_tree(struct repository *r, struct tree *tree)
 {
-	return parse_tree_gently(tree, 0);
+	return repo_parse_tree_gently(r, tree, 0);
 }
+
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define parse_tree(tree) repo_parse_tree(the_repository, tree)
+#define parse_tree_gently(tree, quiet_on_missing) repo_parse_tree_gently(the_repository, tree, quiet_on_missing)
+#define parse_tree_indirect(oid) repo_parse_tree_indirect(the_repository, oid)
+#endif
 void free_tree_buffer(struct tree *tree);
 
 /* Parses and returns the tree in the given ent, chasing tags and commits. */
-struct tree *parse_tree_indirect(const struct object_id *oid);
+struct tree *repo_parse_tree_indirect(struct repository *r, const struct object_id *oid);
 
 int cmp_cache_name_compare(const void *a_, const void *b_);