diff mbox series

[3/8] submodule: make static functions read submodules from commits

Message ID 20220210044152.78352-4-chooglen@google.com (mailing list archive)
State Superseded
Headers show
Series fetch --recurse-submodules: fetch unpopulated submodules | expand

Commit Message

Glen Choo Feb. 10, 2022, 4:41 a.m. UTC
A future commit will teach "fetch --recurse-submodules" to fetch
unpopulated submodules. Prepare for this by teaching the necessary
static functions to read submodules from superproject commits instead of
the index and filesystem. Then, store the necessary fields (path and
super_oid), and use them in "fetch --recurse-submodules" where possible.

As a result, "git fetch" now reads changed submodules using the
`.gitmodules` and path from super_oid's tree (which is where "git fetch"
actually noticed the changed submodule) instead of the filesystem.

Signed-off-by: Glen Choo <chooglen@google.com>
---
 submodule.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

Comments

Jonathan Tan Feb. 10, 2022, 7:15 p.m. UTC | #1
Glen Choo <chooglen@google.com> writes:
> As a result, "git fetch" now reads changed submodules using the
> `.gitmodules` and path from super_oid's tree (which is where "git fetch"
> actually noticed the changed submodule) instead of the filesystem.

Could we have a test showing what has changed?

> @@ -1029,7 +1044,7 @@ static int submodule_needs_pushing(struct repository *r,
>  				   const char *path,
>  				   struct oid_array *commits)
>  {
> -	if (!submodule_has_commits(r, path, commits))
> +	if (!submodule_has_commits(r, path, null_oid(), commits))

This confused me at first, but I see that this code is not for fetching,
but for pushing. This patch set concerns itself with fetching, so
passing null_oid() to preserve existing behavior is good.

> @@ -1414,12 +1429,13 @@ static const struct submodule *get_non_gitmodules_submodule(const char *path)
>  }
>  
>  static struct fetch_task *fetch_task_create(struct repository *r,
> -					    const char *path)
> +					    const char *path,
> +					    const struct object_id *treeish_name)
>  {
>  	struct fetch_task *task = xmalloc(sizeof(*task));
>  	memset(task, 0, sizeof(*task));
>  
> -	task->sub = submodule_from_path(r, null_oid(), path);
> +	task->sub = submodule_from_path(r, treeish_name, path);

If there is not a good reason to have "path" before "treeish_name",
probably best to maintain the same parameter order as
submodule_from_path().

> @@ -1476,7 +1493,7 @@ static int get_next_submodule(struct child_process *cp,
>  		if (!S_ISGITLINK(ce->ce_mode))
>  			continue;
>  
> -		task = fetch_task_create(spf->r, ce->name);
> +		task = fetch_task_create(spf->r, ce->name, null_oid());

Hmm...is the plumbing incomplete? This code is about fetching, but we're
not passing any superproject commit OID here. If this will be fixed in a
future commit, maybe the distribution of what goes into each commit
needs to be revised.

> @@ -1499,7 +1516,7 @@ static int get_next_submodule(struct child_process *cp,
>  			continue;
>  		}
>  
> -		task->repo = get_submodule_repo_for(spf->r, task->sub->path);
> +		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());

Same comment here.
Glen Choo Feb. 11, 2022, 10:07 a.m. UTC | #2
Hm, after reading your feedback, I'm starting to question whether this
patch makes sense in its current form.

Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> As a result, "git fetch" now reads changed submodules using the
>> `.gitmodules` and path from super_oid's tree (which is where "git fetch"
>> actually noticed the changed submodule) instead of the filesystem.
>
> Could we have a test showing what has changed?

Looking at this closer, I don't think a test is feasible, but even more
importantly, I don't think this behavior change is even desirable at
all..

I was confused when I wrote the commit message. Prior to this patch,
"git fetch" already records the names of changed submodules by correctly
reading .gitmodules from the given commit. From
collect_changed_submodules_cb(): 

		submodule = submodule_from_path(me->repo,
						commit_oid, p->two->path);
    [...]
		item = string_list_insert(changed, name);
		if (!item->util)
			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
		cs_data = item->util;
		oid_array_append(&cs_data->new_commits, &p->two->oid);

The only behavior that actually _does_ change is that we plumb super_oid
through submodule_has_commits(). "git fetch" invokes this function to
figure out if it already has all of the needed commits, and if so, skip
the fetch.

Before plumbing super_oid, we could not check for commits in an
unpopulated submodule, but now we do. We will need this when we fetch in
unpopulated submodules, but as of this patch, we never fetch in
unpopulated submodules anyway, so this check is just wasted effort.

And because we never fetch anyway, we can't test any meaningful
behavior. We could check whether or not we check the submodule odb, but
that's a lot of effort to spend on something we don't need.

So we probably don't want this behavior change. I can preserve the
existing behavior by passing null_oid() instead, and pass super_oid in
the actual "fetch unpopulated submodules" patch.

>> @@ -1476,7 +1493,7 @@ static int get_next_submodule(struct child_process *cp,
>>  		if (!S_ISGITLINK(ce->ce_mode))
>>  			continue;
>>  
>> -		task = fetch_task_create(spf->r, ce->name);
>> +		task = fetch_task_create(spf->r, ce->name, null_oid());
>
> Hmm...is the plumbing incomplete? This code is about fetching, but we're
> not passing any superproject commit OID here. If this will be fixed in a
> future commit, maybe the distribution of what goes into each commit
> needs to be revised.
>
>> @@ -1499,7 +1516,7 @@ static int get_next_submodule(struct child_process *cp,
>>  			continue;
>>  		}
>>  
>> -		task->repo = get_submodule_repo_for(spf->r, task->sub->path);
>> +		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
>
> Same comment here.

This is intentional (but I admit that I also got confused rereading
this). This should be null_oid() (i.e. read from the filesystem) because
we are iterating through the index to get submodule paths. So we should
pass null_oid() so that we read .gitmodules from the filesystem, not a
possibly out-of-sync superproject commit.
Glen Choo Feb. 11, 2022, 10:09 a.m. UTC | #3
Hm, after reading your feedback, I'm starting to question whether this
patch makes sense in its current form.

Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> As a result, "git fetch" now reads changed submodules using the
>> `.gitmodules` and path from super_oid's tree (which is where "git fetch"
>> actually noticed the changed submodule) instead of the filesystem.
>
> Could we have a test showing what has changed?

Looking at this closer, I don't think a test is feasible, but even more
importantly, I don't think this behavior change is even desirable at
all..

I was confused when I wrote the commit message. Prior to this patch,
"git fetch" already records the names of changed submodules by correctly
reading .gitmodules from the given commit. From
collect_changed_submodules_cb(): 

		submodule = submodule_from_path(me->repo,
						commit_oid, p->two->path);
    [...]
		item = string_list_insert(changed, name);
		if (!item->util)
			item->util = xcalloc(1, sizeof(struct changed_submodule_data));
		cs_data = item->util;
		oid_array_append(&cs_data->new_commits, &p->two->oid);

The only behavior that actually _does_ change is that we plumb super_oid
through submodule_has_commits(). "git fetch" invokes this function to
figure out if it already has all of the needed commits, and if so, skip
the fetch.

Before plumbing super_oid, we could not check for commits in an
unpopulated submodule, but now we do. We will need this when we fetch in
unpopulated submodules, but as of this patch, we never fetch in
unpopulated submodules anyway, so this check is just wasted effort.

And because we never fetch anyway, we can't test any meaningful
behavior. We could check whether or not we check the submodule odb, but
that's a lot of effort to spend on something we don't need.

So we probably don't want this behavior change. I can preserve the
existing behavior by passing null_oid() instead, and pass super_oid in
the actual "fetch unpopulated submodules" patch.

>> @@ -1476,7 +1493,7 @@ static int get_next_submodule(struct child_process *cp,
>>  		if (!S_ISGITLINK(ce->ce_mode))
>>  			continue;
>>  
>> -		task = fetch_task_create(spf->r, ce->name);
>> +		task = fetch_task_create(spf->r, ce->name, null_oid());
>
> Hmm...is the plumbing incomplete? This code is about fetching, but we're
> not passing any superproject commit OID here. If this will be fixed in a
> future commit, maybe the distribution of what goes into each commit
> needs to be revised.
>
>> @@ -1499,7 +1516,7 @@ static int get_next_submodule(struct child_process *cp,
>>  			continue;
>>  		}
>>  
>> -		task->repo = get_submodule_repo_for(spf->r, task->sub->path);
>> +		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
>
> Same comment here.

This is intentional (but I admit that I also got confused rereading
this). This should be null_oid() (i.e. read from the filesystem) because
we are iterating through the index to get submodule paths. So we should
pass null_oid() so that we read .gitmodules from the filesystem, not a
possibly out-of-sync superproject commit.
diff mbox series

Patch

diff --git a/submodule.c b/submodule.c
index e2405c9f15..d4227ac22d 100644
--- a/submodule.c
+++ b/submodule.c
@@ -811,6 +811,16 @@  static const char *default_name_or_path(const char *path_or_name)
  * member of the changed submodule string_list_item.
  */
 struct changed_submodule_data {
+	/*
+	 * The first superproject commit in the rev walk that points to the
+	 * submodule.
+	 */
+	const struct object_id *super_oid;
+	/*
+	 * Path to the submodule in the superproject commit referenced
+	 * by 'super_oid'.
+	 */
+	char *path;
 	/* The submodule commits that have changed in the rev walk. */
 	struct oid_array *new_commits;
 };
@@ -819,6 +829,7 @@  static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
 {
 	oid_array_clear(cs_data->new_commits);
 	free(cs_data->new_commits);
+	free(cs_data->path);
 }
 
 static void collect_changed_submodules_cb(struct diff_queue_struct *q,
@@ -869,6 +880,8 @@  static void collect_changed_submodules_cb(struct diff_queue_struct *q,
 			cs_data = xcalloc(1, sizeof(struct changed_submodule_data));
 			/* NEEDSWORK: should we have oid_array_init()? */
 			cs_data->new_commits = xcalloc(1, sizeof(struct oid_array));
+			cs_data->super_oid = commit_oid;
+			cs_data->path = xstrdup(p->two->path);
 			item->util = cs_data;
 		}
 		oid_array_append(cs_data->new_commits, &p->two->oid);
@@ -944,6 +957,7 @@  struct has_commit_data {
 	struct repository *repo;
 	int result;
 	const char *path;
+	const struct object_id *super_oid;
 };
 
 static int check_has_commit(const struct object_id *oid, void *data)
@@ -952,7 +966,7 @@  static int check_has_commit(const struct object_id *oid, void *data)
 	struct repository subrepo;
 	enum object_type type;
 
-	if (repo_submodule_init(&subrepo, cb->repo, cb->path, null_oid())) {
+	if (repo_submodule_init(&subrepo, cb->repo, cb->path, cb->super_oid)) {
 		cb->result = 0;
 		goto cleanup;
 	}
@@ -980,9 +994,10 @@  static int check_has_commit(const struct object_id *oid, void *data)
 
 static int submodule_has_commits(struct repository *r,
 				 const char *path,
+				 const struct object_id *super_oid,
 				 struct oid_array *commits)
 {
-	struct has_commit_data has_commit = { r, 1, path };
+	struct has_commit_data has_commit = { r, 1, path, super_oid };
 
 	/*
 	 * Perform a cheap, but incorrect check for the existence of 'commits'.
@@ -1029,7 +1044,7 @@  static int submodule_needs_pushing(struct repository *r,
 				   const char *path,
 				   struct oid_array *commits)
 {
-	if (!submodule_has_commits(r, path, commits))
+	if (!submodule_has_commits(r, path, null_oid(), commits))
 		/*
 		 * NOTE: We do consider it safe to return "no" here. The
 		 * correct answer would be "We do not know" instead of
@@ -1280,7 +1295,7 @@  static void calculate_changed_submodule_paths(struct repository *r,
 		const struct submodule *submodule;
 		const char *path = NULL;
 
-		submodule = submodule_from_name(r, null_oid(), name->string);
+		submodule = submodule_from_name(r, cs_data->super_oid, name->string);
 		if (submodule)
 			path = submodule->path;
 		else
@@ -1289,7 +1304,7 @@  static void calculate_changed_submodule_paths(struct repository *r,
 		if (!path)
 			continue;
 
-		if (submodule_has_commits(r, path, cs_data->new_commits)) {
+		if (submodule_has_commits(r, path, cs_data->super_oid, cs_data->new_commits)) {
 			oid_array_clear(cs_data->new_commits);
 			*name->string = '\0';
 		}
@@ -1414,12 +1429,13 @@  static const struct submodule *get_non_gitmodules_submodule(const char *path)
 }
 
 static struct fetch_task *fetch_task_create(struct repository *r,
-					    const char *path)
+					    const char *path,
+					    const struct object_id *treeish_name)
 {
 	struct fetch_task *task = xmalloc(sizeof(*task));
 	memset(task, 0, sizeof(*task));
 
-	task->sub = submodule_from_path(r, null_oid(), path);
+	task->sub = submodule_from_path(r, treeish_name, path);
 	if (!task->sub) {
 		/*
 		 * No entry in .gitmodules? Technically not a submodule,
@@ -1451,11 +1467,12 @@  static void fetch_task_release(struct fetch_task *p)
 }
 
 static struct repository *get_submodule_repo_for(struct repository *r,
-						 const char *path)
+						 const char *path,
+						 const struct object_id *treeish_name)
 {
 	struct repository *ret = xmalloc(sizeof(*ret));
 
-	if (repo_submodule_init(ret, r, path, null_oid())) {
+	if (repo_submodule_init(ret, r, path, treeish_name)) {
 		free(ret);
 		return NULL;
 	}
@@ -1476,7 +1493,7 @@  static int get_next_submodule(struct child_process *cp,
 		if (!S_ISGITLINK(ce->ce_mode))
 			continue;
 
-		task = fetch_task_create(spf->r, ce->name);
+		task = fetch_task_create(spf->r, ce->name, null_oid());
 		if (!task)
 			continue;
 
@@ -1499,7 +1516,7 @@  static int get_next_submodule(struct child_process *cp,
 			continue;
 		}
 
-		task->repo = get_submodule_repo_for(spf->r, task->sub->path);
+		task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid());
 		if (task->repo) {
 			struct strbuf submodule_prefix = STRBUF_INIT;
 			child_process_init(cp);