Message ID | 20220210044152.78352-4-chooglen@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | fetch --recurse-submodules: fetch unpopulated submodules | expand |
Glen Choo <chooglen@google.com> writes: > As a result, "git fetch" now reads changed submodules using the > `.gitmodules` and path from super_oid's tree (which is where "git fetch" > actually noticed the changed submodule) instead of the filesystem. Could we have a test showing what has changed? > @@ -1029,7 +1044,7 @@ static int submodule_needs_pushing(struct repository *r, > const char *path, > struct oid_array *commits) > { > - if (!submodule_has_commits(r, path, commits)) > + if (!submodule_has_commits(r, path, null_oid(), commits)) This confused me at first, but I see that this code is not for fetching, but for pushing. This patch set concerns itself with fetching, so passing null_oid() to preserve existing behavior is good. > @@ -1414,12 +1429,13 @@ static const struct submodule *get_non_gitmodules_submodule(const char *path) > } > > static struct fetch_task *fetch_task_create(struct repository *r, > - const char *path) > + const char *path, > + const struct object_id *treeish_name) > { > struct fetch_task *task = xmalloc(sizeof(*task)); > memset(task, 0, sizeof(*task)); > > - task->sub = submodule_from_path(r, null_oid(), path); > + task->sub = submodule_from_path(r, treeish_name, path); If there is not a good reason to have "path" before "treeish_name", probably best to maintain the same parameter order as submodule_from_path(). > @@ -1476,7 +1493,7 @@ static int get_next_submodule(struct child_process *cp, > if (!S_ISGITLINK(ce->ce_mode)) > continue; > > - task = fetch_task_create(spf->r, ce->name); > + task = fetch_task_create(spf->r, ce->name, null_oid()); Hmm...is the plumbing incomplete? This code is about fetching, but we're not passing any superproject commit OID here. If this will be fixed in a future commit, maybe the distribution of what goes into each commit needs to be revised. > @@ -1499,7 +1516,7 @@ static int get_next_submodule(struct child_process *cp, > continue; > } > > - task->repo = get_submodule_repo_for(spf->r, task->sub->path); > + task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid()); Same comment here.
Hm, after reading your feedback, I'm starting to question whether this patch makes sense in its current form. Jonathan Tan <jonathantanmy@google.com> writes: > Glen Choo <chooglen@google.com> writes: >> As a result, "git fetch" now reads changed submodules using the >> `.gitmodules` and path from super_oid's tree (which is where "git fetch" >> actually noticed the changed submodule) instead of the filesystem. > > Could we have a test showing what has changed? Looking at this closer, I don't think a test is feasible, but even more importantly, I don't think this behavior change is even desirable at all.. I was confused when I wrote the commit message. Prior to this patch, "git fetch" already records the names of changed submodules by correctly reading .gitmodules from the given commit. From collect_changed_submodules_cb(): submodule = submodule_from_path(me->repo, commit_oid, p->two->path); [...] item = string_list_insert(changed, name); if (!item->util) item->util = xcalloc(1, sizeof(struct changed_submodule_data)); cs_data = item->util; oid_array_append(&cs_data->new_commits, &p->two->oid); The only behavior that actually _does_ change is that we plumb super_oid through submodule_has_commits(). "git fetch" invokes this function to figure out if it already has all of the needed commits, and if so, skip the fetch. Before plumbing super_oid, we could not check for commits in an unpopulated submodule, but now we do. We will need this when we fetch in unpopulated submodules, but as of this patch, we never fetch in unpopulated submodules anyway, so this check is just wasted effort. And because we never fetch anyway, we can't test any meaningful behavior. We could check whether or not we check the submodule odb, but that's a lot of effort to spend on something we don't need. So we probably don't want this behavior change. I can preserve the existing behavior by passing null_oid() instead, and pass super_oid in the actual "fetch unpopulated submodules" patch. >> @@ -1476,7 +1493,7 @@ static int get_next_submodule(struct child_process *cp, >> if (!S_ISGITLINK(ce->ce_mode)) >> continue; >> >> - task = fetch_task_create(spf->r, ce->name); >> + task = fetch_task_create(spf->r, ce->name, null_oid()); > > Hmm...is the plumbing incomplete? This code is about fetching, but we're > not passing any superproject commit OID here. If this will be fixed in a > future commit, maybe the distribution of what goes into each commit > needs to be revised. > >> @@ -1499,7 +1516,7 @@ static int get_next_submodule(struct child_process *cp, >> continue; >> } >> >> - task->repo = get_submodule_repo_for(spf->r, task->sub->path); >> + task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid()); > > Same comment here. This is intentional (but I admit that I also got confused rereading this). This should be null_oid() (i.e. read from the filesystem) because we are iterating through the index to get submodule paths. So we should pass null_oid() so that we read .gitmodules from the filesystem, not a possibly out-of-sync superproject commit.
Hm, after reading your feedback, I'm starting to question whether this patch makes sense in its current form. Jonathan Tan <jonathantanmy@google.com> writes: > Glen Choo <chooglen@google.com> writes: >> As a result, "git fetch" now reads changed submodules using the >> `.gitmodules` and path from super_oid's tree (which is where "git fetch" >> actually noticed the changed submodule) instead of the filesystem. > > Could we have a test showing what has changed? Looking at this closer, I don't think a test is feasible, but even more importantly, I don't think this behavior change is even desirable at all.. I was confused when I wrote the commit message. Prior to this patch, "git fetch" already records the names of changed submodules by correctly reading .gitmodules from the given commit. From collect_changed_submodules_cb(): submodule = submodule_from_path(me->repo, commit_oid, p->two->path); [...] item = string_list_insert(changed, name); if (!item->util) item->util = xcalloc(1, sizeof(struct changed_submodule_data)); cs_data = item->util; oid_array_append(&cs_data->new_commits, &p->two->oid); The only behavior that actually _does_ change is that we plumb super_oid through submodule_has_commits(). "git fetch" invokes this function to figure out if it already has all of the needed commits, and if so, skip the fetch. Before plumbing super_oid, we could not check for commits in an unpopulated submodule, but now we do. We will need this when we fetch in unpopulated submodules, but as of this patch, we never fetch in unpopulated submodules anyway, so this check is just wasted effort. And because we never fetch anyway, we can't test any meaningful behavior. We could check whether or not we check the submodule odb, but that's a lot of effort to spend on something we don't need. So we probably don't want this behavior change. I can preserve the existing behavior by passing null_oid() instead, and pass super_oid in the actual "fetch unpopulated submodules" patch. >> @@ -1476,7 +1493,7 @@ static int get_next_submodule(struct child_process *cp, >> if (!S_ISGITLINK(ce->ce_mode)) >> continue; >> >> - task = fetch_task_create(spf->r, ce->name); >> + task = fetch_task_create(spf->r, ce->name, null_oid()); > > Hmm...is the plumbing incomplete? This code is about fetching, but we're > not passing any superproject commit OID here. If this will be fixed in a > future commit, maybe the distribution of what goes into each commit > needs to be revised. > >> @@ -1499,7 +1516,7 @@ static int get_next_submodule(struct child_process *cp, >> continue; >> } >> >> - task->repo = get_submodule_repo_for(spf->r, task->sub->path); >> + task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid()); > > Same comment here. This is intentional (but I admit that I also got confused rereading this). This should be null_oid() (i.e. read from the filesystem) because we are iterating through the index to get submodule paths. So we should pass null_oid() so that we read .gitmodules from the filesystem, not a possibly out-of-sync superproject commit.
diff --git a/submodule.c b/submodule.c index e2405c9f15..d4227ac22d 100644 --- a/submodule.c +++ b/submodule.c @@ -811,6 +811,16 @@ static const char *default_name_or_path(const char *path_or_name) * member of the changed submodule string_list_item. */ struct changed_submodule_data { + /* + * The first superproject commit in the rev walk that points to the + * submodule. + */ + const struct object_id *super_oid; + /* + * Path to the submodule in the superproject commit referenced + * by 'super_oid'. + */ + char *path; /* The submodule commits that have changed in the rev walk. */ struct oid_array *new_commits; }; @@ -819,6 +829,7 @@ static void changed_submodule_data_clear(struct changed_submodule_data *cs_data) { oid_array_clear(cs_data->new_commits); free(cs_data->new_commits); + free(cs_data->path); } static void collect_changed_submodules_cb(struct diff_queue_struct *q, @@ -869,6 +880,8 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q, cs_data = xcalloc(1, sizeof(struct changed_submodule_data)); /* NEEDSWORK: should we have oid_array_init()? */ cs_data->new_commits = xcalloc(1, sizeof(struct oid_array)); + cs_data->super_oid = commit_oid; + cs_data->path = xstrdup(p->two->path); item->util = cs_data; } oid_array_append(cs_data->new_commits, &p->two->oid); @@ -944,6 +957,7 @@ struct has_commit_data { struct repository *repo; int result; const char *path; + const struct object_id *super_oid; }; static int check_has_commit(const struct object_id *oid, void *data) @@ -952,7 +966,7 @@ static int check_has_commit(const struct object_id *oid, void *data) struct repository subrepo; enum object_type type; - if (repo_submodule_init(&subrepo, cb->repo, cb->path, null_oid())) { + if (repo_submodule_init(&subrepo, cb->repo, cb->path, cb->super_oid)) { cb->result = 0; goto cleanup; } @@ -980,9 +994,10 @@ static int check_has_commit(const struct object_id *oid, void *data) static int submodule_has_commits(struct repository *r, const char *path, + const struct object_id *super_oid, struct oid_array *commits) { - struct has_commit_data has_commit = { r, 1, path }; + struct has_commit_data has_commit = { r, 1, path, super_oid }; /* * Perform a cheap, but incorrect check for the existence of 'commits'. @@ -1029,7 +1044,7 @@ static int submodule_needs_pushing(struct repository *r, const char *path, struct oid_array *commits) { - if (!submodule_has_commits(r, path, commits)) + if (!submodule_has_commits(r, path, null_oid(), commits)) /* * NOTE: We do consider it safe to return "no" here. The * correct answer would be "We do not know" instead of @@ -1280,7 +1295,7 @@ static void calculate_changed_submodule_paths(struct repository *r, const struct submodule *submodule; const char *path = NULL; - submodule = submodule_from_name(r, null_oid(), name->string); + submodule = submodule_from_name(r, cs_data->super_oid, name->string); if (submodule) path = submodule->path; else @@ -1289,7 +1304,7 @@ static void calculate_changed_submodule_paths(struct repository *r, if (!path) continue; - if (submodule_has_commits(r, path, cs_data->new_commits)) { + if (submodule_has_commits(r, path, cs_data->super_oid, cs_data->new_commits)) { oid_array_clear(cs_data->new_commits); *name->string = '\0'; } @@ -1414,12 +1429,13 @@ static const struct submodule *get_non_gitmodules_submodule(const char *path) } static struct fetch_task *fetch_task_create(struct repository *r, - const char *path) + const char *path, + const struct object_id *treeish_name) { struct fetch_task *task = xmalloc(sizeof(*task)); memset(task, 0, sizeof(*task)); - task->sub = submodule_from_path(r, null_oid(), path); + task->sub = submodule_from_path(r, treeish_name, path); if (!task->sub) { /* * No entry in .gitmodules? Technically not a submodule, @@ -1451,11 +1467,12 @@ static void fetch_task_release(struct fetch_task *p) } static struct repository *get_submodule_repo_for(struct repository *r, - const char *path) + const char *path, + const struct object_id *treeish_name) { struct repository *ret = xmalloc(sizeof(*ret)); - if (repo_submodule_init(ret, r, path, null_oid())) { + if (repo_submodule_init(ret, r, path, treeish_name)) { free(ret); return NULL; } @@ -1476,7 +1493,7 @@ static int get_next_submodule(struct child_process *cp, if (!S_ISGITLINK(ce->ce_mode)) continue; - task = fetch_task_create(spf->r, ce->name); + task = fetch_task_create(spf->r, ce->name, null_oid()); if (!task) continue; @@ -1499,7 +1516,7 @@ static int get_next_submodule(struct child_process *cp, continue; } - task->repo = get_submodule_repo_for(spf->r, task->sub->path); + task->repo = get_submodule_repo_for(spf->r, task->sub->path, null_oid()); if (task->repo) { struct strbuf submodule_prefix = STRBUF_INIT; child_process_init(cp);
A future commit will teach "fetch --recurse-submodules" to fetch unpopulated submodules. Prepare for this by teaching the necessary static functions to read submodules from superproject commits instead of the index and filesystem. Then, store the necessary fields (path and super_oid), and use them in "fetch --recurse-submodules" where possible. As a result, "git fetch" now reads changed submodules using the `.gitmodules` and path from super_oid's tree (which is where "git fetch" actually noticed the changed submodule) instead of the filesystem. Signed-off-by: Glen Choo <chooglen@google.com> --- submodule.c | 39 ++++++++++++++++++++++++++++----------- 1 file changed, 28 insertions(+), 11 deletions(-)