diff mbox series

[v4,04/11] commit-reach(paint_down_to_common): prepare for handling shallow commits

Message ID 837aa5a89c640427a5de064da93f1de4934d8212.1709113458.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 2d2da172f37f5a406a0f5a099cd268bcb1bd7d42
Headers show
Series The merge-base logic vs missing commit objects | expand

Commit Message

Johannes Schindelin Feb. 28, 2024, 9:44 a.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

When `git fetch --update-shallow` needs to test for commit ancestry, it
can naturally run into a missing object (e.g. if it is a parent of a
shallow commit). For the purpose of `--update-shallow`, this needs to be
treated as if the child commit did not even have that parent, i.e. the
commit history needs to be clamped.

For all other scenarios, clamping the commit history is actually a bug,
as it would hide repository corruption (for an analysis regarding
shallow and partial clones, see the analysis further down).

Add a flag to optionally ask the function to ignore missing commits, as
`--update-shallow` needs it to, while detecting missing objects as a
repository corruption error by default.

This flag is needed, and cannot replaced by `is_repository_shallow()` to
indicate that situation, because that function would return 0 in the
`--update-shallow` scenario: There is not actually a `shallow` file in
that scenario, as demonstrated e.g. by t5537.10 ("add new shallow root
with receive.updateshallow on") and t5538.4 ("add new shallow root with
receive.updateshallow on").

Note: shallow commits' parents are set to `NULL` internally already,
therefore there is no need to special-case shallow repositories here, as
the merge-base logic will not try to access parent commits of shallow
commits.

Likewise, partial clones aren't an issue either: If a commit is missing
during the revision walk in the merge-base logic, it is fetched via
`promisor_remote_get_direct()`. And not only the single missing commit
object: Due to the way the "promised" objects are fetched (in
`fetch_objects()` in `promisor-remote.c`, using `fetch
--filter=blob:none`), there is no actual way to fetch a single commit
object, as the remote side will pass that commit OID to `pack-objects
--revs [...]` which in turn passes it to `rev-list` which interprets
this as a commit _range_ instead of a single object. Therefore, in
partial clones (unless they are shallow in addition), all commits
reachable from a commit that is in the local object database are also
present in that local database.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 commit-reach.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Comments

Junio C Hamano Feb. 28, 2024, 8:24 p.m. UTC | #1
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> When `git fetch --update-shallow` needs to test for commit ancestry, it
> can naturally run into a missing object (e.g. if it is a parent of a
> shallow commit). For the purpose of `--update-shallow`, this needs to be
> treated as if the child commit did not even have that parent, i.e. the
> commit history needs to be clamped.
>
> For all other scenarios, clamping the commit history is actually a bug,
> as it would hide repository corruption (for an analysis regarding
> shallow and partial clones, see the analysis further down).
>
> Add a flag to optionally ask the function to ignore missing commits, as
> `--update-shallow` needs it to, while detecting missing objects as a
> repository corruption error by default.
>
> This flag is needed, and cannot replaced by `is_repository_shallow()` to
> indicate that situation, because that function would return 0 in the
> `--update-shallow` scenario: There is not actually a `shallow` file in
> that scenario, as demonstrated e.g. by t5537.10 ("add new shallow root
> with receive.updateshallow on") and t5538.4 ("add new shallow root with
> receive.updateshallow on").

Nicely written.

The description above that has been totally revamped reads much much
clearer, at least to me, compared to the previous round.  

Should we declare the topic done and mark it for 'next'?

Thanks.
Dirk Gouders Feb. 28, 2024, 8:59 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>>
>> When `git fetch --update-shallow` needs to test for commit ancestry, it
>> can naturally run into a missing object (e.g. if it is a parent of a
>> shallow commit). For the purpose of `--update-shallow`, this needs to be
>> treated as if the child commit did not even have that parent, i.e. the
>> commit history needs to be clamped.
>>
>> For all other scenarios, clamping the commit history is actually a bug,
>> as it would hide repository corruption (for an analysis regarding
>> shallow and partial clones, see the analysis further down).
>>
>> Add a flag to optionally ask the function to ignore missing commits, as
>> `--update-shallow` needs it to, while detecting missing objects as a
>> repository corruption error by default.
>>
>> This flag is needed, and cannot replaced by `is_repository_shallow()` to
>> indicate that situation, because that function would return 0 in the
>> `--update-shallow` scenario: There is not actually a `shallow` file in
>> that scenario, as demonstrated e.g. by t5537.10 ("add new shallow root
>> with receive.updateshallow on") and t5538.4 ("add new shallow root with
>> receive.updateshallow on").
>
> Nicely written.
>
> The description above that has been totally revamped reads much much
> clearer, at least to me, compared to the previous round.  
>
> Should we declare the topic done and mark it for 'next'?
>
> Thanks.

I agree that this text reads much clearer -- even to me with close to
zero experience, here.

Thank you for taking the time to rewrite the text, Johannes.

Dirk
Johannes Schindelin Feb. 29, 2024, 9:53 a.m. UTC | #3
Hi Junio,

On Wed, 28 Feb 2024, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > When `git fetch --update-shallow` needs to test for commit ancestry, it
> > can naturally run into a missing object (e.g. if it is a parent of a
> > shallow commit). For the purpose of `--update-shallow`, this needs to be
> > treated as if the child commit did not even have that parent, i.e. the
> > commit history needs to be clamped.
> >
> > For all other scenarios, clamping the commit history is actually a bug,
> > as it would hide repository corruption (for an analysis regarding
> > shallow and partial clones, see the analysis further down).
> >
> > Add a flag to optionally ask the function to ignore missing commits, as
> > `--update-shallow` needs it to, while detecting missing objects as a
> > repository corruption error by default.
> >
> > This flag is needed, and cannot replaced by `is_repository_shallow()` to

Hrmpf. I just spotted the missing "be" between "cannot" and "replaced".

Junio, would you kindly amend the commit message accordingly?

> > indicate that situation, because that function would return 0 in the
> > `--update-shallow` scenario: There is not actually a `shallow` file in
> > that scenario, as demonstrated e.g. by t5537.10 ("add new shallow root
> > with receive.updateshallow on") and t5538.4 ("add new shallow root with
> > receive.updateshallow on").
>
> Nicely written.
>
> The description above that has been totally revamped reads much much
> clearer, at least to me, compared to the previous round.

Thank you!

> Should we declare the topic done and mark it for 'next'?

After you looked over the correctness of the patches, I would be
comfortable with that.

Thanks!
Johannes
Johannes Schindelin Feb. 29, 2024, 9:54 a.m. UTC | #4
Hi Dirk,

On Wed, 28 Feb 2024, Dirk Gouders wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
> > "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> > writes:
> >
> >> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >>
> >> When `git fetch --update-shallow` needs to test for commit ancestry, it
> >> can naturally run into a missing object (e.g. if it is a parent of a
> >> shallow commit). For the purpose of `--update-shallow`, this needs to be
> >> treated as if the child commit did not even have that parent, i.e. the
> >> commit history needs to be clamped.
> >>
> >> For all other scenarios, clamping the commit history is actually a bug,
> >> as it would hide repository corruption (for an analysis regarding
> >> shallow and partial clones, see the analysis further down).
> >>
> >> Add a flag to optionally ask the function to ignore missing commits, as
> >> `--update-shallow` needs it to, while detecting missing objects as a
> >> repository corruption error by default.
> >>
> >> This flag is needed, and cannot replaced by `is_repository_shallow()` to
> >> indicate that situation, because that function would return 0 in the
> >> `--update-shallow` scenario: There is not actually a `shallow` file in
> >> that scenario, as demonstrated e.g. by t5537.10 ("add new shallow root
> >> with receive.updateshallow on") and t5538.4 ("add new shallow root with
> >> receive.updateshallow on").
> >
> > Nicely written.
> >
> > The description above that has been totally revamped reads much much
> > clearer, at least to me, compared to the previous round.
> >
> > Should we declare the topic done and mark it for 'next'?
> >
> > Thanks.
>
> I agree that this text reads much clearer -- even to me with close to
> zero experience, here.
>
> Thank you for taking the time to rewrite the text, Johannes.

Thank _you_ for taking the time to review the patches and help me with
improving them!

Ciao,
Johannes
diff mbox series

Patch

diff --git a/commit-reach.c b/commit-reach.c
index 5ff71d72d51..7112b10eeea 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -53,7 +53,8 @@  static int queue_has_nonstale(struct prio_queue *queue)
 static struct commit_list *paint_down_to_common(struct repository *r,
 						struct commit *one, int n,
 						struct commit **twos,
-						timestamp_t min_generation)
+						timestamp_t min_generation,
+						int ignore_missing_commits)
 {
 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
 	struct commit_list *result = NULL;
@@ -108,6 +109,13 @@  static struct commit_list *paint_down_to_common(struct repository *r,
 			if (repo_parse_commit(r, p)) {
 				clear_prio_queue(&queue);
 				free_commit_list(result);
+				/*
+				 * At this stage, we know that the commit is
+				 * missing: `repo_parse_commit()` uses
+				 * `OBJECT_INFO_DIE_IF_CORRUPT` and therefore
+				 * corrupt commits would already have been
+				 * dispatched with a `die()`.
+				 */
 				return NULL;
 			}
 			p->object.flags |= flags;
@@ -143,7 +151,7 @@  static struct commit_list *merge_bases_many(struct repository *r,
 			return NULL;
 	}
 
-	list = paint_down_to_common(r, one, n, twos, 0);
+	list = paint_down_to_common(r, one, n, twos, 0, 0);
 
 	while (list) {
 		struct commit *commit = pop_commit(&list);
@@ -214,7 +222,7 @@  static int remove_redundant_no_gen(struct repository *r,
 				min_generation = curr_generation;
 		}
 		common = paint_down_to_common(r, array[i], filled,
-					      work, min_generation);
+					      work, min_generation, 0);
 		if (array[i]->object.flags & PARENT2)
 			redundant[i] = 1;
 		for (j = 0; j < filled; j++)
@@ -504,7 +512,7 @@  int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
 
 	bases = paint_down_to_common(r, commit,
 				     nr_reference, reference,
-				     generation);
+				     generation, ignore_missing_commits);
 	if (commit->object.flags & PARENT2)
 		ret = 1;
 	clear_commit_marks(commit, all_flags);