diff mbox series

[v3,04/11] Prepare `paint_down_to_common()` for handling shallow commits

Message ID 84e7fbc07e08956e6c493baf499fee455887b16c.1709040499.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series The merge-base logic vs missing commit objects | expand

Commit Message

Johannes Schindelin Feb. 27, 2024, 1:28 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

When `git fetch --update-shallow` needs to test for commit ancestry, it
can naturally run into a missing object (e.g. if it is a parent of a
shallow commit). To accommodate, the merge base logic will need to be
able to handle this situation gracefully.

Currently, that logic pretends that a missing parent commit is
equivalent to a missing parent commit, and for the purpose of
`--update-shallow` that is exactly what we need it to do.

Therefore, let's introduce a flag to indicate when that is precisely the
logic we want.

We need a flag, and cannot rely on `is_repository_shallow()` to indicate
that situation, because that function would return 0: There may not
actually be a `shallow` file, as demonstrated e.g. by t5537.10 ("add new
shallow root with receive.updateshallow on") and t5538.4 ("add new
shallow root with receive.updateshallow on").

Note: shallow commits' parents are set to `NULL` internally already,
therefore there is no need to special-case shallow repositories here, as
the merge-base logic will not try to access parent commits of shallow
commits.

Likewise, partial clones aren't an issue either: If a commit is missing
during the revision walk in the merge-base logic, it is fetched via
`promisor_remote_get_direct()`. And not only the single missing commit
object: Due to the way the "promised" objects are fetched (in
`fetch_objects()` in `promisor-remote.c`, using `fetch
--filter=blob:none`), there is no actual way to fetch a single commit
object, as the remote side will pass that commit OID to `pack-objects
--revs [...]` which in turn passes it to `rev-list` which interprets
this as a commit _range_ instead of a single object. Therefore, in
partial clones (unless they are shallow in addition), all commits
reachable from a commit that is in the local object database are also
present in that local database.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 commit-reach.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Comments

Dirk Gouders Feb. 27, 2024, 2:46 p.m. UTC | #1
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Currently, that logic pretends that a missing parent commit is
                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> equivalent to a missing parent commit, and for the purpose of
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                  
> `--update-shallow` that is exactly what we need it to do.

Chances are that I am wrong, but to me the above sounds very irritating.

> Therefore, let's introduce a flag to indicate when that is precisely the
> logic we want.
>
> We need a flag, and cannot rely on `is_repository_shallow()` to indicate
> that situation, because that function would return 0: There may not
> actually be a `shallow` file, as demonstrated e.g. by t5537.10 ("add new

Again, I'm not a native speaker but I understand the above as
"There may not even be an existing `shallow` file...".

I'm not sure -- the sentence just "feels" uncomfortable to me.

Dirk
Johannes Schindelin Feb. 27, 2024, 3 p.m. UTC | #2
Hi Dirk,

On Tue, 27 Feb 2024, Dirk Gouders wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Currently, that logic pretends that a missing parent commit is
>                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > equivalent to a missing parent commit, and for the purpose of
>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > `--update-shallow` that is exactly what we need it to do.
>
> Chances are that I am wrong, but to me the above sounds very irritating.

Not only that, it's also wrong 
Junio C Hamano Feb. 27, 2024, 6:08 p.m. UTC | #3
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> 	Currently, that logic pretends that a commit whose parent
> 	commit is missing is a root commit (and likewise merge commits
> 	with missing parent commits are handled incorrectly, too).
> 	However, for the purpose of `--update-shallow` that is exactly
> 	what we need to do (and only then).

I suspect that what made it harder to follow in the original
construct is that we called the behaviour "incorrect" upfront and
then come back with "that incorrectness is what we want".  I wonder
if it makes it easier to follow by flipping them around.

    For the purpose of `--update-shallow`, when some of the parent
    commits of a commit are missing from the repository, we need to
    treat as if the parents of the commit are only the ones that do
    exist in the repository and these missing commits have no
    ancestry relationship with it.  If all its parent commits are
    missing, the commit needs to be treated as if it were a root
    commit.

    Add a flag to optionally ask for such a behaviour, while
    detecting missing objects as a repository corruption error by
    default ...

or something?

> 	Therefore [...]
>
> Better?
>
> Ciao,
> Johannes
Junio C Hamano Feb. 27, 2024, 6:10 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> I suspect that what made it harder to follow in the original
> construct is that we called the behaviour "incorrect" upfront and
> then come back with "that incorrectness is what we want".

Oh, I forgot to say that lack of "<area>: " on the title of the
earlier parts of the series were a bit uncomfortable to read.
Dirk Gouders Feb. 27, 2024, 7:07 p.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> 	Currently, that logic pretends that a commit whose parent
>> 	commit is missing is a root commit (and likewise merge commits
>> 	with missing parent commits are handled incorrectly, too).
>> 	However, for the purpose of `--update-shallow` that is exactly
>> 	what we need to do (and only then).
>
> I suspect that what made it harder to follow in the original
> construct is that we called the behaviour "incorrect" upfront and
> then come back with "that incorrectness is what we want".  I wonder
> if it makes it easier to follow by flipping them around.
>
>     For the purpose of `--update-shallow`, when some of the parent
>     commits of a commit are missing from the repository, we need to
>     treat as if the parents of the commit are only the ones that do
>     exist in the repository and these missing commits have no
>     ancestry relationship with it.  If all its parent commits are
>     missing, the commit needs to be treated as if it were a root
>     commit.
>
>     Add a flag to optionally ask for such a behaviour, while
>     detecting missing objects as a repository corruption error by
>     default ...
>
> or something?

At least to me this is more understandable.

Dirk
diff mbox series

Patch

diff --git a/commit-reach.c b/commit-reach.c
index 5ff71d72d51..7112b10eeea 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -53,7 +53,8 @@  static int queue_has_nonstale(struct prio_queue *queue)
 static struct commit_list *paint_down_to_common(struct repository *r,
 						struct commit *one, int n,
 						struct commit **twos,
-						timestamp_t min_generation)
+						timestamp_t min_generation,
+						int ignore_missing_commits)
 {
 	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
 	struct commit_list *result = NULL;
@@ -108,6 +109,13 @@  static struct commit_list *paint_down_to_common(struct repository *r,
 			if (repo_parse_commit(r, p)) {
 				clear_prio_queue(&queue);
 				free_commit_list(result);
+				/*
+				 * At this stage, we know that the commit is
+				 * missing: `repo_parse_commit()` uses
+				 * `OBJECT_INFO_DIE_IF_CORRUPT` and therefore
+				 * corrupt commits would already have been
+				 * dispatched with a `die()`.
+				 */
 				return NULL;
 			}
 			p->object.flags |= flags;
@@ -143,7 +151,7 @@  static struct commit_list *merge_bases_many(struct repository *r,
 			return NULL;
 	}
 
-	list = paint_down_to_common(r, one, n, twos, 0);
+	list = paint_down_to_common(r, one, n, twos, 0, 0);
 
 	while (list) {
 		struct commit *commit = pop_commit(&list);
@@ -214,7 +222,7 @@  static int remove_redundant_no_gen(struct repository *r,
 				min_generation = curr_generation;
 		}
 		common = paint_down_to_common(r, array[i], filled,
-					      work, min_generation);
+					      work, min_generation, 0);
 		if (array[i]->object.flags & PARENT2)
 			redundant[i] = 1;
 		for (j = 0; j < filled; j++)
@@ -504,7 +512,7 @@  int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
 
 	bases = paint_down_to_common(r, commit,
 				     nr_reference, reference,
-				     generation);
+				     generation, ignore_missing_commits);
 	if (commit->object.flags & PARENT2)
 		ret = 1;
 	clear_commit_marks(commit, all_flags);