diff mbox series

revision: allow missing promisor objects on CLI

Message ID 20191228003430.241283-1-jonathantanmy@google.com (mailing list archive)
State New, archived
Headers show
Series revision: allow missing promisor objects on CLI | expand

Commit Message

Jonathan Tan Dec. 28, 2019, 12:34 a.m. UTC
Commit 4cf67869b2 ("list-objects.c: don't segfault for missing cmdline
objects", 2018-12-06) prevented some segmentation faults from occurring
by tightening handling of missing objects provided through the CLI: if
--ignore-missing is set, then it is OK (and the missing object ignored,
just like one would if encountered in traversal).

However, in the case that --ignore-missing is not set but
--exclude-promisor-objects is set, there is still no distinction between
the case wherein the missing object is a promisor object and the case
wherein it is not. This is unnecessarily restrictive, since if a missing
promisor object is encountered in traversal, it is ignored; likewise it
should be ignored if provided through the CLI. Therefore, distinguish
between these 2 cases. (As a bonus, the code is now simpler.)

(Note that this only affects handling of missing promisor objects.
Handling of non-missing promisor objects is already done by setting all
of them to UNINTERESTING in prepare_revision_walk().)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
For those curious, I discovered this while trying to extend the
check-targets-only optimization done in dfa33a298d ("clone: do faster
object check for partial clones", 2019-04-21) to fetches as well. While
investigating my previous work of adding check-target functionality in
addition to the usual connectivity check (for correctness, not for
performance) in 35f9e3e5e7 ("fetch: in partial clone, check presence of
targets", 2018-09-21), I discovered that at current master, the check
was somehow no longer needed because rev-list dies on missing objects on
CLI. But I don't think that the current behavior is obvious, hence this
commit (which also restores the need for the check-target
functionality).
---
 revision.c               | 10 +++++++++-
 t/t0410-partial-clone.sh | 10 ++--------
 2 files changed, 11 insertions(+), 9 deletions(-)

Comments

Junio C Hamano Dec. 28, 2019, 3:50 a.m. UTC | #1
Jonathan Tan <jonathantanmy@google.com> writes:

>  	object = get_reference(revs, arg, &oid, flags ^ local_flags);
>  	if (!object)
> -		return revs->ignore_missing ? 0 : -1;
> +		/*
> +		 * Either this object is missing and ignore_missing is true, or
> +		 * this object is a (missing) promisor object and
> +		 * exclude_promisor_objects is true.

I had to guess and dig where these assertions are coming from; we
should not force future readers of the code to.

At least this comment must say why these assertions hold.  Say
something like "get_reference() yields NULL on only such and such
cases" before concluding with "and in any of these cases, we can
safely ignore it because ...".

I think the two cases the comment covers are safe for this caller to
silently return 0.  Another case get_reference() yields NULL is when
oid_object_info() says it is a commit but it turns out that the
object is found by repo_parse_commit() to be a non-commit, isn't it?
I am not sure if it is safe for this caller to just return 0.  There
may be some other "unusual-but-not-fatal" cases where get_reference()
does not hit a die() but returns NULL.

Thanks.
Jonathan Tan Dec. 30, 2019, 6:38 p.m. UTC | #2
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> >  	object = get_reference(revs, arg, &oid, flags ^ local_flags);
> >  	if (!object)
> > -		return revs->ignore_missing ? 0 : -1;
> > +		/*
> > +		 * Either this object is missing and ignore_missing is true, or
> > +		 * this object is a (missing) promisor object and
> > +		 * exclude_promisor_objects is true.
> 
> I had to guess and dig where these assertions are coming from; we
> should not force future readers of the code to.
> 
> At least this comment must say why these assertions hold.  Say
> something like "get_reference() yields NULL on only such and such
> cases" before concluding with "and in any of these cases, we can
> safely ignore it because ...".

OK, will do.

> I think the two cases the comment covers are safe for this caller to
> silently return 0.  Another case get_reference() yields NULL is when
> oid_object_info() says it is a commit but it turns out that the
> object is found by repo_parse_commit() to be a non-commit, isn't it?
> I am not sure if it is safe for this caller to just return 0.  There
> may be some other "unusual-but-not-fatal" cases where get_reference()
> does not hit a die() but returns NULL.

I don't think there is any other case where get_reference() yields NULL,
at least where I based my patch (99c33bed56 ("Git 2.25-rc0",
2019-12-25)). Quoting the entire get_reference():

> static struct object *get_reference(struct rev_info *revs, const char *name,
>                                     const struct object_id *oid,
>                                     unsigned int flags)
> {
>         struct object *object;
> 
>         /*
>          * If the repository has commit graphs, repo_parse_commit() avoids
>          * reading the object buffer, so use it whenever possible.
>          */
>         if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) {
>                 struct commit *c = lookup_commit(revs->repo, oid);
>                 if (!repo_parse_commit(revs->repo, c))
>                         object = (struct object *) c;
>                 else
>                         object = NULL;
>         } else {
>                 object = parse_object(revs->repo, oid);
>         }

No return statements at all prior to this line.

>         if (!object) {
>                 if (revs->ignore_missing)
>                         return object;

Return NULL (the value of object).

>                 if (revs->exclude_promisor_objects && is_promisor_object(oid))
>                         return NULL;

Return NULL.

>                 die("bad object %s", name);

Die (so this function invocation never returns). In conclusion, if
object is NULL at this point in time, get_reference() either returns
NULL or dies.

>         }

Since get_reference() did not return NULL or die, object is non-NULL
here.

>         object->flags |= flags;
>         return object;

Nothing has overridden object since, so we're returning non-NULL here.

> }

So I think get_reference() only returns NULL in those two safe cases.
(Or did I miss something?)
Junio C Hamano Dec. 30, 2019, 8:33 p.m. UTC | #3
Jonathan Tan <jonathantanmy@google.com> writes:

>> Jonathan Tan <jonathantanmy@google.com> writes:
>> 
>> >  	object = get_reference(revs, arg, &oid, flags ^ local_flags);
>> >  	if (!object)
>> > -		return revs->ignore_missing ? 0 : -1;
>> > +		/*
>> > +		 * Either this object is missing and ignore_missing is true, or
>> > +		 * this object is a (missing) promisor object and
>> > +		 * exclude_promisor_objects is true.
>> 
>> I had to guess and dig where these assertions are coming from; we
>> should not force future readers of the code to.
>> 
>> At least this comment must say why these assertions hold.  Say
>> something like "get_reference() yields NULL on only such and such
>> cases" before concluding with "and in any of these cases, we can
>> safely ignore it because ...".
>
> OK, will do.
>
>> I think the two cases the comment covers are safe for this caller to
>> silently return 0.  Another case get_reference() yields NULL is when
>> oid_object_info() says it is a commit but it turns out that the
>> object is found by repo_parse_commit() to be a non-commit, isn't it?
>> I am not sure if it is safe for this caller to just return 0.  There
>> may be some other "unusual-but-not-fatal" cases where get_reference()
>> does not hit a die() but returns NULL.
>
> I don't think there is any other case where get_reference() yields NULL,
> at least where I based my patch (99c33bed56 ("Git 2.25-rc0",
> 2019-12-25)). Quoting the entire get_reference():
>
>> static struct object *get_reference(struct rev_info *revs, const char *name,
>>                                     const struct object_id *oid,
>>                                     unsigned int flags)
>> {
>>         struct object *object;
>> 
>>         /*
>>          * If the repository has commit graphs, repo_parse_commit() avoids
>>          * reading the object buffer, so use it whenever possible.
>>          */
>>         if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) {
>>                 struct commit *c = lookup_commit(revs->repo, oid);
>>                 if (!repo_parse_commit(revs->repo, c))
>>                         object = (struct object *) c;
>>                 else
>>                         object = NULL;

This is the case where oid must be COMMIT from oid_object_info()'s
point of view, but repo_parse_commit() finds it as a non-commit, and
object becomes NULL.  This is quite different from the normal lazy
clone case where exclude-promisor-objects etc. wants to cover, that
the object whose name is oid is truly missing because it can be
fetched later from elsewhere.  Instead, we have found that there is
an inconsistency in the data we have about the object, iow, a
possible corruption.

>>         if (!object) {
>>                 if (revs->ignore_missing)
>>                         return object;
>
> Return NULL (the value of object).
>
>>                 if (revs->exclude_promisor_objects && is_promisor_object(oid))
>>                         return NULL;
>
> Return NULL.
>
>>                 die("bad object %s", name);
>
> Die (so this function invocation never returns). In conclusion, if
> object is NULL at this point in time, get_reference() either returns
> NULL or dies.

And when !object, this does not die if

 - ignore-missing is in effect, or
 - exclude-promisor is in effect and this is a promisor object that
   is missing from the local repository.

and instead return NULL.

It just felt funny that the "we found something fishy about the
asked-for object" case is treated the same way for the purpose of
ignore-missing and exclude-promisor.  The asked-for objet is
certainly not missing (i.e. we know more than we want to know about
the object---some part of our database says it is a commit and some
other part says it is not).
diff mbox series

Patch

diff --git a/revision.c b/revision.c
index 8136929e23..345615e300 100644
--- a/revision.c
+++ b/revision.c
@@ -1907,7 +1907,15 @@  int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 		verify_non_filename(revs->prefix, arg);
 	object = get_reference(revs, arg, &oid, flags ^ local_flags);
 	if (!object)
-		return revs->ignore_missing ? 0 : -1;
+		/*
+		 * Either this object is missing and ignore_missing is true, or
+		 * this object is a (missing) promisor object and
+		 * exclude_promisor_objects is true. In any case, we are
+		 * allowed to skip processing of this object; this object will
+		 * not appear in output and cannot be used as a source of
+		 * UNINTERESTING ancestors (since it is missing).
+		 */
+		return 0;
 	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
 	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
 	free(oc.path);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index a3988bd4b8..fd28f5402a 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -416,15 +416,9 @@  test_expect_success 'rev-list dies for missing objects on cmd line' '
 	git -C repo config extensions.partialclone "arbitrary string" &&
 
 	for OBJ in "$COMMIT" "$TREE" "$BLOB"; do
-		test_must_fail git -C repo rev-list --objects \
+		git -C repo rev-list --objects \
 			--exclude-promisor-objects "$OBJ" &&
-		test_must_fail git -C repo rev-list --objects-edge-aggressive \
-			--exclude-promisor-objects "$OBJ" &&
-
-		# Do not die or crash when --ignore-missing is passed.
-		git -C repo rev-list --ignore-missing --objects \
-			--exclude-promisor-objects "$OBJ" &&
-		git -C repo rev-list --ignore-missing --objects-edge-aggressive \
+		git -C repo rev-list --objects-edge-aggressive \
 			--exclude-promisor-objects "$OBJ"
 	done
 '