Message ID | 20191228003430.241283-1-jonathantanmy@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | revision: allow missing promisor objects on CLI | expand |
Jonathan Tan <jonathantanmy@google.com> writes: > object = get_reference(revs, arg, &oid, flags ^ local_flags); > if (!object) > - return revs->ignore_missing ? 0 : -1; > + /* > + * Either this object is missing and ignore_missing is true, or > + * this object is a (missing) promisor object and > + * exclude_promisor_objects is true. I had to guess and dig where these assertions are coming from; we should not force future readers of the code to. At least this comment must say why these assertions hold. Say something like "get_reference() yields NULL on only such and such cases" before concluding with "and in any of these cases, we can safely ignore it because ...". I think the two cases the comment covers are safe for this caller to silently return 0. Another case get_reference() yields NULL is when oid_object_info() says it is a commit but it turns out that the object is found by repo_parse_commit() to be a non-commit, isn't it? I am not sure if it is safe for this caller to just return 0. There may be some other "unusual-but-not-fatal" cases where get_reference() does not hit a die() but returns NULL. Thanks.
> Jonathan Tan <jonathantanmy@google.com> writes: > > > object = get_reference(revs, arg, &oid, flags ^ local_flags); > > if (!object) > > - return revs->ignore_missing ? 0 : -1; > > + /* > > + * Either this object is missing and ignore_missing is true, or > > + * this object is a (missing) promisor object and > > + * exclude_promisor_objects is true. > > I had to guess and dig where these assertions are coming from; we > should not force future readers of the code to. > > At least this comment must say why these assertions hold. Say > something like "get_reference() yields NULL on only such and such > cases" before concluding with "and in any of these cases, we can > safely ignore it because ...". OK, will do. > I think the two cases the comment covers are safe for this caller to > silently return 0. Another case get_reference() yields NULL is when > oid_object_info() says it is a commit but it turns out that the > object is found by repo_parse_commit() to be a non-commit, isn't it? > I am not sure if it is safe for this caller to just return 0. There > may be some other "unusual-but-not-fatal" cases where get_reference() > does not hit a die() but returns NULL. I don't think there is any other case where get_reference() yields NULL, at least where I based my patch (99c33bed56 ("Git 2.25-rc0", 2019-12-25)). Quoting the entire get_reference(): > static struct object *get_reference(struct rev_info *revs, const char *name, > const struct object_id *oid, > unsigned int flags) > { > struct object *object; > > /* > * If the repository has commit graphs, repo_parse_commit() avoids > * reading the object buffer, so use it whenever possible. > */ > if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) { > struct commit *c = lookup_commit(revs->repo, oid); > if (!repo_parse_commit(revs->repo, c)) > object = (struct object *) c; > else > object = NULL; > } else { > object = parse_object(revs->repo, oid); > } No return statements at all prior to this line. > if (!object) { > if (revs->ignore_missing) > return object; Return NULL (the value of object). > if (revs->exclude_promisor_objects && is_promisor_object(oid)) > return NULL; Return NULL. > die("bad object %s", name); Die (so this function invocation never returns). In conclusion, if object is NULL at this point in time, get_reference() either returns NULL or dies. > } Since get_reference() did not return NULL or die, object is non-NULL here. > object->flags |= flags; > return object; Nothing has overridden object since, so we're returning non-NULL here. > } So I think get_reference() only returns NULL in those two safe cases. (Or did I miss something?)
Jonathan Tan <jonathantanmy@google.com> writes: >> Jonathan Tan <jonathantanmy@google.com> writes: >> >> > object = get_reference(revs, arg, &oid, flags ^ local_flags); >> > if (!object) >> > - return revs->ignore_missing ? 0 : -1; >> > + /* >> > + * Either this object is missing and ignore_missing is true, or >> > + * this object is a (missing) promisor object and >> > + * exclude_promisor_objects is true. >> >> I had to guess and dig where these assertions are coming from; we >> should not force future readers of the code to. >> >> At least this comment must say why these assertions hold. Say >> something like "get_reference() yields NULL on only such and such >> cases" before concluding with "and in any of these cases, we can >> safely ignore it because ...". > > OK, will do. > >> I think the two cases the comment covers are safe for this caller to >> silently return 0. Another case get_reference() yields NULL is when >> oid_object_info() says it is a commit but it turns out that the >> object is found by repo_parse_commit() to be a non-commit, isn't it? >> I am not sure if it is safe for this caller to just return 0. There >> may be some other "unusual-but-not-fatal" cases where get_reference() >> does not hit a die() but returns NULL. > > I don't think there is any other case where get_reference() yields NULL, > at least where I based my patch (99c33bed56 ("Git 2.25-rc0", > 2019-12-25)). Quoting the entire get_reference(): > >> static struct object *get_reference(struct rev_info *revs, const char *name, >> const struct object_id *oid, >> unsigned int flags) >> { >> struct object *object; >> >> /* >> * If the repository has commit graphs, repo_parse_commit() avoids >> * reading the object buffer, so use it whenever possible. >> */ >> if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) { >> struct commit *c = lookup_commit(revs->repo, oid); >> if (!repo_parse_commit(revs->repo, c)) >> object = (struct object *) c; >> else >> object = NULL; This is the case where oid must be COMMIT from oid_object_info()'s point of view, but repo_parse_commit() finds it as a non-commit, and object becomes NULL. This is quite different from the normal lazy clone case where exclude-promisor-objects etc. wants to cover, that the object whose name is oid is truly missing because it can be fetched later from elsewhere. Instead, we have found that there is an inconsistency in the data we have about the object, iow, a possible corruption. >> if (!object) { >> if (revs->ignore_missing) >> return object; > > Return NULL (the value of object). > >> if (revs->exclude_promisor_objects && is_promisor_object(oid)) >> return NULL; > > Return NULL. > >> die("bad object %s", name); > > Die (so this function invocation never returns). In conclusion, if > object is NULL at this point in time, get_reference() either returns > NULL or dies. And when !object, this does not die if - ignore-missing is in effect, or - exclude-promisor is in effect and this is a promisor object that is missing from the local repository. and instead return NULL. It just felt funny that the "we found something fishy about the asked-for object" case is treated the same way for the purpose of ignore-missing and exclude-promisor. The asked-for objet is certainly not missing (i.e. we know more than we want to know about the object---some part of our database says it is a commit and some other part says it is not).
diff --git a/revision.c b/revision.c index 8136929e23..345615e300 100644 --- a/revision.c +++ b/revision.c @@ -1907,7 +1907,15 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi verify_non_filename(revs->prefix, arg); object = get_reference(revs, arg, &oid, flags ^ local_flags); if (!object) - return revs->ignore_missing ? 0 : -1; + /* + * Either this object is missing and ignore_missing is true, or + * this object is a (missing) promisor object and + * exclude_promisor_objects is true. In any case, we are + * allowed to skip processing of this object; this object will + * not appear in output and cannot be used as a source of + * UNINTERESTING ancestors (since it is missing). + */ + return 0; add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags); add_pending_object_with_path(revs, object, arg, oc.mode, oc.path); free(oc.path); diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index a3988bd4b8..fd28f5402a 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -416,15 +416,9 @@ test_expect_success 'rev-list dies for missing objects on cmd line' ' git -C repo config extensions.partialclone "arbitrary string" && for OBJ in "$COMMIT" "$TREE" "$BLOB"; do - test_must_fail git -C repo rev-list --objects \ + git -C repo rev-list --objects \ --exclude-promisor-objects "$OBJ" && - test_must_fail git -C repo rev-list --objects-edge-aggressive \ - --exclude-promisor-objects "$OBJ" && - - # Do not die or crash when --ignore-missing is passed. - git -C repo rev-list --ignore-missing --objects \ - --exclude-promisor-objects "$OBJ" && - git -C repo rev-list --ignore-missing --objects-edge-aggressive \ + git -C repo rev-list --objects-edge-aggressive \ --exclude-promisor-objects "$OBJ" done '
Commit 4cf67869b2 ("list-objects.c: don't segfault for missing cmdline objects", 2018-12-06) prevented some segmentation faults from occurring by tightening handling of missing objects provided through the CLI: if --ignore-missing is set, then it is OK (and the missing object ignored, just like one would if encountered in traversal). However, in the case that --ignore-missing is not set but --exclude-promisor-objects is set, there is still no distinction between the case wherein the missing object is a promisor object and the case wherein it is not. This is unnecessarily restrictive, since if a missing promisor object is encountered in traversal, it is ignored; likewise it should be ignored if provided through the CLI. Therefore, distinguish between these 2 cases. (As a bonus, the code is now simpler.) (Note that this only affects handling of missing promisor objects. Handling of non-missing promisor objects is already done by setting all of them to UNINTERESTING in prepare_revision_walk().) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- For those curious, I discovered this while trying to extend the check-targets-only optimization done in dfa33a298d ("clone: do faster object check for partial clones", 2019-04-21) to fetches as well. While investigating my previous work of adding check-target functionality in addition to the usual connectivity check (for correctness, not for performance) in 35f9e3e5e7 ("fetch: in partial clone, check presence of targets", 2018-09-21), I discovered that at current master, the check was somehow no longer needed because rev-list dies on missing objects on CLI. But I don't think that the current behavior is obvious, hence this commit (which also restores the need for the check-target functionality). --- revision.c | 10 +++++++++- t/t0410-partial-clone.sh | 10 ++-------- 2 files changed, 11 insertions(+), 9 deletions(-)