Message ID | 20220608210537.185094-1-jonathantanmy@google.com |
---|---|
State | New, archived |
Series | connected: distinguish local/remote bad objects |
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index ac29c2b1ae..6f43b2bf8d 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1133,7 +1133,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
>
>  		rm = ref_map;
>  		if (check_connected(iterate_ref_map, &rm, &opt)) {
> -			rc = error(_("%s did not send all necessary objects\n"), url);
> +			rc = error(_("connectivity check failed for %s\n"), url);
>  			goto abort;
>  		}
>  	}

Clever.  I was wondering how you are going to deal with the
different exit condition from the rev-list that is only expressed
with the two different messages.  You _could_ grep for "from local"
vs "from remote", but you just let the human user who is staring at
the error message read it, and stop mentioning the details in this
message.  OK.

> diff --git a/connected.c b/connected.c
> index ed3025e7a2..ea773f25db 100644
> --- a/connected.c
> +++ b/connected.c
> @@ -94,6 +94,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
>  		strvec_push(&rev_list.args, opt->shallow_file);
>  	}
>  	strvec_push(&rev_list.args,"rev-list");
> +	strvec_push(&rev_list.args, "--detailed-bad-object");
>  	strvec_push(&rev_list.args, "--objects");
>  	strvec_push(&rev_list.args, "--stdin");
>  	if (has_promisor_remote())
> diff --git a/revision.c b/revision.c
> index 090a967bf4..777e762373 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -367,6 +367,16 @@ void add_head_to_pending(struct rev_info *revs)
>  	add_pending_object(revs, obj, "HEAD");
>  }
>
> +static void NORETURN bad_object(struct rev_info *revs, const char *name,
> +				unsigned int flags)
> +{
> +	if (!revs->detailed_bad_object)
> +		die("bad object %s", name);
> +	if (flags & UNINTERESTING)
> +		die("bad object %s (from local object store)", name);
> +	die("bad object %s (from remote)", name);
> +}

If the missing object you found is reachable from existing tips
(i.e. local aka UNINTERESTING) and also from what they should have
sent (i.e. remote tips), when we discover that the object does not
exist locally (but we can have an in-core shell object whose object
name is already known because child objects that are closer to the
tips than the missing object do exist and point at it), does this
new heuristic work reliably?

Do we always die and report bad_object() with UNINTERESTING in the
flags variable, or only when we are lucky?

IOW, if our current branch points at A, while the other side says
they are updating it to B,

    X-----o-----A
     \
      x---B

we try to traverse "rev-list ^A B" and make sure everything exists.
If we find object 'o' missing, it is clear that it was something we
were supposed to have on the local side even before we started the
fetch.  But if 'X' is missing, by the time we notice and report a
missing object, do we always reliably know that it ought to be
reachable from both?  Or do we have a 50/50 chance that the traversal
comes from 'o' earlier than from 'x' (in which case X is found to be
missing when we try to see who is the parent of 'o', and flags have
the UNINTERESTING bit), or later than from 'x' (in which case X is
found when trying to see who the parent of 'x' is, and we may not
know and we may not bother to find out if X is also a parent of 'o',
hence we'd still say 'You do not have X, and we were looking at 'x',
which we got from the other end, so they were supposed to have sent
it', which would be a misdiagnosis)?

Thanks.
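The ordering concern above can be illustrated with a toy model. This is a hypothetical sketch, not git's revision walker. In the model, each commit has a single parent. The tip given as "^A" starts out UNINTERESTING, and flags are copied from child to parent as the walk proceeds. The first child that finds its parent missing decides the diagnosis using only its own flags. Under these assumptions, the same missing commit X is blamed on the local object store or on the remote depending solely on which tip is walked first. The names `toy_commit` and `diagnose()`, and the local `UNINTERESTING` macro, are illustrative only.

```c
#include <assert.h>
#include <string.h>

#define UNINTERESTING 1u /* stand-in for git's flag bit of the same name */

/* Toy commit: single parent (-1 for a root), flags, and whether the
 * object exists in the simulated local object store. */
struct toy_commit {
	const char *name;
	int parent;
	unsigned int flags;
	int present;
};

/* Graph from the review:  X---o---A (ours)  and  X---x---B (theirs). */
enum { X, O, A, XX, B, NCOMMITS };

/*
 * Walk from tip A ("^A", so UNINTERESTING) and tip B in the given order.
 * Flags flow from child to parent; the first child whose parent is
 * missing reports the missing object using only the child's own flags.
 */
static const char *diagnose(int local_first)
{
	struct toy_commit c[NCOMMITS] = {
		[X]  = { "X", -1, 0, 0 },		/* missing locally */
		[O]  = { "o", X,  0, 1 },
		[A]  = { "A", O,  UNINTERESTING, 1 },
		[XX] = { "x", X,  0, 1 },
		[B]  = { "B", XX, 0, 1 },
	};
	int tips[2] = { local_first ? A : B, local_first ? B : A };

	for (int t = 0; t < 2; t++) {
		for (int i = tips[t]; c[i].parent >= 0; i = c[i].parent) {
			assert(c[i].parent < NCOMMITS);
			struct toy_commit *p = &c[c[i].parent];

			if (!p->present)
				return (c[i].flags & UNINTERESTING) ?
					"from local object store" : "from remote";
			p->flags |= c[i].flags;
		}
	}
	return "all objects present";
}
```

In this model, `diagnose(1)` (walking from A first) reports "from local object store", while `diagnose(0)` (walking from B first) reports "from remote" for the very same missing X. Whether git's real traversal can hit the second case is exactly what the review asks; the toy only shows why visit order could matter.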
Jonathan Tan <jonathantanmy@google.com> writes:

>  builtin/fetch.c              |  2 +-
>  connected.c                  |  1 +
>  revision.c                   | 16 ++++++++++++--
>  revision.h                   |  3 +++
>  t/t5518-fetch-exit-status.sh | 43 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 62 insertions(+), 3 deletions(-)

This seems to break the linux-leaks CI job by making 5518, which was
marked in some topic in flight to expect to be leak-free, fail.

Because of the way the linux-leaks test framework is done, it is not
easy to tell if the code changes essential to this topic introduced
new leaks, in which case we would want to fix that.

Note that this may not be the fault of the code changes in this patch.
If the tests added by the patch started using git commands that are
known to leak (i.e. not ready to be subjected to the "leaks" test)
in order to prepare the scenario or to inspect the result, even if
the code changes in this topic did not introduce any leak, we can
see the same breakage in the linux-leaks CI job.  An easy way out would
be to disable leak-check CI for the entire 5518, but that is not
very satisfactory, as the earlier part of that script should still
be leak-free.  Another way out might be to add these two tests in a
new script, which is not marked as not-leaking.  After all, what the
new topic adds is not about exit status but how that exit status
comes about, so it might not be a bad idea even without the CI leak
stuff anyway.

Ævar, does the internal state used for revision walking count as
leaking when it is still held by the time we hit die() in
bad_object(), or is anything on the stack when we die() still
reachable, and so not reported as a failure?

Thanks.
Junio C Hamano <gitster@pobox.com> writes:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
> >  builtin/fetch.c              |  2 +-
> >  connected.c                  |  1 +
> >  revision.c                   | 16 ++++++++++++--
> >  revision.h                   |  3 +++
> >  t/t5518-fetch-exit-status.sh | 43 ++++++++++++++++++++++++++++++++++++
> >  5 files changed, 62 insertions(+), 3 deletions(-)
>
> This seems to break linux-leaks CI job by making 5518, which was
> marked in some topic in flight to expect to be leak-free, fail.
>
> Because of the way linux-leaks test framework is done, it is not
> easy to tell if the code changes essential to this topic introduced
> new leaks, in which case we would want to fix that.

Thanks. I'll recheck this once I make a new version.
Junio C Hamano <gitster@pobox.com> writes:

> If the missing object you found is reachable from existing tips
> (i.e. local aka UNINTERESTING) and also from what they should have
> sent (i.e. remote tips), when we discover that the object does not
> exist locally (but we can have an in-core shell object whose object
> name is already known because child objects that are closer to the
> tips than the missing object do exist and point at it), does this
> new heuristic work reliably?
>
> Do we always die and report bad_object() with UNINTERESTING in the
> flags variable, or only when we are lucky?
>
> IOW, if our current branch points at A, while the other side says
> they are updating it to B,
>
>     X-----o-----A
>      \
>       x---B
>
> we try to traverse "rev-list ^A B" and make sure everything exists.
> If we find object 'o' missing, it is clear that it was something we
> were supposed to have on the local side even before we started the
> fetch.  But if 'X' is missing, by the time we notice and report a
> missing object, do we always reliably know that it ought to be
> reachable from both?  Or do we have a 50/50 chance that the traversal
> comes from 'o' earlier than from 'x' (in which case X is found to be
> missing when we try to see who is the parent of 'o', and flags have
> the UNINTERESTING bit), or later than from 'x' (in which case X is
> found when trying to see who the parent of 'x' is, and we may not
> know and we may not bother to find out if X is also a parent of 'o',
> hence we'd still say 'You do not have X, and we were looking at 'x',
> which we got from the other end, so they were supposed to have sent
> it', which would be a misdiagnosis)?
>
> Thanks.

Ah, good catch. I'll take a look into fixing this. (A misdiagnosis
would defeat the purpose of this patch, yes.)
On Thu, Jun 09 2022, Junio C Hamano wrote:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
>>  builtin/fetch.c              |  2 +-
>>  connected.c                  |  1 +
>>  revision.c                   | 16 ++++++++++++--
>>  revision.h                   |  3 +++
>>  t/t5518-fetch-exit-status.sh | 43 ++++++++++++++++++++++++++++++++++++
>>  5 files changed, 62 insertions(+), 3 deletions(-)
>
> This seems to break linux-leaks CI job by making 5518, which was
> marked in some topic in flight to expect to be leak-free, fail.
>
> Because of the way linux-leaks test framework is done, it is not
> easy to tell if the code changes essential to this topic introduced
> new leaks, in which case we would want to fix that.

I think this is just an existing leak that happens to be exposed by a
new (in this file) test, i.e. transport_get() leaks via an xmalloc()
for transport_helper.

> Note that this may not be the fault of the code changes in this patch.
> If the tests added by the patch started using git commands that are
> known to leak (i.e. not ready to be subjected to the "leaks" test)
> in order to prepare the scenario or to inspect the result, even if
> the code changes in this topic did not introduce any leak, we can
> see the same breakage in linux-leaks CI job.  An easy way out would
> be to disable leak-check CI for the entire 5518, but that is not
> very satisfactory, as the earlier part of that script should still
> be leak-free.

I think doing that would be fine in this case. It will get easier to
fix leaks now that "struct rev_info" is out of the way (and I've got a
lot of pending patches), but I can always loop back & re-mark this
particular test as leak-free at some future date.

> Another way out might be to add these two tests in a
> new script, which is not marked as not-leaking.  After all, what the
> new topic adds is not about exit status but how that exit status
> comes about, so it might not be a bad idea even without the CI leak
> stuff anyway.

Yeah, that sounds especially good in this case, since if we can't run
httpd, we'll print a meaningful "skip" message. See 0a2bfccb9c8
(t0051: use "skip_all" under !MINGW in single-test file, 2022-02-04).

> Ævar, does the internal state used for revision walking count as
> leaking when it is still held by the time we hit die() in
> bad_object(), or is anything on the stack when we die() still
> reachable, and so not reported as a failure?

No, but in this case the variable containing the leaked data isn't in
scope by the time we exit, i.e. it was used by fetch_one() which had it
malloc'd, but the struct it lived in went away, and now we're exiting
from cmd_fetch() etc.
    diff --git a/builtin/fetch.c b/builtin/fetch.c
    index ac29c2b1ae..6f43b2bf8d 100644
    --- a/builtin/fetch.c
    +++ b/builtin/fetch.c
    @@ -1133,7 +1133,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
     
     		rm = ref_map;
     		if (check_connected(iterate_ref_map, &rm, &opt)) {
    -			rc = error(_("%s did not send all necessary objects\n"), url);
    +			rc = error(_("connectivity check failed for %s\n"), url);
     			goto abort;
     		}
     	}
    diff --git a/connected.c b/connected.c
    index ed3025e7a2..ea773f25db 100644
    --- a/connected.c
    +++ b/connected.c
    @@ -94,6 +94,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
     		strvec_push(&rev_list.args, opt->shallow_file);
     	}
     	strvec_push(&rev_list.args,"rev-list");
    +	strvec_push(&rev_list.args, "--detailed-bad-object");
     	strvec_push(&rev_list.args, "--objects");
     	strvec_push(&rev_list.args, "--stdin");
     	if (has_promisor_remote())
    diff --git a/revision.c b/revision.c
    index 090a967bf4..777e762373 100644
    --- a/revision.c
    +++ b/revision.c
    @@ -367,6 +367,16 @@ void add_head_to_pending(struct rev_info *revs)
     	add_pending_object(revs, obj, "HEAD");
     }
     
    +static void NORETURN bad_object(struct rev_info *revs, const char *name,
    +				unsigned int flags)
    +{
    +	if (!revs->detailed_bad_object)
    +		die("bad object %s", name);
    +	if (flags & UNINTERESTING)
    +		die("bad object %s (from local object store)", name);
    +	die("bad object %s (from remote)", name);
    +}
    +
     static struct object *get_reference(struct rev_info *revs, const char *name,
     				    const struct object_id *oid,
     				    unsigned int flags)
    @@ -390,7 +400,7 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
     			return object;
     		if (revs->exclude_promisor_objects && is_promisor_object(oid))
     			return NULL;
    -		die("bad object %s", name);
    +		bad_object(revs, name, flags);
     	}
     	object->flags |= flags;
     	return object;
    @@ -426,7 +436,7 @@ static struct commit *handle_commit(struct rev_info *revs,
     			if (revs->exclude_promisor_objects &&
     			    is_promisor_object(&tag->tagged->oid))
     				return NULL;
    -			die("bad object %s", oid_to_hex(&tag->tagged->oid));
    +			bad_object(revs, oid_to_hex(&tag->tagged->oid), flags);
     		}
     		object->flags |= flags;
     		/*
    @@ -2537,6 +2547,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
     		if (fetch_if_missing)
     			BUG("exclude_promisor_objects can only be used when fetch_if_missing is 0");
     		revs->exclude_promisor_objects = 1;
    +	} else if (!strcmp(arg, "--detailed-bad-object")) {
    +		revs->detailed_bad_object = 1;
     	} else {
     		int opts = diff_opt_parse(&revs->diffopt, argv, argc, revs->prefix);
     		if (!opts)
    diff --git a/revision.h b/revision.h
    index e80c148b19..7f685dd5bb 100644
    --- a/revision.h
    +++ b/revision.h
    @@ -328,6 +328,9 @@ struct rev_info {
     
     	/* Location where temporary objects for remerge-diff are written. */
     	struct tmp_objdir *remerge_objdir;
    +
    +	/* Error reporting info */
    +	unsigned detailed_bad_object : 1;
     };
     
     int ref_excluded(struct string_list *, const char *path);
    diff --git a/t/t5518-fetch-exit-status.sh b/t/t5518-fetch-exit-status.sh
    index 5c4ac2556e..f1adac1dd6 100755
    --- a/t/t5518-fetch-exit-status.sh
    +++ b/t/t5518-fetch-exit-status.sh
    @@ -37,4 +37,47 @@ test_expect_success 'forced update' '
     
     '
     
    +. "$TEST_DIRECTORY"/lib-httpd.sh
    +start_httpd
    +
    +test_expect_success 'connectivity check failure due to missing local object' '
    +	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
    +	test_when_finished "rm -rf \"$SERVER\" client" &&
    +	test_create_repo "$SERVER" &&
    +	test_commit -C "$SERVER" foo &&
    +
    +	test_create_repo client &&
    +	test_commit -C client bar &&
    +
    +	# Simulate missing client objects.
    +	rm -rf client/.git/objects/* &&
    +	test_must_fail git -C client fetch $HTTPD_URL/smart/server 2>err &&
    +	grep "(from local object store)" err &&
    +	! grep "(from remote)" err &&
    +	grep "error: connectivity check failed for" err
    +'
    +
    +test_expect_success 'connectivity check failure due to missing remote object' '
    +	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
    +	test_when_finished "rm -rf \"$SERVER\" client" &&
    +	test_create_repo "$SERVER" &&
    +	test_commit -C "$SERVER" foo &&
    +	git -C "$SERVER" config uploadpack.allowRefInWant true &&
    +	SERVER_HEAD=$(git -C "$SERVER" rev-parse HEAD) &&
    +	SERVER_FAKE_HEAD=$(echo $SERVER_HEAD | tr "0123456789abcdef" "123456789abcdef0") &&
    +
    +	test_create_repo client &&
    +
    +	# Make the server claim that it has $SERVER_FAKE_HEAD as
    +	# refs/heads/main. The server still sends $SERVER_HEAD in the packfile,
    +	# so the client will see $SERVER_FAKE_HEAD as missing.
    +	echo "s=$SERVER_HEAD refs/heads/main=$SERVER_FAKE_HEAD refs/heads/main= if /wanted-refs/../packfile/" >"$HTTPD_ROOT_PATH/one-time-perl" &&
    +
    +	test_must_fail git -C client fetch $HTTPD_URL/one_time_perl/server refs/heads/main 2>err &&
    +	grep "(from remote)" err &&
    +	! grep "(from local object store)" err &&
    +	grep "error: connectivity check failed for" err
    +'
    +
     test_done
    +
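The following standalone sketch lets you check the three-way message selection in `bad_object()` without a repository. It follows the same branches as the patch but returns the formatted string instead of calling `die()`. The function name `bad_object_message()` and the local `UNINTERESTING` macro are hypothetical stand-ins, not part of git.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define UNINTERESTING 1u /* stand-in for git's flag bit */

/*
 * Hypothetical, side-effect-free variant of the patch's bad_object():
 * formats the message that would be passed to die() into buf.
 */
static const char *bad_object_message(char *buf, size_t len, int detailed,
				      const char *name, unsigned int flags)
{
	assert(buf != NULL && len > 0);
	if (!detailed)
		snprintf(buf, len, "bad object %s", name);
	else if (flags & UNINTERESTING)
		/* reached from a tip passed via "--not": our own side */
		snprintf(buf, len, "bad object %s (from local object store)", name);
	else
		/* reached from a tip the remote claimed to send */
		snprintf(buf, len, "bad object %s (from remote)", name);
	return buf;
}
```

Without `--detailed-bad-object` the message is unchanged from before the patch. That is why callers of revision.c other than the connectivity check are unaffected.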
When the connectivity check after a fetch fails, an error message
"<remote> did not send all necessary objects" is printed. That error
message is printed regardless of the reason for the failure: in
particular, it may be printed if the connectivity check fails because a
local object is missing. (The connectivity check reads local objects
too, because it compares the set of objects that the remote claims to
send against the set of objects that our refs directly or indirectly
reference.)

The connectivity check passes remote objects to "git rev-list"
directly, and local objects through "--not". Internally, the latter are
marked with the UNINTERESTING flag. When reading a commit during the
commit walk, we know whether the commit came from an UNINTERESTING
commit or not. Therefore, use this flag to produce a clearer error
message when a bad object is read.

This necessitates changes in revision.c, which is used by components
other than the connectivity check and may attach different meanings to
objects passed with and without "--not", so guard the extra diagnostics
behind a CLI argument.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>

---
We noticed, at $DAYJOB, some of these messages that were likely caused
by missing objects in the local repository instead, so here is a patch
that will make it easier to diagnose such issues.

---
     builtin/fetch.c              |  2 +-
     connected.c                  |  1 +
     revision.c                   | 16 ++++++++++++--
     revision.h                   |  3 +++
     t/t5518-fetch-exit-status.sh | 43 ++++++++++++++++++++++++++++++++++++
     5 files changed, 62 insertions(+), 3 deletions(-)
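The commit message relies on revisions listed after "--not" ending up marked UNINTERESTING. The sketch below is a simplified model of that behavior, not git's option parser. It uses a hypothetical helper, `mark_revs()`, that walks a rev-list-style list of arguments. Each "--not" flips an UNINTERESTING bit, and every other entry is recorded with the current flags. The revision names in the usage note are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UNINTERESTING 1u /* stand-in for git's flag bit */

/*
 * Hypothetical model of how "--not" affects the revisions after it:
 * each "--not" flips UNINTERESTING, and every other entry is recorded
 * as a revision carrying the flags in effect at that point.
 * Returns the number of revisions written to names[] and flags[].
 */
static size_t mark_revs(const char **lines, size_t nlines,
			const char **names, unsigned int *flags)
{
	unsigned int current = 0;
	size_t n = 0;

	assert(lines && names && flags);
	for (size_t i = 0; i < nlines; i++) {
		if (!strcmp(lines[i], "--not")) {
			current ^= UNINTERESTING;
			continue;
		}
		names[n] = lines[i];
		flags[n] = current;
		n++;
	}
	return n;
}
```

For the input `{ "remote-tip", "--not", "local-tip" }`, the helper records two revisions. "remote-tip" gets flags 0 and "local-tip" gets `UNINTERESTING`. This is the distinction the patch's `bad_object()` uses to choose between "(from remote)" and "(from local object store)".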