diff mbox series

connected: distinguish local/remote bad objects

Message ID 20220608210537.185094-1-jonathantanmy@google.com (mailing list archive)
State New, archived
Headers show
Series connected: distinguish local/remote bad objects | expand

Commit Message

Jonathan Tan June 8, 2022, 9:05 p.m. UTC
When the connectivity check after a fetch fails, an error message
"<remote> did not send all necessary objects" is printed. That error
message is printed regardless of the reason of failure: in particular,
that message may be printed if the connectivity check fails because a
local object is missing. (The connectivity check reads local objects too
because it compares the set of objects that the remote claims to send
against the set of objects that our refs directly or indirectly
reference.)

The connectivity check passes, to "git rev-list", remote objects
directly and local objects through "--not". And internally, the latter
are marked with the UNINTERESTING flag. When reading a commit during the
commit walk, we know whether the commit came from an UNINTERESTING
commit or not. Therefore, use this flag to produce a clearer error
message when a bad object is read.

This necessitates changes in revision.c which is used by components
other than the connectivity check and may have different meanings for
objects passe with and without "--not", so guard the extra diagnostics
behind a CLI argument.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
We noticed, at $DAYJOB, some of these messages that were likely caused
by missing objects in the local repository instead, so here is a patch
that will make it easier to diagnose such issues.
---
 builtin/fetch.c              |  2 +-
 connected.c                  |  1 +
 revision.c                   | 16 ++++++++++++--
 revision.h                   |  3 +++
 t/t5518-fetch-exit-status.sh | 43 ++++++++++++++++++++++++++++++++++++
 5 files changed, 62 insertions(+), 3 deletions(-)

Comments

Junio C Hamano June 8, 2022, 10:33 p.m. UTC | #1
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index ac29c2b1ae..6f43b2bf8d 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1133,7 +1133,7 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
>  
>  		rm = ref_map;
>  		if (check_connected(iterate_ref_map, &rm, &opt)) {
> -			rc = error(_("%s did not send all necessary objects\n"), url);
> +			rc = error(_("connectivity check failed for %s\n"), url);
>  			goto abort;
>  		}
>  	}

Clever.

I was wondering how you are going to deal with the different exit
condition from the rev-list that is only expressed with the two
different messages.  You _could_ grep for "from local" vs "from
remote", but you just let the human user who is staring at the error
message to read it, and stop mentioning the details in this message.

OK.

> diff --git a/connected.c b/connected.c
> index ed3025e7a2..ea773f25db 100644
> --- a/connected.c
> +++ b/connected.c
> @@ -94,6 +94,7 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
>  		strvec_push(&rev_list.args, opt->shallow_file);
>  	}
>  	strvec_push(&rev_list.args,"rev-list");
> +	strvec_push(&rev_list.args, "--detailed-bad-object");
>  	strvec_push(&rev_list.args, "--objects");
>  	strvec_push(&rev_list.args, "--stdin");
>  	if (has_promisor_remote())
> diff --git a/revision.c b/revision.c
> index 090a967bf4..777e762373 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -367,6 +367,16 @@ void add_head_to_pending(struct rev_info *revs)
>  	add_pending_object(revs, obj, "HEAD");
>  }
>  
> +static void NORETURN bad_object(struct rev_info *revs, const char *name,
> +				unsigned int flags)
> +{
> +	if (!revs->detailed_bad_object)
> +		die("bad object %s", name);
> +	if (flags & UNINTERESTING)
> +		die("bad object %s (from local object store)", name);
> +	die("bad object %s (from remote)", name);
> +}

If the missing object you found is reachable from existing tips
(i.e. local aka UNINTERESTING) and also from what they should have
sent (i.e. remote tips), when we discover that the object does not
exist locally (but we can have an in-core shell object whose object
name is already known because child objects that are closer to the
tips than the missing object do exist and point at it), does this
new heuristic work reliably?  

Do we always die and report bad_object() with UNINTERESTING in the
the flags variable, or only when we are lucky?

IOW, if our current branch pionts at A, while the other side says
they are updating it to B,

    X-----o-----A
     \
      x---B

we try to traverse "rev-list ^A B" and make sure everything exists.
If we find objects 'o' missing, it is clear that it was something we
were supposed to have on the local side even before we started the
fetch.  But if 'X' is missing, by the time we notice and report a
missing object, do we always reliably know that it ought to be
reachable from both?  Or do we have 50/50 chance that the traversal
comes from 'o' earlier than from 'x' (in which case X is found to be
missing when we try to see who is the parent of 'o', and flags have
UNINTERESTING bit), or later than from 'x' (in which case X is found
when trying to see who the parents of 'x' is, and we may not know
and we may not bother to find out if X is also a parent of 'o',
hence we'd still say 'You do not have X, and we were looking at 'x',
which we got from the other end, so they were supposed to have sent
it', which would be a misdiagnosis)?

Thanks.
Junio C Hamano June 9, 2022, 4:55 p.m. UTC | #2
Jonathan Tan <jonathantanmy@google.com> writes:

>  builtin/fetch.c              |  2 +-
>  connected.c                  |  1 +
>  revision.c                   | 16 ++++++++++++--
>  revision.h                   |  3 +++
>  t/t5518-fetch-exit-status.sh | 43 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 62 insertions(+), 3 deletions(-)

This seems to break linux-leaks CI job by making 5518, which was
marked in some topic in flight to expect to be leak-free, fail.

Because of the way linux-leaks test framework is done, it is not
easy to tell if the code changes essential to this topic introduced
new leaks, in which case we would want to fix that.

Note that this may not the fault of the code changes in this patch.
If the tests added by the patch started using git commands that are
known to leak (i.e. not ready to be subjected to the "leaks" test)
in order to prepare the scenario or to inspect the result, even if
the code changes in this topic did not introduce any leak, we can
see the same breakage in linux-leaks CI job.  An easy way out would
be to disable leak-check CI for the entire 5518, but that is not
very satisfactory, as the earlier part of that script should still
be leak-free.  Another way out might be to add these two tests in a
new script, which is not marked as not-leaking.  After all, what the
new topic adds is not about exit status but how that exit status
comes about, so it might not be a bad idea even without the CI leak
stuff anyway.

Ævar, does the internal state used for revision walking count as
leaking when it is still held by the time we hit die() in
bad_object(), or anything on stack when we die() are still reachable
and won't be reported as a failure?

Thanks.
Jonathan Tan June 9, 2022, 5:17 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> >  builtin/fetch.c              |  2 +-
> >  connected.c                  |  1 +
> >  revision.c                   | 16 ++++++++++++--
> >  revision.h                   |  3 +++
> >  t/t5518-fetch-exit-status.sh | 43 ++++++++++++++++++++++++++++++++++++
> >  5 files changed, 62 insertions(+), 3 deletions(-)
> 
> This seems to break linux-leaks CI job by making 5518, which was
> marked in some topic in flight to expect to be leak-free, fail.
> 
> Because of the way linux-leaks test framework is done, it is not
> easy to tell if the code changes essential to this topic introduced
> new leaks, in which case we would want to fix that.

Thanks. I'll recheck this once I make a new version.
Jonathan Tan June 9, 2022, 5:17 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:
> If the missing object you found is reachable from existing tips
> (i.e. local aka UNINTERESTING) and also from what they should have
> sent (i.e. remote tips), when we discover that the object does not
> exist locally (but we can have an in-core shell object whose object
> name is already known because child objects that are closer to the
> tips than the missing object do exist and point at it), does this
> new heuristic work reliably?  
> 
> Do we always die and report bad_object() with UNINTERESTING in the
> the flags variable, or only when we are lucky?
> 
> IOW, if our current branch pionts at A, while the other side says
> they are updating it to B,
> 
>     X-----o-----A
>      \
>       x---B
> 
> we try to traverse "rev-list ^A B" and make sure everything exists.
> If we find objects 'o' missing, it is clear that it was something we
> were supposed to have on the local side even before we started the
> fetch.  But if 'X' is missing, by the time we notice and report a
> missing object, do we always reliably know that it ought to be
> reachable from both?  Or do we have 50/50 chance that the traversal
> comes from 'o' earlier than from 'x' (in which case X is found to be
> missing when we try to see who is the parent of 'o', and flags have
> UNINTERESTING bit), or later than from 'x' (in which case X is found
> when trying to see who the parents of 'x' is, and we may not know
> and we may not bother to find out if X is also a parent of 'o',
> hence we'd still say 'You do not have X, and we were looking at 'x',
> which we got from the other end, so they were supposed to have sent
> it', which would be a misdiagnosis)?
> 
> Thanks.

Ah, good catch. I'll take a look into fixing this. (A misdiagnosis would
defeat the purpose of this patch, yes.)
Ævar Arnfjörð Bjarmason June 9, 2022, 6 p.m. UTC | #5
On Thu, Jun 09 2022, Junio C Hamano wrote:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
>>  builtin/fetch.c              |  2 +-
>>  connected.c                  |  1 +
>>  revision.c                   | 16 ++++++++++++--
>>  revision.h                   |  3 +++
>>  t/t5518-fetch-exit-status.sh | 43 ++++++++++++++++++++++++++++++++++++
>>  5 files changed, 62 insertions(+), 3 deletions(-)
>
> This seems to break linux-leaks CI job by making 5518, which was
> marked in some topic in flight to expect to be leak-free, fail.
>
> Because of the way linux-leaks test framework is done, it is not
> easy to tell if the code changes essential to this topic introduced
> new leaks, in which case we would want to fix that.

I think this is just an existing leak that happens to be exposed by a
new (in this file) test, i.e. transport_get() leaks via an xmalloc() for
transport_helper.

> Note that this may not the fault of the code changes in this patch.
> If the tests added by the patch started using git commands that are
> known to leak (i.e. not ready to be subjected to the "leaks" test)
> in order to prepare the scenario or to inspect the result, even if
> the code changes in this topic did not introduce any leak, we can
> see the same breakage in linux-leaks CI job.  An easy way out would
> be to disable leak-check CI for the entire 5518, but that is not
> very satisfactory, as the earlier part of that script should still
> be leak-free.

I think doing that would be fine in this case. It will get easier to fix
leaks now that "struct rev_info" is out of the way (and I've got a lot
of pending patches), but I can always loop back & re-mark this
particular test as leak-free at some future date.

> Another way out might be to add these two tests in a
> new script, which is not marked as not-leaking.  After all, what the
> new topic adds is not about exit status but how that exit status
> comes about, so it might not be a bad idea even without the CI leak
> stuff anyway.

Yeah, that sounds especially good in this case, as if we can't run httpd
we'll print a meaningful "skip" message in that case. See 0a2bfccb9c8
(t0051: use "skip_all" under !MINGW in single-test file, 2022-02-04)

> Ævar, does the internal state used for revision walking count as
> leaking when it is still held by the time we hit die() in
> bad_object(), or anything on stack when we die() are still reachable
> and won't be reported as a failure?

No, but in this case the variable containing the leaked data isn't in
scope by the time we exit, i.e. it was used by fetch_one() which had it
malloc'd, but the struct it lived in went away, and now we're exiting
from cmd_fetch() etc.
diff mbox series

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ac29c2b1ae..6f43b2bf8d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1133,7 +1133,7 @@  static int store_updated_refs(const char *raw_url, const char *remote_name,
 
 		rm = ref_map;
 		if (check_connected(iterate_ref_map, &rm, &opt)) {
-			rc = error(_("%s did not send all necessary objects\n"), url);
+			rc = error(_("connectivity check failed for %s\n"), url);
 			goto abort;
 		}
 	}
diff --git a/connected.c b/connected.c
index ed3025e7a2..ea773f25db 100644
--- a/connected.c
+++ b/connected.c
@@ -94,6 +94,7 @@  int check_connected(oid_iterate_fn fn, void *cb_data,
 		strvec_push(&rev_list.args, opt->shallow_file);
 	}
 	strvec_push(&rev_list.args,"rev-list");
+	strvec_push(&rev_list.args, "--detailed-bad-object");
 	strvec_push(&rev_list.args, "--objects");
 	strvec_push(&rev_list.args, "--stdin");
 	if (has_promisor_remote())
diff --git a/revision.c b/revision.c
index 090a967bf4..777e762373 100644
--- a/revision.c
+++ b/revision.c
@@ -367,6 +367,16 @@  void add_head_to_pending(struct rev_info *revs)
 	add_pending_object(revs, obj, "HEAD");
 }
 
+static void NORETURN bad_object(struct rev_info *revs, const char *name,
+				unsigned int flags)
+{
+	if (!revs->detailed_bad_object)
+		die("bad object %s", name);
+	if (flags & UNINTERESTING)
+		die("bad object %s (from local object store)", name);
+	die("bad object %s (from remote)", name);
+}
+
 static struct object *get_reference(struct rev_info *revs, const char *name,
 				    const struct object_id *oid,
 				    unsigned int flags)
@@ -390,7 +400,7 @@  static struct object *get_reference(struct rev_info *revs, const char *name,
 			return object;
 		if (revs->exclude_promisor_objects && is_promisor_object(oid))
 			return NULL;
-		die("bad object %s", name);
+		bad_object(revs, name, flags);
 	}
 	object->flags |= flags;
 	return object;
@@ -426,7 +436,7 @@  static struct commit *handle_commit(struct rev_info *revs,
 			if (revs->exclude_promisor_objects &&
 			    is_promisor_object(&tag->tagged->oid))
 				return NULL;
-			die("bad object %s", oid_to_hex(&tag->tagged->oid));
+			bad_object(revs, oid_to_hex(&tag->tagged->oid), flags);
 		}
 		object->flags |= flags;
 		/*
@@ -2537,6 +2547,8 @@  static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		if (fetch_if_missing)
 			BUG("exclude_promisor_objects can only be used when fetch_if_missing is 0");
 		revs->exclude_promisor_objects = 1;
+	} else if (!strcmp(arg, "--detailed-bad-object")) {
+		revs->detailed_bad_object = 1;
 	} else {
 		int opts = diff_opt_parse(&revs->diffopt, argv, argc, revs->prefix);
 		if (!opts)
diff --git a/revision.h b/revision.h
index e80c148b19..7f685dd5bb 100644
--- a/revision.h
+++ b/revision.h
@@ -328,6 +328,9 @@  struct rev_info {
 
 	/* Location where temporary objects for remerge-diff are written. */
 	struct tmp_objdir *remerge_objdir;
+
+	/* Error reporting info */
+	unsigned detailed_bad_object : 1;
 };
 
 int ref_excluded(struct string_list *, const char *path);
diff --git a/t/t5518-fetch-exit-status.sh b/t/t5518-fetch-exit-status.sh
index 5c4ac2556e..f1adac1dd6 100755
--- a/t/t5518-fetch-exit-status.sh
+++ b/t/t5518-fetch-exit-status.sh
@@ -37,4 +37,47 @@  test_expect_success 'forced update' '
 
 '
 
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'connectivity check failure due to missing local object' '
+	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	test_when_finished "rm -rf \"$SERVER\" client" &&
+	test_create_repo "$SERVER" &&
+	test_commit -C "$SERVER" foo &&
+
+	test_create_repo client &&
+	test_commit -C client bar &&
+
+	# Simulate missing client objects.
+	rm -rf client/.git/objects/* &&
+	test_must_fail git -C client fetch $HTTPD_URL/smart/server 2>err &&
+	grep "(from local object store)" err &&
+	! grep "(from remote)" err &&
+	grep "error: connectivity check failed for" err
+'
+
+test_expect_success 'connectivity check failure due to missing remote object' '
+	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	test_when_finished "rm -rf \"$SERVER\" client" &&
+	test_create_repo "$SERVER" &&
+	test_commit -C "$SERVER" foo &&
+	git -C "$SERVER" config uploadpack.allowRefInWant true &&
+	SERVER_HEAD=$(git -C "$SERVER" rev-parse HEAD) &&
+	SERVER_FAKE_HEAD=$(echo $SERVER_HEAD | tr "0123456789abcdef" "123456789abcdef0") &&
+
+	test_create_repo client &&
+
+	# Make the server claim that it has $SERVER_FAKE_HEAD as
+	# refs/heads/main. The server still sends $SERVER_HEAD in the packfile,
+	# so the client will see $SERVER_FAKE_HEAD as missing.
+	echo "s=$SERVER_HEAD refs/heads/main=$SERVER_FAKE_HEAD refs/heads/main= if /wanted-refs/../packfile/" >"$HTTPD_ROOT_PATH/one-time-perl" &&
+
+	test_must_fail git -C client fetch $HTTPD_URL/one_time_perl/server refs/heads/main 2>err &&
+	grep "(from remote)" err &&
+	! grep "(from local object store)" err &&
+	grep "error: connectivity check failed for" err
+'
+
 test_done
+