[v2] fetch,fetch-pack: clarify connectivity check error

Message ID	20220610195247.1177549-1-jonathantanmy@google.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Date: Fri, 10 Jun 2022 12:52:47 -0700 In-Reply-To: <20220608210537.185094-1-jonathantanmy@google.com> Message-Id: <20220610195247.1177549-1-jonathantanmy@google.com> Mime-Version: 1.0 References: <20220608210537.185094-1-jonathantanmy@google.com> Subject: [PATCH v2] fetch,fetch-pack: clarify connectivity check error From: Jonathan Tan <jonathantanmy@google.com> To: git@vger.kernel.org Cc: Jonathan Tan <jonathantanmy@google.com>, gitster@pobox.com, avarab@gmail.com Content-Type: text/plain; charset="UTF-8" Precedence: bulk
Series	[v2] fetch,fetch-pack: clarify connectivity check error \| expand [v2] fetch,fetch-pack: clarify connectivity check error

Message ID

20220610195247.1177549-1-jonathantanmy@google.com (mailing list archive)

State

New, archived

Headers

Date: Fri, 10 Jun 2022 12:52:47 -0700
In-Reply-To: <20220608210537.185094-1-jonathantanmy@google.com>
Message-Id: <20220610195247.1177549-1-jonathantanmy@google.com>
Mime-Version: 1.0
References: <20220608210537.185094-1-jonathantanmy@google.com>
Subject: [PATCH v2] fetch,fetch-pack: clarify connectivity check error
From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>, gitster@pobox.com,
        avarab@gmail.com
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk

Series

[v2] fetch,fetch-pack: clarify connectivity check error | expand

Commit Message

Jonathan Tan June 10, 2022, 7:52 p.m. UTC

When the connectivity check after a fetch fails, an error message
"<remote> did not send all necessary objects" is printed. That error
message is printed regardless of the reason of failure: in particular,
that message may be printed if the connectivity check fails because a
local object is missing. (The connectivity check reads local objects too
because it compares the set of objects that the remote claims to send
against the set of objects that our refs directly or indirectly
reference.)

Since it is not necessarily true that the remote did not send all
necessary objects, update the error message to be something that more
correctly reflects the situation.

The updated error message is admittedly vague and one alternative
solution would be to update the revision walking code to be able to more
precisely specify if a bad object is supposed to be available locally or
supposed to have been sent by the remote. That would likely require, in
the revision walking code, to delay parsing any object until all its
children has been parsed, which seems to be a significant undertaking.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
As Junio said in [1], my original v1 code doesn't work when an object is
referenced both by a local object and a remote one. I tried looking into
making it work but it looks difficult.

So here is a patch that just changes the error message to a vaguer but
hopefully more correct one. I wonder what the community thinks of it -
I think it is more correct (and means that we do not need to say "it's
not the remote fault despite what the error message says") but at the
same time, many people are already used to this message (and a search
online reveals several web sites that talk about this).

[1] https://lore.kernel.org/git/xmqqh74upyz3.fsf@gitster.g/
---
 builtin/fetch.c | 2 +-
 fetch-pack.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Junio C Hamano June 10, 2022, 8:25 p.m. UTC | #1

Jonathan Tan <jonathantanmy@google.com> writes:

> When the connectivity check after a fetch fails, an error message
> "<remote> did not send all necessary objects" is printed. That error
> message is printed regardless of the reason of failure: in particular,
> that message may be printed if the connectivity check fails because a
> local object is missing. (The connectivity check reads local objects too
> because it compares the set of objects that the remote claims to send
> against the set of objects that our refs directly or indirectly
> reference.)
>
> Since it is not necessarily true that the remote did not send all
> necessary objects, update the error message to be something that more
> correctly reflects the situation.
>
> The updated error message is admittedly vague and one alternative
> solution would be to update the revision walking code to be able to more
> precisely specify if a bad object is supposed to be available locally or
> supposed to have been sent by the remote. That would likely require, in
> the revision walking code, to delay parsing any object until all its
> children has been parsed, which seems to be a significant undertaking.
>

While that is all true, I do not think the posted patch as-is is a
good idea.  After all, object missing on our side locally before you
start "git fetch" is a local repository corruption, which ought to
be a much rarer event than a still in development and in flux server
end sending nonsense packfiles, no?

At least, "they didn't do X" would give the user somewhere to start
investigation (e.g. complain to the server folks about the error,
stating where they started from and what they tried to fetch).  The
new message may be playing it "safe" by not saying anything you are
not absolutely sure about, but that is much less useful to the users
who got the message.

Now, when check_connected() fails early in store_updated_refs(), we
haven't updated the remote-tracking refs to point at the tips we
obtained, have we?

One thing we could do is to run "git fsck" to see if the local
history is broken.  We may have added unreferenced objects that have
come from the remote end, but having dangling or unreachable objects
is not an error, while it is a sure sign of repository corruption if
there are missing objects.  Because we haven't update any of the
refs (local or remote-tracking), we still should be able to see if
the problem is on our end at this point

If "o" or "X" (in the illustrated history in my review of your v1)
is found to be missing by "git fsck" (or "git fsck A"), then you
know that the server end is not the culprit (or at least may not be
the sole culprit) for this connectivity check failure.  If "git
fsck" reports no missing objects, then what was found to be missing
is not "X" but is "x" or "B", and we are more certain that the
remote end is more likely to be problematic.

As it is on the error codepath, we could even automatically run fsck
to figure out if we have objects missing after giving an initial
diagnosis, but at least it would be easy to give an advice at this
point in the code to run fsck after seeing a failure from this
fetch.

As you have two places that show the error, perhaps the first step
is to introduce a helper function:

	int diagnose_check_connected_error(const char *url)
	{
		return error(_("%s may not have sent necessary objects"),
                		url ? url : "remote");
	}

and call it instead of calling error() in the places you touched in
v2.

As a follow up step, then, we could do something like

	int diagnose_check_connected_error(const char *url)
	{
+		static const char advice[] =
+			N_("run 'git fsck' to see if your repository "
+			"is missing objects; if it passes, the problem is "
+			"on the remote end");
+
+		advise_if_enabled(ADVICE_CONNECTIVETY_FAILURE, advice);
		return error(_("%s may not have sent necessary	objects"),
                		url ? url : "remote");
	}

or if we are more ambitious, actually run "git fsck" via the run_command() 
API for the user.

Jonathan Tan June 17, 2022, 8:03 p.m. UTC | #2

Junio C Hamano <gitster@pobox.com> writes:
> While that is all true, I do not think the posted patch as-is is a
> good idea.  After all, object missing on our side locally before you
> start "git fetch" is a local repository corruption, which ought to
> be a much rarer event than a still in development and in flux server
> end sending nonsense packfiles, no?
> 
> At least, "they didn't do X" would give the user somewhere to start
> investigation (e.g. complain to the server folks about the error,
> stating where they started from and what they tried to fetch).  The
> new message may be playing it "safe" by not saying anything you are
> not absolutely sure about, but that is much less useful to the users
> who got the message.

Hmm...that's true. Let me try to understand the revision walking code a
bit more, and if by then I still don't end up changing the revision walk
to avoid prematurely parsing a parent, I'll make a patch with your
suggestion.

diff --git a/builtin/fetch.c b/builtin/fetch.c
index ac29c2b1ae..15737ca293 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1133,7 +1133,7 @@  static int store_updated_refs(const char *raw_url, const char *remote_name,
 
 		rm = ref_map;
 		if (check_connected(iterate_ref_map, &rm, &opt)) {
-			rc = error(_("%s did not send all necessary objects\n"), url);
+			rc = error(_("connectivity check failed after fetching from %s\n"), url);
 			goto abort;
 		}
 	}
diff --git a/fetch-pack.c b/fetch-pack.c
index cb6647d657..91269008cc 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -2021,7 +2021,7 @@  struct ref *fetch_pack(struct fetch_pack_args *args,
 		if (args->deepen)
 			opt.is_deepening_fetch = 1;
 		if (check_connected(iterate_ref_map, &iterator, &opt)) {
-			error(_("remote did not send all necessary objects"));
+			error(_("connectivity check failed after fetching from remote"));
 			free_refs(ref_cpy);
 			ref_cpy = NULL;
 			rollback_shallow_file(the_repository, &shallow_lock);

[v2] fetch,fetch-pack: clarify connectivity check error

Commit Message

Comments

Patch