diff mbox series

[1/2] http: advertise capabilities when cloning empty repos

Message ID 20230426205324.326501-2-sandals@crustytoothpaste.net (mailing list archive)
State Superseded
Headers show
Series Fix empty SHA-256 clones with v0 and v1 | expand

Commit Message

brian m. carlson April 26, 2023, 8:53 p.m. UTC
From: "brian m. carlson" <bk2204@github.com>

When cloning an empty repository, the HTTP protocol version 0 currently
offers nothing but the header and flush packets for the /info/refs
endpoint. This means that no capabilities are provided, so the client
side doesn't know what capabilities are present.

However, this does pose a problem when working with SHA-256
repositories, since we use the capabilities to know the remote side's
object format (hash algorithm).  It used to be possible to set the
correct algorithm with `GIT_DEFAULT_HASH` (which is what the Git LFS
testsuite did), but this no longer works as of 8b214c2e9d ("clone:
propagate object-format when cloning from void", 2023-04-05), since
there we always read the hash algorithm from the remote.  If there is no
hash algorithm provided, we default to SHA-1 for backwards
compatibility.

Fortunately, the push version of the protocol already indicates a clue
for how to solve this.  When the /info/refs endpoint is accessed for a
push and the remote is empty, we include a dummy "capabilities^{}" ref
pointing to the all-zeros object ID.  The protocol documentation already
indicates this should _always_ be sent, even for fetches and clones, so
let's just do that, which means we'll properly announce the hash
algorithm as part of the capabilities.  This just works with the
existing code because we share the same ref code for fetches and clones,
and libgit2 does as well.

Signed-off-by: brian m. carlson <bk2204@github.com>
---
 t/t5551-http-fetch-smart.sh | 27 +++++++++++++++++++++++++++
 t/t5700-protocol-v1.sh      | 21 +++++++++++++++++++--
 upload-pack.c               |  4 ++++
 3 files changed, 50 insertions(+), 2 deletions(-)

Comments

Junio C Hamano April 26, 2023, 9:14 p.m. UTC | #1
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> From: "brian m. carlson" <bk2204@github.com>
>
> When cloning an empty repository, the HTTP protocol version 0 currently
> offers nothing but the header and flush packets for the /info/refs
> endpoint. This means that no capabilities are provided, so the client
> side doesn't know what capabilities are present.
>
> However, this does pose a problem when working with SHA-256
> repositories, since we use the capabilities to know the remote side's
> object format (hash algorithm).  It used to be possible to set the
> correct algorithm with `GIT_DEFAULT_HASH` (which is what the Git LFS
> testsuite did), but this no longer works as of 8b214c2e9d ("clone:

"this no longer works as of" -> "this was a mistake and was fixed by".

> propagate object-format when cloning from void", 2023-04-05), since
> there we always read the hash algorithm from the remote.  If there is no
> hash algorithm provided, we default to SHA-1 for backwards
> compatibility.

Other than that, looks good to me.

Thanks.  Will queue.
brian m. carlson April 26, 2023, 9:28 p.m. UTC | #2
On 2023-04-26 at 21:14:37, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > From: "brian m. carlson" <bk2204@github.com>
> >
> > When cloning an empty repository, the HTTP protocol version 0 currently
> > offers nothing but the header and flush packets for the /info/refs
> > endpoint. This means that no capabilities are provided, so the client
> > side doesn't know what capabilities are present.
> >
> > However, this does pose a problem when working with SHA-256
> > repositories, since we use the capabilities to know the remote side's
> > object format (hash algorithm).  It used to be possible to set the
> > correct algorithm with `GIT_DEFAULT_HASH` (which is what the Git LFS
> > testsuite did), but this no longer works as of 8b214c2e9d ("clone:
> 
> "this no longer works as of" -> "this was a mistake and was fixed by".

I tend to disagree.  While I agree that change is valuable because it
fixes v2, which we want, it does cause a change in user-visible
behaviour, which broke the Git LFS testsuite.  Whether we like things
working that way or not, clearly there were people relying on it.

Fortunately, in that case, Git LFS can just enable protocol v2 and
things work again, but I think "this no longer works" is accurate and
more neutral, and addresses the issue.  We wouldn't have to deal with
that issue if we could gracefully handle git clone --local with older
versions of the protocol, but one of the tests fails when we do that.
I'll take some more time to see if I can come up with a nice way to
gracefully handle that, and if so, I'll send a v2.
Jeff King April 27, 2023, 5 a.m. UTC | #3
On Wed, Apr 26, 2023 at 09:28:31PM +0000, brian m. carlson wrote:

> > > However, this does pose a problem when working with SHA-256
> > > repositories, since we use the capabilities to know the remote side's
> > > object format (hash algorithm).  It used to be possible to set the
> > > correct algorithm with `GIT_DEFAULT_HASH` (which is what the Git LFS
> > > testsuite did), but this no longer works as of 8b214c2e9d ("clone:
> > 
> > "this no longer works as of" -> "this was a mistake and was fixed by".
> 
> I tend to disagree.  While I agree that change is valuable because it
> fixes v2, which we want, it does cause a change in user-visible
> behaviour, which broke the Git LFS testsuite.  Whether we like things
> working that way or not, clearly there were people relying on it.
> 
> Fortunately, in that case, Git LFS can just enable protocol v2 and
> things work again, but I think "this no longer works" is accurate and
> more neutral, and addresses the issue.  We wouldn't have to deal with
> that issue if we could gracefully handle git clone --local with older
> versions of the protocol, but one of the tests fails when we do that.
> I'll take some more time to see if I can come up with a nice way to
> gracefully handle that, and if so, I'll send a v2.

Reiterating what I said upthread, I think it was always 50% broken.
Taking the local hash format over the remote one was always the wrong
thing to do, but it sometimes worked out (because we happened to match
the remote).

But the opposite case:

  git init --object-format=sha1 dst.git
  GIT_DEFAULT_HASH=sha256 git clone dst.git

would previously have done the wrong thing. We just flipped which half
was broken and which was not.

-Peff
Jeff King April 27, 2023, 5:30 a.m. UTC | #4
On Wed, Apr 26, 2023 at 08:53:23PM +0000, brian m. carlson wrote:

> From: "brian m. carlson" <bk2204@github.com>
> 
> When cloning an empty repository, the HTTP protocol version 0 currently
> offers nothing but the header and flush packets for the /info/refs
> endpoint. This means that no capabilities are provided, so the client
> side doesn't know what capabilities are present.

Is this really an HTTP problem?

If I do:

  git init --bare --object-format=sha256 remote.git
  git -c protocol.version=0 clone --bare remote.git local.git
  git -C local.git rev-parse --show-object-format

I will get sha1, which is wrong. Likewise with GIT_DEFAULT_HASH=sha256
on the clone (after Junio's recent patch), regardless of what the server
claims. This is really a git-protocol issue that affects all transports.

So I think in this hunk:

> @@ -1379,6 +1381,8 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
>  			data.no_done = 1;
>  		head_ref_namespaced(send_ref, &data);
>  		for_each_namespaced_ref(send_ref, &data);
> +		if (!data.sent_capabilities && advertise_refs)
> +			send_ref("capabilities^{}", null_oid(), 0, &data);
>  		/*
>  		 * fflush stdout before calling advertise_shallow_grafts because send_ref
>  		 * uses stdio.

you would want to drop the "&& advertise_refs" bit, after which both of
the cases above would yield a sha256 repository.

There is one other catch, though. Doing as I suggest results in a
failure in t5509, because the new code does not interact correctly with
namespaces. That is true of your version, as well; it's just that the
test suite does not cover the combination of namespaces, http, and empty
repos.

The issue is that send_ref() will try to strip the namespace, and end up
with NULL (which on my glibc system ends up with a ref named "(null)",
but obviously could segfault, too).

Something like this fixes it:

diff --git a/environment.c b/environment.c
index 8a96997539..37cd66b295 100644
--- a/environment.c
+++ b/environment.c
@@ -234,6 +234,8 @@ const char *get_git_namespace(void)
 const char *strip_namespace(const char *namespaced_ref)
 {
 	const char *out;
+	if (!strcmp(namespaced_ref, "capabilities^{}"))
+		return namespaced_ref; /* magic ref */
 	if (skip_prefix(namespaced_ref, get_git_namespace(), &out))
 		return out;
 	return NULL;

but I suspect it would be cleaner to refactor send_ref() to allow
sending a name more directly.

(As an aside, it feels like send_ref() is also wrong not to check for
NULL from strip_namespace(), but I guess in practice we do not feed
it names outside of the namespace. Might be a good candidate for a BUG()
check or other assertion).

> +test_expect_success 'clone empty SHA-256 repository with protocol v0' '
> +	rm -fr sha256 &&
> +	echo sha256 >expected &&
> +	GIT_TRACE=1 GIT_TRACE_PACKET=1 git -c protocol.version=0 clone "$HTTPD_URL/smart/sha256.git" &&
> +	git -C sha256 rev-parse --show-object-format >actual &&
> +	test_cmp actual expected &&
> +	git ls-remote "$HTTPD_URL/smart/sha256.git" >actual &&
> +	test_must_be_empty actual
> +'

This looks reasonable, though I think if we do not need HTTP to
demonstrate the issue (and I don't think we do), then we should probably
avoid it, just to get test coverage on platforms that don't support
HTTP.

-Peff
Junio C Hamano April 27, 2023, 8:40 p.m. UTC | #5
Jeff King <peff@peff.net> writes:

> So I think in this hunk:
>
>> @@ -1379,6 +1381,8 @@ void upload_pack(const int advertise_refs, const int stateless_rpc,
>>  			data.no_done = 1;
>>  		head_ref_namespaced(send_ref, &data);
>>  		for_each_namespaced_ref(send_ref, &data);
>> +		if (!data.sent_capabilities && advertise_refs)
>> +			send_ref("capabilities^{}", null_oid(), 0, &data);
>>  		/*
>>  		 * fflush stdout before calling advertise_shallow_grafts because send_ref
>>  		 * uses stdio.
>
> you would want to drop the "&& advertise_refs" bit, after which both of
> the cases above would yield a sha256 repository.

Good suggestion.

>> +test_expect_success 'clone empty SHA-256 repository with protocol v0' '
>> +	rm -fr sha256 &&
>> +	echo sha256 >expected &&
>> +	GIT_TRACE=1 GIT_TRACE_PACKET=1 git -c protocol.version=0 clone "$HTTPD_URL/smart/sha256.git" &&
>> +	git -C sha256 rev-parse --show-object-format >actual &&
>> +	test_cmp actual expected &&
>> +	git ls-remote "$HTTPD_URL/smart/sha256.git" >actual &&
>> +	test_must_be_empty actual
>> +'
>
> This looks reasonable, though I think if we do not need HTTP to
> demonstrate the issue (and I don't think we do), then we should probably
> avoid it, just to get test coverage on platforms that don't support
> HTTP.

HTTP tests tend to be more cumbersome to set up and harder to debug
than the plain vanilla "over the pipe on the same machine"
transport, so I tend to agree with the statement.

They however represent a more common use case, so having HTTP tests
in addition to non-HTTP tests would be nicer, if we can afford to.

Thanks.
diff mbox series

Patch

diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 0908534f25..21b7767cbd 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -611,6 +611,33 @@  test_expect_success 'client falls back from v2 to v0 to match server' '
 	grep symref=HEAD:refs/heads/ trace
 '
 
+test_expect_success 'create empty http-accessible SHA-256 repository' '
+	mkdir "$HTTPD_DOCUMENT_ROOT_PATH/sha256.git" &&
+	(cd "$HTTPD_DOCUMENT_ROOT_PATH/sha256.git" &&
+	 git --bare init --object-format=sha256
+	)
+'
+
+test_expect_success 'clone empty SHA-256 repository with protocol v2' '
+	rm -fr sha256 &&
+	echo sha256 >expected &&
+	git -c protocol.version=2 clone "$HTTPD_URL/smart/sha256.git" &&
+	git -C sha256 rev-parse --show-object-format >actual &&
+	test_cmp actual expected &&
+	git ls-remote "$HTTPD_URL/smart/sha256.git" >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'clone empty SHA-256 repository with protocol v0' '
+	rm -fr sha256 &&
+	echo sha256 >expected &&
+	GIT_TRACE=1 GIT_TRACE_PACKET=1 git -c protocol.version=0 clone "$HTTPD_URL/smart/sha256.git" &&
+	git -C sha256 rev-parse --show-object-format >actual &&
+	test_cmp actual expected &&
+	git ls-remote "$HTTPD_URL/smart/sha256.git" >actual &&
+	test_must_be_empty actual
+'
+
 test_expect_success 'passing hostname resolution information works' '
 	BOGUS_HOST=gitbogusexamplehost.invalid &&
 	BOGUS_HTTPD_URL=$HTTPD_PROTO://$BOGUS_HOST:$LIB_HTTPD_PORT &&
diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh
index 6c8d4c6cf1..3cd9db9012 100755
--- a/t/t5700-protocol-v1.sh
+++ b/t/t5700-protocol-v1.sh
@@ -249,10 +249,12 @@  test_expect_success 'push with ssh:// using protocol v1' '
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-test_expect_success 'create repo to be served by http:// transport' '
+test_expect_success 'create repos to be served by http:// transport' '
 	git init "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
 	git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" config http.receivepack true &&
-	test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" one
+	test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" one &&
+	git init --object-format=sha256 "$HTTPD_DOCUMENT_ROOT_PATH/sha256" &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/sha256" config http.receivepack true
 '
 
 test_expect_success 'clone with http:// using protocol v1' '
@@ -269,6 +271,21 @@  test_expect_success 'clone with http:// using protocol v1' '
 	grep "git< version 1" log
 '
 
+test_expect_success 'clone with http:// using protocol v1 with empty SHA-256 repo' '
+	GIT_TRACE_PACKET=1 GIT_TRACE_CURL=1 git -c protocol.version=1 \
+		clone "$HTTPD_URL/smart/sha256" sha256 2>log &&
+
+	cat log &&
+	echo sha256 >expect &&
+	git -C sha256 rev-parse --show-object-format >actual &&
+	test_cmp expect actual &&
+
+	# Client requested to use protocol v1
+	grep "Git-Protocol: version=1" log &&
+	# Server responded using protocol v1
+	grep "git< version 1" log
+'
+
 test_expect_success 'fetch with http:// using protocol v1' '
 	test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" two &&
 
diff --git a/upload-pack.c b/upload-pack.c
index 08633dc121..5ef9b162b6 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -120,6 +120,7 @@  struct upload_pack_data {
 	unsigned allow_ref_in_want : 1;				/* v2 only */
 	unsigned allow_sideband_all : 1;			/* v2 only */
 	unsigned advertise_sid : 1;
+	unsigned sent_capabilities : 1;
 };
 
 static void upload_pack_data_init(struct upload_pack_data *data)
@@ -1240,6 +1241,7 @@  static int send_ref(const char *refname, const struct object_id *oid,
 			     git_user_agent_sanitized());
 		strbuf_release(&symref_info);
 		strbuf_release(&session_id);
+		data->sent_capabilities = 1;
 	} else {
 		packet_fwrite_fmt(stdout, "%s %s\n", oid_to_hex(oid), refname_nons);
 	}
@@ -1379,6 +1381,8 @@  void upload_pack(const int advertise_refs, const int stateless_rpc,
 			data.no_done = 1;
 		head_ref_namespaced(send_ref, &data);
 		for_each_namespaced_ref(send_ref, &data);
+		if (!data.sent_capabilities && advertise_refs)
+			send_ref("capabilities^{}", null_oid(), 0, &data);
 		/*
 		 * fflush stdout before calling advertise_shallow_grafts because send_ref
 		 * uses stdio.