diff mbox series

remote-curl: fix clone on sha256 repos

Message ID 20210511103730.GA15003@dcvr (mailing list archive)
State Accepted
Commit 6d6a81717aea7ad14eace78beb71800d281cd6a6
Headers show
Series remote-curl: fix clone on sha256 repos | expand

Commit Message

Eric Wong May 11, 2021, 10:37 a.m. UTC
I'm not very familiar with the way some of this stuff works, but
the patch below seems to fix clone for me.

I originally tried changing the cmd_main->set_option code path,
but value is always set to "true" for object-format because
it only sees "option object-format" with no arg

		} else if (skip_prefix(buf.buf, "option ", &arg)) {
			char *value = strchr(arg, ' ');
			int result;

			if (value)
				*value++ = '\0';
			else
				value = "true";

			result = set_option(arg, value);

So when set_option gets called, hash_algo_by_name isn't:

	} else if (!strcmp(name, "object-format")) {
		int algo;
		options.object_format = 1;
		if (strcmp(value, "true")) {
			/* XXX this branch is never taken: */
			algo = hash_algo_by_name(value);
			if (algo == GIT_HASH_UNKNOWN)
				die("unknown object format '%s'", value);
			options.hash_algo = &hash_algos[algo];
		}
		return 0;

So I'm not sure if the above is incomplete or dead code.
Anyways, I arrived at the following and it works for me:

-----------8<---------
Subject: [PATCH] remote-curl: fix clone on sha256 repos

The remote-https process needs to update it's own instance of
`the_repository' when it sees an HTTP(S) remote is using sha256.
Without this, parse_oid_hex() fails to handle sha256 OIDs when
it's eventually called by parse_fetch().

Tested with:

	git clone https://yhbt.net/sha256test.git
	GIT_SMART_HTTP=0 git clone https://yhbt.net/sha256test.git
	(plain http:// also works)

Cloning the URL via git:// required no changes

Signed-off-by: Eric Wong <e@80x24.org>
---
 remote-curl.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Junio C Hamano May 11, 2021, 1:36 p.m. UTC | #1
Eric Wong <e@80x24.org> writes:

> The remote-https process needs to update it's own instance of
> `the_repository' when it sees an HTTP(S) remote is using sha256.
> Without this, parse_oid_hex() fails to handle sha256 OIDs when
> it's eventually called by parse_fetch().
>
> Tested with:
>
> 	git clone https://yhbt.net/sha256test.git
> 	GIT_SMART_HTTP=0 git clone https://yhbt.net/sha256test.git
> 	(plain http:// also works)
>
> Cloning the URL via git:// required no changes

OK, so smart-http is disabled because it is just a slight variation
of the native Git protocol in disguise running over the http
transport, while the non-smart-http uses totally different codepath
to initialize the repository, and this bug does not appear when the
native Git protocol is in use?

I guess we use "git clone" over HTTP in many place in our tests, so
there is no need to add a new one to safeguard this fix from future
breakage (instead we can just run the whole test suite with SHA256)?

Will wait for brian to comment.

Thanks.

> Signed-off-by: Eric Wong <e@80x24.org>
> ---
>  remote-curl.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/remote-curl.c b/remote-curl.c
> index 0290b04891..9d432c299a 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -555,6 +555,8 @@ static void output_refs(struct ref *refs)
>  	struct ref *posn;
>  	if (options.object_format && options.hash_algo) {
>  		printf(":object-format %s\n", options.hash_algo->name);
> +		repo_set_hash_algo(the_repository,
> +				hash_algo_by_ptr(options.hash_algo));
>  	}
>  	for (posn = refs; posn; posn = posn->next) {
>  		if (posn->symref)
Ævar Arnfjörð Bjarmason May 11, 2021, 2:47 p.m. UTC | #2
On Tue, May 11 2021, Eric Wong wrote:

> I'm not very familiar with the way some of this stuff works, but
> the patch below seems to fix clone for me.

Me neither, but it seems that whatever issue we have here we've got a
big blind spot in our test suite.

GIT_SMART_HTTP in the environment will be ignored by test-lib.sh,
manually changing the code to use it leads to a lot of test failures,
some are definitely expected (incompatible server responses), but some
might not be...
Jeff King May 11, 2021, 6:25 p.m. UTC | #3
On Tue, May 11, 2021 at 04:47:28PM +0200, Ævar Arnfjörð Bjarmason wrote:

> 
> On Tue, May 11 2021, Eric Wong wrote:
> 
> > I'm not very familiar with the way some of this stuff works, but
> > the patch below seems to fix clone for me.
> 
> Me neither, but it seems that whatever issue we have here we've got a
> big blind spot in our test suite.
> 
> GIT_SMART_HTTP in the environment will be ignored by test-lib.sh,
> manually changing the code to use it leads to a lot of test failures,
> some are definitely expected (incompatible server responses), but some
> might not be...

There are specific dumb-http tests in t5550. And I think we can and do
run the suite with GIT_TEST_DEFAULT_HASH=sha256. But I suspect that
covers up the problem. If I understand the problem correctly, it comes
about when the client thinks the default hash is sha1, but the server is
sha256. Whereas GIT_TEST_DEFAULT_HASH affects _both_ sides.

So I think we'd want interop tests in t5550 that specifically create a
sha256 server and access it with a sha1 client (and possibly vice
versa).

It would be nice if there was a way to use an environment variable like
GIT_TEST_DEFAULT_HASH to mean "be hash X on the server, but Y on the
client". But I don't think we can easily do that. The "git init" command
which is used to create a repo that is later used for server access
doesn't _know_ that's what it's being used for. Possibly we could have
an extra variable that instructs git-clone to use a separate default
hash. And then:

  GIT_TEST_DEFAULT_HASH=sha256 \
  GIT_TEST_CLONE_DEFAULT_HASH=sha1 \
  make test

might possibly trigger some interesting cases.

-Peff
Junio C Hamano May 11, 2021, 9:02 p.m. UTC | #4
Jeff King <peff@peff.net> writes:

> There are specific dumb-http tests in t5550. And I think we can and do
> run the suite with GIT_TEST_DEFAULT_HASH=sha256. But I suspect that
> covers up the problem. If I understand the problem correctly, it comes
> about when the client thinks the default hash is sha1, but the server is
> sha256. Whereas GIT_TEST_DEFAULT_HASH affects _both_ sides.
>
> So I think we'd want interop tests in t5550 that specifically create a
> sha256 server and access it with a sha1 client (and possibly vice
> versa).

Ahh, you're right.  Such a combination is worth testing but is
tricky to arrange.
Jeff King May 11, 2021, 9:23 p.m. UTC | #5
On Tue, May 11, 2021 at 02:25:58PM -0400, Jeff King wrote:

> It would be nice if there was a way to use an environment variable like
> GIT_TEST_DEFAULT_HASH to mean "be hash X on the server, but Y on the
> client". But I don't think we can easily do that. The "git init" command
> which is used to create a repo that is later used for server access
> doesn't _know_ that's what it's being used for. Possibly we could have
> an extra variable that instructs git-clone to use a separate default
> hash. And then:
> 
>   GIT_TEST_DEFAULT_HASH=sha256 \
>   GIT_TEST_CLONE_DEFAULT_HASH=sha1 \
>   make test
> 
> might possibly trigger some interesting cases.

So this triggers several test failures:

diff --git a/builtin/clone.c b/builtin/clone.c
index eeb74c0217..622447daf7 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -992,6 +992,13 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 	packet_trace_identity("clone");
 
+	/* hack to test cross-hash interop */
+	{
+		const char *x = getenv("GIT_TEST_DEFAULT_CLONE_HASH");
+		if (x)
+			setenv("GIT_DEFAULT_HASH", x, 1);
+	}
+
 	git_config(git_clone_config, NULL);
 
 	argc = parse_options(argc, argv, prefix, builtin_clone_options,

For example, running:

  GIT_TEST_DEFAULT_CLONE_HASH=sha1 \
  GIT_TEST_DEFAULT_HASH=sha256 \
  ./t5550-http-fetch-dumb.sh -v -i

fails in test 3, which shows the problem Eric found. And indeed,
applying his patch fixes it. We then go on to fail in test 7 ("empty
dumb HTTP repository has default hash algorithm"), which is not all that
surprising, since there is no longer one "the default".

So I'm not sure if the technique is a good one. It did find a bug, but
it's not clear to me if the other dozen failures it finds are also bugs,
or places where the tests should be modified to handle this case, or
just weird problems caused by the hacky implementation above. A lot of
them seem to involve submodules, which is not surprising; we'd probably
end up here with a sha256 superproject and cloned sha1 modules.

-Peff
brian m. carlson May 11, 2021, 11:01 p.m. UTC | #6
On 2021-05-11 at 10:37:30, Eric Wong wrote:
> I'm not very familiar with the way some of this stuff works, but
> the patch below seems to fix clone for me.
> 
> I originally tried changing the cmd_main->set_option code path,
> but value is always set to "true" for object-format because
> it only sees "option object-format" with no arg
> 
> 		} else if (skip_prefix(buf.buf, "option ", &arg)) {
> 			char *value = strchr(arg, ' ');
> 			int result;
> 
> 			if (value)
> 				*value++ = '\0';
> 			else
> 				value = "true";
> 
> 			result = set_option(arg, value);
> 
> So when set_option gets called, hash_algo_by_name isn't:
> 
> 	} else if (!strcmp(name, "object-format")) {
> 		int algo;
> 		options.object_format = 1;
> 		if (strcmp(value, "true")) {
> 			/* XXX this branch is never taken: */
> 			algo = hash_algo_by_name(value);
> 			if (algo == GIT_HASH_UNKNOWN)
> 				die("unknown object format '%s'", value);
> 			options.hash_algo = &hash_algos[algo];
> 		}
> 		return 0;
> 
> So I'm not sure if the above is incomplete or dead code.
> Anyways, I arrived at the following and it works for me:
> 
> -----------8<---------
> Subject: [PATCH] remote-curl: fix clone on sha256 repos
> 
> The remote-https process needs to update it's own instance of
> `the_repository' when it sees an HTTP(S) remote is using sha256.
> Without this, parse_oid_hex() fails to handle sha256 OIDs when
> it's eventually called by parse_fetch().

This seems fine as a solution for now.  I tried to keep the transport
code mostly independent of the local repository settings, but in this
case because the HTTP walker mucks around with the internals of the
local pack files, I don't think we can avoid this without some major
restructuring, which I'm not interested in sitting down and writing this
evening.

I'll clean this up in a nicer way once I get interop working.  Thanks
for sending a patch for this.
brian m. carlson May 11, 2021, 11:04 p.m. UTC | #7
On 2021-05-11 at 18:25:58, Jeff King wrote:
> On Tue, May 11, 2021 at 04:47:28PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> > 
> > On Tue, May 11 2021, Eric Wong wrote:
> > 
> > > I'm not very familiar with the way some of this stuff works, but
> > > the patch below seems to fix clone for me.
> > 
> > Me neither, but it seems that whatever issue we have here we've got a
> > big blind spot in our test suite.
> > 
> > GIT_SMART_HTTP in the environment will be ignored by test-lib.sh,
> > manually changing the code to use it leads to a lot of test failures,
> > some are definitely expected (incompatible server responses), but some
> > might not be...
> 
> There are specific dumb-http tests in t5550. And I think we can and do
> run the suite with GIT_TEST_DEFAULT_HASH=sha256. But I suspect that
> covers up the problem. If I understand the problem correctly, it comes
> about when the client thinks the default hash is sha1, but the server is
> sha256. Whereas GIT_TEST_DEFAULT_HASH affects _both_ sides.
> 
> So I think we'd want interop tests in t5550 that specifically create a
> sha256 server and access it with a sha1 client (and possibly vice
> versa).

This is pretty easy to do just by unsetting GIT_TEST_DEFAULT_HASH from
the environment.  We do that in some places in the init tests.
Junio C Hamano May 11, 2021, 11:48 p.m. UTC | #8
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> This seems fine as a solution for now.  I tried to keep the transport
> code mostly independent of the local repository settings, but in this
> case because the HTTP walker mucks around with the internals of the
> local pack files, I don't think we can avoid this without some major
> restructuring, which I'm not interested in sitting down and writing this
> evening.
>
> I'll clean this up in a nicer way once I get interop working.  Thanks
> for sending a patch for this.

Thanks, both.

As an "experimental" stuff, I do not think SHA256 "fix" is as urgent
as (or of higher priority than) other stuff, like reducing
inter-developer stepping-on-others-toes, so I'll refrain from
picking Eric's patch up myself and let you include/handle it later.
brian m. carlson May 11, 2021, 11:51 p.m. UTC | #9
On 2021-05-11 at 23:48:51, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > This seems fine as a solution for now.  I tried to keep the transport
> > code mostly independent of the local repository settings, but in this
> > case because the HTTP walker mucks around with the internals of the
> > local pack files, I don't think we can avoid this without some major
> > restructuring, which I'm not interested in sitting down and writing this
> > evening.
> >
> > I'll clean this up in a nicer way once I get interop working.  Thanks
> > for sending a patch for this.
> 
> Thanks, both.
> 
> As an "experimental" stuff, I do not think SHA256 "fix" is as urgent
> as (or of higher priority than) other stuff, like reducing
> inter-developer stepping-on-others-toes, so I'll refrain from
> picking Eric's patch up myself and let you include/handle it later.

No, please do pick up the patch.  The time frame for which I'm looking
at fixing this is several months out and I think some solution is best
adopted now.  Since Eric has sent a patch that works, I think it's best
to fix the immediate problem and let me clean up things later on.
diff mbox series

Patch

diff --git a/remote-curl.c b/remote-curl.c
index 0290b04891..9d432c299a 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -555,6 +555,8 @@  static void output_refs(struct ref *refs)
 	struct ref *posn;
 	if (options.object_format && options.hash_algo) {
 		printf(":object-format %s\n", options.hash_algo->name);
+		repo_set_hash_algo(the_repository,
+				hash_algo_by_ptr(options.hash_algo));
 	}
 	for (posn = refs; posn; posn = posn->next) {
 		if (posn->symref)