diff mbox series

builtin/clone: avoid failure with GIT_DEFAULT_HASH

Message ID 20200911233815.2808426-1-sandals@crustytoothpaste.net
State Superseded
Headers show
Series builtin/clone: avoid failure with GIT_DEFAULT_HASH | expand

Commit Message

brian m. carlson Sept. 11, 2020, 11:38 p.m. UTC
If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
"sha256", then we can end up with a repository where the repository
format version is 0 but the extensions.objectformat key is set to
"sha256".  This is both wrong (the user has a SHA-1 repository) and
nonfunctional (because the extension cannot be used in a v0 repository).

This happens because in a clone, we initially set up the repository, and
then change its algorithm based on what the remote side tells us it's
using.  We've initially set up the repository as SHA-256 in this case,
and then later on reset the repository version without clearing the
extension.

We could just always set the extension in this case, but that would mean
that our SHA-1 repositories weren't compatible with older Git versions,
even though there's no reason why they shouldn't be.  And we also don't
want to initialize the repository as SHA-1 initially, since that means
if we're cloning an empty repository, we'll have failed to honor the
GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
SHA-256 repository.

Neither of those are appealing, so let's tell the repository
initialization code if we're doing a reinit like this, and if so, to
clear the extension if we're using SHA-1.  This makes sure we produce a
valid and functional repository and doesn't break any of our other use
cases.

Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/clone.c   |  2 +-
 builtin/init-db.c |  6 ++++--
 cache.h           |  2 +-
 t/t5601-clone.sh  | 10 ++++++++++
 4 files changed, 16 insertions(+), 4 deletions(-)

Comments

Taylor Blau Sept. 12, 2020, 3:24 a.m. UTC | #1
On Fri, Sep 11, 2020 at 11:38:15PM +0000, brian m. carlson wrote:
> Neither of those are appealing, so let's tell the repository
> initialization code if we're doing a reinit like this, and if so, to
> clear the extension if we're using SHA-1.  This makes sure we produce a
> valid and functional repository and doesn't break any of our other use
> cases.

Your explanation makes sense, and what you're proposing here seems like
a very reasonable path forward. I can't think of anything that would
cause it not to work as expected.

> -void initialize_repository_version(int hash_algo);
> +void initialize_repository_version(int hash_algo, int reinit);

I'm not a huge fan of adding a 'reinit' parameter to a function that
itself begins with the word 'initialize' (why wouldn't you call
'reinitialize_repository_version()' instead?), but seeing as there are
only a couple of callers, maybe it is OK.

Alternatively, I certainly wouldn't complain if you did introduce a new
function and updated the call-site that passes reinit as 1.

> +test_expect_success 'clone with GIT_DEFAULT_HASH' '
> +	(
> +		sane_unset GIT_DEFAULT_HASH &&
> +		git init test
> +	) &&
> +	test_commit -C test foo &&
> +	git clone test test-clone &&
> +	git -C test-clone status
> +'
> +

This test looks very reasonable, and certainly demonstrates the bug and
fix. Thanks.

With or without my nitpick:

  Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor
brian m. carlson Sept. 12, 2020, 7:52 p.m. UTC | #2
On 2020-09-12 at 03:24:48, Taylor Blau wrote:
> On Fri, Sep 11, 2020 at 11:38:15PM +0000, brian m. carlson wrote:
> > -void initialize_repository_version(int hash_algo);
> > +void initialize_repository_version(int hash_algo, int reinit);
> 
> I'm not a huge fan of adding a 'reinit' parameter to a function that
> itself begins with the word 'initialize' (why wouldn't you call
> 'reinitialize_repository_version()' instead?), but seeing as there are
> only a couple of callers, maybe it is OK.
> 
> Alternatively, I certainly wouldn't complain if you did introduce a new
> function and updated the call-site that passes reinit as 1.

I thought about introducing a new function, but since it would share
almost all of the code, it seemed a bit wasteful, even if the function
is small.  We do have only two callers, I believe, since I recall
making this function non-static and calling it from clone, so I think
it's okay.

> > +test_expect_success 'clone with GIT_DEFAULT_HASH' '
> > +	(
> > +		sane_unset GIT_DEFAULT_HASH &&
> > +		git init test
> > +	) &&
> > +	test_commit -C test foo &&
> > +	git clone test test-clone &&
> > +	git -C test-clone status
> > +'
> > +
> 
> This test looks very reasonable, and certainly demonstrates the bug and
> fix. Thanks.

This is essentially Matheus's testcase.  I considered using env -u to
avoid the subshell but POSIX doesn't specify that, so we have a
subshell.
Junio C Hamano Sept. 14, 2020, 9:37 p.m. UTC | #3
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>> Alternatively, I certainly wouldn't complain if you did introduce a new
>> function and updated the call-site that passes reinit as 1.
>
> I thought about introducing a new function, but since it would share
> almost all of the code, it seemed a bit wasteful, even if the function
> is small.  We do have only two callers, I believe, since I recall
> making this function non-static and calling it from clone, so I think
> it's okay.

Perhaps.

FWIW, this seems to have strange interaction with something in
'seen' when merged; I suspect it is the topic that mucks with the
set-up sequence for "git clone", but didn't check.
Junio C Hamano Sept. 14, 2020, 9:44 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
>>> Alternatively, I certainly wouldn't complain if you did introduce a new
>>> function and updated the call-site that passes reinit as 1.
>>
>> I thought about introducing a new function, but since it would share
>> almost all of the code, it seemed a bit wasteful, even if the function
>> is small.  We do have only two callers, I believe, since I recall
>> making this function non-static and calling it from clone, so I think
>> it's okay.
>
> Perhaps.
>
> FWIW, this seems to have strange interaction with something in
> 'seen' when merged; I suspect it is the topic that mucks with the
> set-up sequence for "git clone", but didn't check.

Actually I have to take it back.  I have this directly on top of
v2.28.0 and it already breaks tests big time.  For example, here is
how "cd t && sh t0021-conversion.sh -i -v" breaks:

    ...
    Cloning into 'repo-cloned'...
    fatal: could not unset 'extensions.objectformat'
    fatal: the remote end hung up unexpectedly
    not ok 25 - delayed checkout in process filter

The story is the same if it is applied on top of 'master' (which is
expected, as we haven't done anything to affect this area since
v2.28.0).
brian m. carlson Sept. 15, 2020, 1:32 a.m. UTC | #5
On 2020-09-14 at 21:44:45, Junio C Hamano wrote:
> Actually I have to take it back.  I have this directly on top of
> v2.28.0 and it already breaks tests big time.  For example, here is
> how "cd t && sh t0021-conversion.sh -i -v" breaks:
> 
>     ...
>     Cloning into 'repo-cloned'...
>     fatal: could not unset 'extensions.objectformat'
>     fatal: the remote end hung up unexpectedly
>     not ok 25 - delayed checkout in process filter
> 
> The story is the same if it is applied on top of 'master' (which is
> expected, as we haven't done anything to affect this area since
> v2.28.0).

Ah, yes.  I build with GIT_TEST_DEFAULT_HASH=sha256, but if you run the
tests with SHA-1, it fails.  Feel free to drop it for now and I'll send
out a v2 soon.  Sorry about that.
diff mbox series

Patch

diff --git a/builtin/clone.c b/builtin/clone.c
index b087ee40c2..925a2e3dd6 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1235,7 +1235,7 @@  int cmd_clone(int argc, const char **argv, const char *prefix)
 		 * Now that we know what algorithm the remote side is using,
 		 * let's set ours to the same thing.
 		 */
-		initialize_repository_version(hash_algo);
+		initialize_repository_version(hash_algo, 1);
 		repo_set_hash_algo(the_repository, hash_algo);
 
 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
diff --git a/builtin/init-db.c b/builtin/init-db.c
index cd3e760541..57e15e93da 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -179,7 +179,7 @@  static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo, int reinit)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -195,6 +195,8 @@  void initialize_repository_version(int hash_algo)
 	if (hash_algo != GIT_HASH_SHA1)
 		git_config_set("extensions.objectformat",
 			       hash_algos[hash_algo].name);
+	else if (reinit)
+		git_config_set("extensions.objectformat", NULL);
 }
 
 static int create_default_files(const char *template_path,
@@ -277,7 +279,7 @@  static int create_default_files(const char *template_path,
 		free(ref);
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, 0);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
diff --git a/cache.h b/cache.h
index cee8aa5dc3..c0072d43b1 100644
--- a/cache.h
+++ b/cache.h
@@ -629,7 +629,7 @@  int path_inside_repo(const char *prefix, const char *path);
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
 	    const char *initial_branch, unsigned int flags);
-void initialize_repository_version(int hash_algo);
+void initialize_repository_version(int hash_algo, int reinit);
 
 void sanitize_stdfds(void);
 int daemonize(void);
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 15fb64c18d..570d989795 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -631,6 +631,16 @@  test_expect_success CASE_INSENSITIVE_FS 'colliding file detection' '
 	test_i18ngrep "the following paths have collided" icasefs/warning
 '
 
+test_expect_success 'clone with GIT_DEFAULT_HASH' '
+	(
+		sane_unset GIT_DEFAULT_HASH &&
+		git init test
+	) &&
+	test_commit -C test foo &&
+	git clone test test-clone &&
+	git -C test-clone status
+'
+
 partial_clone_server () {
 	       SERVER="$1" &&