diff mbox series

[2/2] repository: allow repository format upgrade with extensions

Message ID 20200716062818.GC3242764@google.com (mailing list archive)
State New, archived
Headers show
Series extensions.* fixes for 2.28 (Re: [PATCH] setup: warn about un-enabled extensions) | expand

Commit Message

Jonathan Nieder July 16, 2020, 6:28 a.m. UTC
Now that we officially permit repository extensions in repository
format v0, permit upgrading a repository with extensions from v0 to v1
as well.

For example, this means a repository where the user has set
"extensions.preciousObjects" can use "git fetch --filter=blob:none
origin" to upgrade the repository to use v1 and the partial clone
extension.

To avoid mistakes, continue to forbid repository format upgrades in v0
repositories with an unrecognized extension.  This way, a v0 user
using a misspelled extension field gets a chance to correct the
mistake before updating to the less forgiving v1 format.

While we're here, make the error message for failure to upgrade the
repository format a bit shorter, and present it as an error, not a
warning.

Reported-by: Huan Huan Chen <huanhuanchen@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Apologies again for the trouble, and thanks for your patient help.

 cache.h                  |  1 -
 setup.c                  | 12 +++++++-----
 t/t0410-partial-clone.sh |  4 ++--
 3 files changed, 9 insertions(+), 8 deletions(-)

Comments

Junio C Hamano July 16, 2020, 7:01 a.m. UTC | #1
Jonathan Nieder <jrnieder@gmail.com> writes:

> Now that we officially permit repository extensions in repository
> format v0, permit upgrading a repository with extensions from v0 to v1
> as well.
>
> For example, this means a repository where the user has set
> "extensions.preciousObjects" can use "git fetch --filter=blob:none
> origin" to upgrade the repository to use v1 and the partial clone
> extension.
>
> To avoid mistakes, continue to forbid repository format upgrades in v0
> repositories with an unrecognized extension.  This way, a v0 user
> using a misspelled extension field gets a chance to correct the
> mistake before updating to the less forgiving v1 format.

This needs to be managed carefully.  When the next extension is
added to the codebase, that extension may be "known" to Git, but I
do not think it is a good idea to honor it in v0 repository, or
allow upgrading v0 repository to v1 with such an extension that
weren't "known" to Git.  For example, a topic in flight adds
objectformat extension and I do not think it should be honored in v0
repository.

Having said that, the approach is OK for now at the tip of tonight's
master, but the point is "known" vs "unknown" must be fixed right
with some means.  E.g. tell people to throw the "new" extensions to
the list of "unknown extensions" in check_repo_format() when they
add new ones, or something.

Thanks.

> +	if (!repo_fmt.version && repo_fmt.unknown_extensions.nr)
> +		return error("cannot upgrade repository format: "
> +			     "unknown extension %s",
> +			     repo_fmt.unknown_extensions.items[0].string);
>  
>  	strbuf_addf(&repo_version, "%d", target_version);
>  	git_config_set("core.repositoryformatversion", repo_version.buf);
> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index 51d1eba6050..6aa0f313bdd 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -42,7 +42,7 @@ test_expect_success 'convert shallow clone to partial clone' '
>  	test_cmp_config -C client 1 core.repositoryformatversion
>  '
>  
> -test_expect_success 'converting to partial clone fails with noop extension' '
> +test_expect_success 'convert to partial clone with noop extension' '
>  	rm -fr server client &&
>  	test_create_repo server &&
>  	test_commit -C server my_commit 1 &&
> @@ -50,7 +50,7 @@ test_expect_success 'converting to partial clone fails with noop extension' '
>  	git clone --depth=1 "file://$(pwd)/server" client &&
>  	test_cmp_config -C client 0 core.repositoryformatversion &&
>  	git -C client config extensions.noop true &&
> -	test_must_fail git -C client fetch --unshallow --filter="blob:none"
> +	git -C client fetch --unshallow --filter="blob:none"
>  '
>  
>  test_expect_success 'converting to partial clone fails with unrecognized extension' '
Jeff King July 16, 2020, 11 a.m. UTC | #2
On Thu, Jul 16, 2020 at 12:01:09AM -0700, Junio C Hamano wrote:

> > To avoid mistakes, continue to forbid repository format upgrades in v0
> > repositories with an unrecognized extension.  This way, a v0 user
> > using a misspelled extension field gets a chance to correct the
> > mistake before updating to the less forgiving v1 format.
> 
> This needs to be managed carefully.  When the next extension is
> added to the codebase, that extension may be "known" to Git, but I
> do not think it is a good idea to honor it in v0 repository, or
> allow upgrading v0 repository to v1 with such an extension that
> weren't "known" to Git.  For example, a topic in flight adds
> objectformat extension and I do not think it should be honored in v0
> repository.
> 
> Having said that, the approach is OK for now at the tip of tonight's
> master, but the point is "known" vs "unknown" must be fixed right
> with some means.  E.g. tell people to throw the "new" extensions to
> the list of "unknown extensions" in check_repo_format() when they
> add new ones, or something.

Yeah, I agree with this line of reasoning. I'd prefer to see it
addressed now, so that we don't have to remember to do anything later.
I.e., for this patch to put the existing known extensions into the
"good" list for v0, locking it into place forever, and leaving the
objectformat topic with nothing particular it needs to do.

But in the name of -rc1 expediency, I'm also OK moving forward with this
for now.

-Peff
Jeff King July 16, 2020, 12:25 p.m. UTC | #3
On Thu, Jul 16, 2020 at 07:00:08AM -0400, Jeff King wrote:

> > > To avoid mistakes, continue to forbid repository format upgrades in v0
> > > repositories with an unrecognized extension.  This way, a v0 user
> > > using a misspelled extension field gets a chance to correct the
> > > mistake before updating to the less forgiving v1 format.
> > 
> > This needs to be managed carefully.  When the next extension is
> > added to the codebase, that extension may be "known" to Git, but I
> > do not think it is a good idea to honor it in v0 repository, or
> > allow upgrading v0 repository to v1 with such an extension that
> > weren't "known" to Git.  For example, a topic in flight adds
> > objectformat extension and I do not think it should be honored in v0
> > repository.
> > 
> > Having said that, the approach is OK for now at the tip of tonight's
> > master, but the point is "known" vs "unknown" must be fixed right
> > with some means.  E.g. tell people to throw the "new" extensions to
> > the list of "unknown extensions" in check_repo_format() when they
> > add new ones, or something.
> 
> Yeah, I agree with this line of reasoning. I'd prefer to see it
> addressed now, so that we don't have to remember to do anything later.
> I.e., for this patch to put the existing known extensions into the
> "good" list for v0, locking it into place forever, and leaving the
> objectformat topic with nothing particular it needs to do.
> 
> But in the name of -rc1 expediency, I'm also OK moving forward with this
> for now.

Hmm, this is actually a bit trickier than I expected because of the way
the code is written. It's much easier to complain about extensions in a
v0 repository than it is to ignore them. But I'm not sure if that isn't
the right way to go anyway.

The patch I came up with is below (and goes on top of Jonathan's). Even
if we decide this is the right direction, it can definitely happen
post-v2.28.

-- >8 --
Subject: verify_repository_format(): complain about new extensions in v0 repo

We made the mistake in the past of respecting extensions.* even when the
repository format version was set to 0. This is bad because forgetting
to bump the repository version means that older versions of Git (which
do not know about our extensions) won't complain. I.e., it's not a
problem in itself, but it means your repository is in a state which does
not give you the protection you think you're getting from older
versions.

For compatibility reasons, we are stuck with that decision for existing
extensions. However, we'd prefer not to extend the damage further. We
can do that by catching any newly-added extensions and complaining about
the repository format.

Note that this is a pretty heavy hammer: we'll refuse to work with the
repository at all. A lesser option would be to ignore (possibly with a
warning) any new extensions. But because of the way the extensions are
handled, that puts the burden on each new extension that is added to
remember to "undo" itself (because they are handled before we know
for sure whether we are in a v1 repo or not, since we don't insist on a
particular ordering of config entries).

So one option would be to rewrite that handling to record any new
extensions (and their values) during the config parse, and then only
after proceed to handle new ones only if we're in a v1 repository. But
I'm not sure if it's worth the trouble:

  - ignoring extensions is likely to end up with broken results anyway
    (e.g., ignoring a proposed objectformat extension means parsing any
    object data is likely to encounter errors)

  - this is a sign that whatever tool wrote the extension field is
    broken. We may be better off notifying immediately and forcefully so
    that such tools don't even appear to work accidentally.

The only downside is that fixing the istuation is a little tricky,
because programs like "git config" won't want to work with the
repository. But:

  git config --file=.git/config core.repositoryformatversion 1

should still suffice.

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h                 |  2 +
 setup.c                 | 96 ++++++++++++++++++++++++++++++++++-------
 t/t1302-repo-version.sh |  3 ++
 3 files changed, 85 insertions(+), 16 deletions(-)

diff --git a/cache.h b/cache.h
index 654426460c..0290849c19 100644
--- a/cache.h
+++ b/cache.h
@@ -1044,6 +1044,7 @@ struct repository_format {
 	int hash_algo;
 	char *work_tree;
 	struct string_list unknown_extensions;
+	struct string_list v1_only_extensions;
 };
 
 /*
@@ -1057,6 +1058,7 @@ struct repository_format {
 	.is_bare = -1, \
 	.hash_algo = GIT_HASH_SHA1, \
 	.unknown_extensions = STRING_LIST_INIT_DUP, \
+	.v1_only_extensions = STRING_LIST_INIT_DUP, \
 }
 
 /*
diff --git a/setup.c b/setup.c
index 3a81307602..c1480b2b60 100644
--- a/setup.c
+++ b/setup.c
@@ -447,6 +447,54 @@ static int read_worktree_config(const char *var, const char *value, void *vdata)
 	return 0;
 }
 
+enum extension_result {
+	EXTENSION_ERROR = -1, /* compatible with error(), etc */
+	EXTENSION_UNKNOWN = 0,
+	EXTENSION_OK = 1
+};
+
+/*
+ * Do not add new extensions to this function. It handles extensions which are
+ * respected even in v0-format repositories for historical compatibility.
+ */
+enum extension_result handle_extension_v0(const char *var,
+					  const char *value,
+					  const char *ext,
+					  struct repository_format *data)
+{
+		if (!strcmp(ext, "noop")) {
+			return EXTENSION_OK;
+		} else if (!strcmp(ext, "preciousobjects")) {
+			data->precious_objects = git_config_bool(var, value);
+			return EXTENSION_OK;
+		} else if (!strcmp(ext, "partialclone")) {
+			if (!value)
+				return config_error_nonbool(var);
+			data->partial_clone = xstrdup(value);
+			return EXTENSION_OK;
+		} else if (!strcmp(ext, "worktreeconfig")) {
+			data->worktree_config = git_config_bool(var, value);
+			return EXTENSION_OK;
+		}
+
+		return EXTENSION_UNKNOWN;
+}
+
+/*
+ * Record any new extensions in this function.
+ */
+enum extension_result handle_extension(const char *var,
+				       const char *value,
+				       const char *ext,
+				       struct repository_format *data)
+{
+	if (!strcmp(ext, "noop-v1")) {
+		return EXTENSION_OK;
+	}
+
+	return EXTENSION_UNKNOWN;
+}
+
 static int check_repo_format(const char *var, const char *value, void *vdata)
 {
 	struct repository_format *data = vdata;
@@ -455,23 +503,25 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 	if (strcmp(var, "core.repositoryformatversion") == 0)
 		data->version = git_config_int(var, value);
 	else if (skip_prefix(var, "extensions.", &ext)) {
-		/*
-		 * record any known extensions here; otherwise,
-		 * we fall through to recording it as unknown, and
-		 * check_repository_format will complain
-		 */
-		if (!strcmp(ext, "noop"))
-			;
-		else if (!strcmp(ext, "preciousobjects"))
-			data->precious_objects = git_config_bool(var, value);
-		else if (!strcmp(ext, "partialclone")) {
-			if (!value)
-				return config_error_nonbool(var);
-			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
-			data->worktree_config = git_config_bool(var, value);
-		else
+		switch (handle_extension_v0(var, value, ext, data)) {
+		case EXTENSION_ERROR:
+			return -1;
+		case EXTENSION_OK:
+			return 0;
+		case EXTENSION_UNKNOWN:
+			break;
+		}
+
+		switch (handle_extension(var, value, ext, data)) {
+		case EXTENSION_ERROR:
+			return -1;
+		case EXTENSION_OK:
+			string_list_append(&data->v1_only_extensions, ext);
+			return 0;
+		case EXTENSION_UNKNOWN:
 			string_list_append(&data->unknown_extensions, ext);
+			return 0;
+		}
 	}
 
 	return read_worktree_config(var, value, vdata);
@@ -510,6 +560,7 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
 	set_repository_format_partial_clone(candidate->partial_clone);
 	repository_format_worktree_config = candidate->worktree_config;
 	string_list_clear(&candidate->unknown_extensions, 0);
+	string_list_clear(&candidate->v1_only_extensions, 0);
 
 	if (repository_format_worktree_config) {
 		/*
@@ -588,6 +639,7 @@ int read_repository_format(struct repository_format *format, const char *path)
 void clear_repository_format(struct repository_format *format)
 {
 	string_list_clear(&format->unknown_extensions, 0);
+	string_list_clear(&format->v1_only_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
 	init_repository_format(format);
@@ -613,6 +665,18 @@ int verify_repository_format(const struct repository_format *format,
 		return -1;
 	}
 
+	if (format->version == 0 && format->v1_only_extensions.nr) {
+		int i;
+
+		strbuf_addstr(err,
+			      _("repo version is 0, but v1-only extensions found:"));
+
+		for (i = 0; i < format->v1_only_extensions.nr; i++)
+			strbuf_addf(err, "\n\t%s",
+				    format->v1_only_extensions.items[i].string);
+		return -1;
+	}
+
 	return 0;
 }
 
diff --git a/t/t1302-repo-version.sh b/t/t1302-repo-version.sh
index d60c042ce8..0acabb6d11 100755
--- a/t/t1302-repo-version.sh
+++ b/t/t1302-repo-version.sh
@@ -87,6 +87,9 @@ allow 1
 allow 1 noop
 abort 1 no-such-extension
 allow 0 no-such-extension
+allow 0 noop
+abort 0 noop-v1
+allow 1 noop-v1
 EOF
 
 test_expect_success 'precious-objects allowed' '
Derrick Stolee July 16, 2020, 12:53 p.m. UTC | #4
On 7/16/2020 8:25 AM, Jeff King wrote:
> On Thu, Jul 16, 2020 at 07:00:08AM -0400, Jeff King wrote:
> 
>>>> To avoid mistakes, continue to forbid repository format upgrades in v0
>>>> repositories with an unrecognized extension.  This way, a v0 user
>>>> using a misspelled extension field gets a chance to correct the
>>>> mistake before updating to the less forgiving v1 format.
>>>
>>> This needs to be managed carefully.  When the next extension is
>>> added to the codebase, that extension may be "known" to Git, but I
>>> do not think it is a good idea to honor it in v0 repository, or
>>> allow upgrading v0 repository to v1 with such an extension that
>>> weren't "known" to Git.  For example, a topic in flight adds
>>> objectformat extension and I do not think it should be honored in v0
>>> repository.
>>>
>>> Having said that, the approach is OK for now at the tip of tonight's
>>> master, but the point is "known" vs "unknown" must be fixed right
>>> with some means.  E.g. tell people to throw the "new" extensions to
>>> the list of "unknown extensions" in check_repo_format() when they
>>> add new ones, or something.
>>
>> Yeah, I agree with this line of reasoning. I'd prefer to see it
>> addressed now, so that we don't have to remember to do anything later.
>> I.e., for this patch to put the existing known extensions into the
>> "good" list for v0, locking it into place forever, and leaving the
>> objectformat topic with nothing particular it needs to do.
>>
>> But in the name of -rc1 expediency, I'm also OK moving forward with this
>> for now.
> 
> Hmm, this is actually a bit trickier than I expected because of the way
> the code is written. It's much easier to complain about extensions in a
> v0 repository than it is to ignore them. But I'm not sure if that isn't
> the right way to go anyway.
> 
> The patch I came up with is below (and goes on top of Jonathan's). Even
> if we decide this is the right direction, it can definitely happen
> post-v2.28.
> 
> -- >8 --
> Subject: verify_repository_format(): complain about new extensions in v0 repo
> 
> We made the mistake in the past of respecting extensions.* even when the
> repository format version was set to 0. This is bad because forgetting
> to bump the repository version means that older versions of Git (which
> do not know about our extensions) won't complain. I.e., it's not a
> problem in itself, but it means your repository is in a state which does
> not give you the protection you think you're getting from older
> versions.
> 
> For compatibility reasons, we are stuck with that decision for existing
> extensions. However, we'd prefer not to extend the damage further. We
> can do that by catching any newly-added extensions and complaining about
> the repository format.
> 
> Note that this is a pretty heavy hammer: we'll refuse to work with the
> repository at all. A lesser option would be to ignore (possibly with a
> warning) any new extensions. But because of the way the extensions are
> handled, that puts the burden on each new extension that is added to
> remember to "undo" itself (because they are handled before we know
> for sure whether we are in a v1 repo or not, since we don't insist on a
> particular ordering of config entries).
> 
> So one option would be to rewrite that handling to record any new
> extensions (and their values) during the config parse, and then only
> after proceed to handle new ones only if we're in a v1 repository. But
> I'm not sure if it's worth the trouble:
> 
>   - ignoring extensions is likely to end up with broken results anyway
>     (e.g., ignoring a proposed objectformat extension means parsing any
>     object data is likely to encounter errors)
> 
>   - this is a sign that whatever tool wrote the extension field is
>     broken. We may be better off notifying immediately and forcefully so
>     that such tools don't even appear to work accidentally.
> 
> The only downside is that fixing the istuation is a little tricky,

s/istuation/situation

> because programs like "git config" won't want to work with the
> repository. But:
> 
>   git config --file=.git/config core.repositoryformatversion 1
> 
> should still suffice.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  cache.h                 |  2 +
>  setup.c                 | 96 ++++++++++++++++++++++++++++++++++-------
>  t/t1302-repo-version.sh |  3 ++
>  3 files changed, 85 insertions(+), 16 deletions(-)
> 
> diff --git a/cache.h b/cache.h
> index 654426460c..0290849c19 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1044,6 +1044,7 @@ struct repository_format {
>  	int hash_algo;
>  	char *work_tree;
>  	struct string_list unknown_extensions;
> +	struct string_list v1_only_extensions;
>  };
>  
>  /*
> @@ -1057,6 +1058,7 @@ struct repository_format {
>  	.is_bare = -1, \
>  	.hash_algo = GIT_HASH_SHA1, \
>  	.unknown_extensions = STRING_LIST_INIT_DUP, \
> +	.v1_only_extensions = STRING_LIST_INIT_DUP, \
>  }
>  
>  /*
> diff --git a/setup.c b/setup.c
> index 3a81307602..c1480b2b60 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -447,6 +447,54 @@ static int read_worktree_config(const char *var, const char *value, void *vdata)
>  	return 0;
>  }
>  
> +enum extension_result {
> +	EXTENSION_ERROR = -1, /* compatible with error(), etc */
> +	EXTENSION_UNKNOWN = 0,
> +	EXTENSION_OK = 1
> +};
> +
> +/*
> + * Do not add new extensions to this function. It handles extensions which are
> + * respected even in v0-format repositories for historical compatibility.
> + */
> +enum extension_result handle_extension_v0(const char *var,
> +					  const char *value,
> +					  const char *ext,
> +					  struct repository_format *data)
...
> +/*
> + * Record any new extensions in this function.
> + */
> +enum extension_result handle_extension(const char *var,
> +				       const char *value,
> +				       const char *ext,
> +				       struct repository_format *data)

I like the split between these two methods to make it
really clear the difference between "v0" and "v1".

>  	struct repository_format *data = vdata;
> @@ -455,23 +503,25 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
>  	if (strcmp(var, "core.repositoryformatversion") == 0)
>  		data->version = git_config_int(var, value);
>  	else if (skip_prefix(var, "extensions.", &ext)) {
...
> +		switch (handle_extension_v0(var, value, ext, data)) {
> +		case EXTENSION_ERROR:
> +			return -1;
> +		case EXTENSION_OK:
> +			return 0;
> +		case EXTENSION_UNKNOWN:
> +			break;
> +		}
> +
> +		switch (handle_extension(var, value, ext, data)) {
> +		case EXTENSION_ERROR:
> +			return -1;
> +		case EXTENSION_OK:
> +			string_list_append(&data->v1_only_extensions, ext);
> +			return 0;
> +		case EXTENSION_UNKNOWN:
>  			string_list_append(&data->unknown_extensions, ext);
> +			return 0;
> +		}
>  	}

And it makes this loop much cleaner.
> @@ -613,6 +665,18 @@ int verify_repository_format(const struct repository_format *format,
>  		return -1;
>  	}
>  
> +	if (format->version == 0 && format->v1_only_extensions.nr) {
> +		int i;
> +
> +		strbuf_addstr(err,
> +			      _("repo version is 0, but v1-only extensions found:"));
> +
> +		for (i = 0; i < format->v1_only_extensions.nr; i++)
> +			strbuf_addf(err, "\n\t%s",
> +				    format->v1_only_extensions.items[i].string);
> +		return -1;
> +	}
> +
>  	return 0;
>  }
>  
> diff --git a/t/t1302-repo-version.sh b/t/t1302-repo-version.sh
> index d60c042ce8..0acabb6d11 100755
> --- a/t/t1302-repo-version.sh
> +++ b/t/t1302-repo-version.sh
> @@ -87,6 +87,9 @@ allow 1
>  allow 1 noop
>  abort 1 no-such-extension
>  allow 0 no-such-extension
> +allow 0 noop
> +abort 0 noop-v1
> +allow 1 noop-v1

LGTM.

Thanks,
-Stolee
Junio C Hamano July 16, 2020, 4:10 p.m. UTC | #5
Jeff King <peff@peff.net> writes:

> Yeah, I agree with this line of reasoning. I'd prefer to see it
> addressed now, so that we don't have to remember to do anything later.

Very true.  Also the documentation may need some updating,
e.g. "These 4 extensions are honored without adding
repositoryFormatVersion to your repository (as special cases)" to
avoid further confusion e.g. "why doesn't my objectFormat=SHA-3 does
not take effect by itself?".

> I.e., for this patch to put the existing known extensions into the
> "good" list for v0, locking it into place forever, and leaving the
> objectformat topic with nothing particular it needs to do.
>
> But in the name of -rc1 expediency, I'm also OK moving forward with this
> for now.

I'm OK, too, as I said.

I'd need to kick out bc/sha-2-part-3 topic out of my tree while that
infrastructure is in place on top of these two patches, though.

Thanks.
Junio C Hamano July 16, 2020, 4:32 p.m. UTC | #6
Jeff King <peff@peff.net> writes:

> Hmm, this is actually a bit trickier than I expected because of the way
> the code is written. It's much easier to complain about extensions in a
> v0 repository than it is to ignore them. But I'm not sure if that isn't
> the right way to go anyway.
>
> The patch I came up with is below (and goes on top of Jonathan's). Even
> if we decide this is the right direction, it can definitely happen
> post-v2.28.

It must happen already in 'seen' if we want to keep bc/sha-2-part-3
with us, though ;-)

> So one option would be to rewrite that handling to record any new
> extensions (and their values) during the config parse, and then only
> after proceed to handle new ones only if we're in a v1 repository.

I do not think it would be too bad for read_repository_format() to
call git_config_from_file() to collect extensions.* in a string list
while looking for core.repositoryformatversion.  Then the function
can iterate over the string list to call check_repo_format() itself.

> I'm not sure if it's worth the trouble:
>
>   - ignoring extensions is likely to end up with broken results anyway
>     (e.g., ignoring a proposed objectformat extension means parsing any
>     object data is likely to encounter errors)

The primary reason why the safety calls for ignore/reject is the
namespace collision.  We may decide to use extensions.objectformat
to record what hash algorithms are used for objects in the
repository, while the end user and their tools may use it for
totally different purpose (perhaps they have a custom script around
"git repack" that reads the variable to learn what command line
options e.g. --window=800 to pass).  A new version of Git that
supports SHA-2 will suddenly break their configuration, when the
users are 100% happy with the current SHA-1 system, with the way
their tool uses extensions.objectformat configuration variable and a
newer version of Git that happens to know how to also work with SHA-2,
using their v0 repository.

And the reasoning 'ignoring would make the problem get noticed
anyway' does not apply to such users at all, does it?

We need to declare that any names under "extensions.*" is off limits
by end users regardless and write it in big flasing red letters if
we haven't already done so.  It is enforced in v1 repositories by
dying upon seeing an unrecognised extension, but not entirely.  When
the users are lucky, a known-but-name-collided extension may stop
with a type error (e.g. "extensions.objectformat=frotz" may say
"frotz is not among the accepted hash algorithms") but that is not
guaranteed.  In v0 repositories, enforcing it after the fact would
cause the same trouble as the tightening caused.
Junio C Hamano July 16, 2020, 4:49 p.m. UTC | #7
Jeff King <peff@peff.net> writes:

> Subject: verify_repository_format(): complain about new extensions in v0 repo
>
> We made the mistake in the past of respecting extensions.* even when the
> repository format version was set to 0. This is bad because forgetting
> to bump the repository version means that older versions of Git (which
> do not know about our extensions) won't complain. I.e., it's not a
> problem in itself, but it means your repository is in a state which does
> not give you the protection you think you're getting from older
> versions.
>
> For compatibility reasons, we are stuck with that decision for existing
> extensions. However, we'd prefer not to extend the damage further. We
> can do that by catching any newly-added extensions and complaining about
> the repository format.

Looking good overall, but I needed this to build from the source.

Thanks.

 setup.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/setup.c b/setup.c
index e29659b7b9..e69bd28ed6 100644
--- a/setup.c
+++ b/setup.c
@@ -457,10 +457,10 @@ enum extension_result {
  * Do not add new extensions to this function. It handles extensions which are
  * respected even in v0-format repositories for historical compatibility.
  */
-enum extension_result handle_extension_v0(const char *var,
-					  const char *value,
-					  const char *ext,
-					  struct repository_format *data)
+static enum extension_result handle_extension_v0(const char *var,
+						 const char *value,
+						 const char *ext,
+						 struct repository_format *data)
 {
 		if (!strcmp(ext, "noop")) {
 			return EXTENSION_OK;
@@ -483,10 +483,10 @@ enum extension_result handle_extension_v0(const char *var,
 /*
  * Record any new extensions in this function.
  */
-enum extension_result handle_extension(const char *var,
-				       const char *value,
-				       const char *ext,
-				       struct repository_format *data)
+static enum extension_result handle_extension(const char *var,
+					      const char *value,
+					      const char *ext,
+					      struct repository_format *data)
 {
 	if (!strcmp(ext, "noop-v1")) {
 		return EXTENSION_OK;
Jeff King July 16, 2020, 4:53 p.m. UTC | #8
On Thu, Jul 16, 2020 at 09:32:35AM -0700, Junio C Hamano wrote:

> We need to declare that any names under "extensions.*" is off limits
> by end users regardless and write it in big flasing red letters if
> we haven't already done so.

I thought this was already well-understood, and it was definitely part
of the plan since 2015. Are other tools really sticking stuff in
extensions.* that we don't know about?

-Peff
Jeff King July 16, 2020, 4:56 p.m. UTC | #9
On Thu, Jul 16, 2020 at 09:49:14AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Subject: verify_repository_format(): complain about new extensions in v0 repo
> >
> > We made the mistake in the past of respecting extensions.* even when the
> > repository format version was set to 0. This is bad because forgetting
> > to bump the repository version means that older versions of Git (which
> > do not know about our extensions) won't complain. I.e., it's not a
> > problem in itself, but it means your repository is in a state which does
> > not give you the protection you think you're getting from older
> > versions.
> >
> > For compatibility reasons, we are stuck with that decision for existing
> > extensions. However, we'd prefer not to extend the damage further. We
> > can do that by catching any newly-added extensions and complaining about
> > the repository format.
> 
> Looking good overall, but I needed this to build from the source.

Oof, thanks. I did this as a one-off not even on a branch, and my
config.mak magic loosens -Werror in that case (because usually a
detached HEAD means I'm investigating some old commit, and quite a few
of them don't build without warnings these days).

Thankfully it seems I only managed a minor error without the compiler
there to help me. :)

-Peff
Junio C Hamano July 16, 2020, 8:27 p.m. UTC | #10
Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> Hmm, this is actually a bit trickier than I expected because of the way
>> the code is written. It's much easier to complain about extensions in a
>> v0 repository than it is to ignore them. But I'm not sure if that isn't
>> the right way to go anyway.
>>
>> The patch I came up with is below (and goes on top of Jonathan's). Even
>> if we decide this is the right direction, it can definitely happen
>> post-v2.28.
>
> It must happen already in 'seen' if we want to keep bc/sha-2-part-3
> with us, though ;-)

FWIW, I needed to adjust t0001 while merging the SHA-2 topic.  The
internal use of "git config" via test_config triggers the "this is
not a Git repository as the value of repositoryformatversion and the
defined set of extensions are incompatible".

diff --cc t/t0001-init.sh
index 6d2467995e,34d2064660..ff538c0eed
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
 ...
 -test_expect_success 'extensions.objectFormat is not honored with repo version 0' '
++test_expect_success 'extensions.objectFormat would cause an error in repo version 0' '
+ 	git init --object-format=sha256 explicit-v0 &&
 -	test_config -C explicit-v0 core.repositoryformatversion 0 &&
 -	git -C explicit-v0 rev-parse --show-object-format >actual &&
 -	echo sha1 >expected &&
 -	test_cmp expected actual
++	v=$(git config --file=explicit-v0/.git/config core.repositoryformatversion 0) &&
++	test_when_finished "
++		git config --file=explicit-v0/.git/config core.repositoryformatversion $v
++	" &&
++	git config --file=explicit-v0/.git/config core.repositoryformatversion 0 &&
++	test_must_fail git -C explicit-v0 rev-parse --show-object-format >actual
+ '
Jonathan Nieder July 16, 2020, 10:37 p.m. UTC | #11
(replying from vacation; back tomorrow)
Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:

>> Yeah, I agree with this line of reasoning. I'd prefer to see it
>> addressed now, so that we don't have to remember to do anything later.
>
> Very true.  Also the documentation may need some updating,
> e.g. "These 4 extensions are honored without adding
> repositoryFormatVersion to your repository (as special cases)" to
> avoid further confusion e.g. "why doesn't my objectFormat=SHA-3 does
> not take effect by itself?".

Yes, I agree, especially about documentation.

For 2.29, I would like to do or see the following:

- defining the list of repository format v0 supported extensions as
  "these and no more", futureproofing along the lines suggested in
  Peff's patch.  I like the general approach taken there since it
  allows parsing the relevant config in a single pass, so I think
  it basically takes the right approach.  (That said, it might be
  possible to simplify a bit with further changes, e.g. by using the
  configset API.)

  When doing this for real, we'd want to document the set of
  supported extensions.  That is especially useful to independent
  implementers wanting to support Git's formats, since it tells
  them "this is the minimum set of extensions that you must
  either handle or error out cleanly on to maintain compatibility
  with Git's repository format v0".
 
- improving the behavior when an extension not supported in v0 is
  encountered in a v0 repository.  For extensions that are supported
  in v1 and not v0, we should presumably error out so the user can
  repair the repository, and we can put the "noop" extension in that
  category for the sake of easy testing.  We can also include a check
  in "git fsck" for repositories that request the undefined behavior
  of v0 repositories with non-v0 extensions, for faster diagnosis.

  What about unrecognized extensions that are potentially extensions
  yet to be defined?  Should these be silently ignored to match the
  historical behavior, or should we error out even in repository
  format v0?  I lean toward the latter; we'll need to be cautious,
  though, e.g. by making this a separate patch so we can easily tweak
  it if this ends up being disruptive in some unanticipated way.

- making "git init" use repository format v1 by default.  It's been
  long enough that users can count on Git implementations supporting
  it.  This way, users are less likely to run into v0+extensions
  confusion, just because users are less likely to be using v0.

Does that sound like a good plan to others?  If so, are there any
steps beyond the two first patches in jn/v0-with-extensions-fix that
we would want in order to prepare for it in 2.28?

My preference would be to move forward in 2.28 with the first two
patches in that topic branch (i.e., *not* the third yet), since they
don't produce any user facing behavior that would create danger for
users or clash with this plan.  Today, the only extensions we
recognize are in that set of extensions that we'll want to continue to
recognize in v0 (except possibly the for-testing extension "noop").
The steps to take with additional extensions are more subtle so it
seems reasonable for them to bake in "next" and then "master" for a
2.29 release.

Thanks,
Jonathan
Junio C Hamano July 16, 2020, 11:50 p.m. UTC | #12
Jonathan Nieder <jrnieder@gmail.com> writes:

> - defining the list of repository format v0 supported extensions as
>   "these and no more", futureproofing along the lines suggested in
>   Peff's patch.  I like the general approach taken there since it
>   allows parsing the relevant config in a single pass, so I think
>   it basically takes the right approach.  (That said, it might be
>   possible to simplify a bit with further changes, e.g. by using the
>   configset API.)
>
>   When doing this for real, we'd want to document the set of
>   supported extensions.  That is especially useful to independent
>   implementers wanting to support Git's formats, since it tells
>   them "this is the minimum set of extensions that you must
>   either handle or error out cleanly on to maintain compatibility
>   with Git's repository format v0".

Good.

> - improving the behavior when an extension not supported in v0 is
>   encountered in a v0 repository.  For extensions that are supported
>   in v1 and not v0, we should presumably error out so the user can
>   repair the repository, and we can put the "noop" extension in that
>   category for the sake of easy testing.  We can also include a check
>   in "git fsck" for repositories that request the undefined behavior
>   of v0 repositories with non-v0 extensions, for faster diagnosis.
>
>   What about unrecognized extensions that are potentially extensions
>   yet to be defined?  Should these be silently ignored to match the
>   historical behavior, or should we error out even in repository
>   format v0?  I lean toward the latter; we'll need to be cautious,
>   though, e.g. by making this a separate patch so we can easily tweak
>   it if this ends up being disruptive in some unanticipated way.

I disagree with your first paragraph.  Those that weren't honored by
mistake back in v0 days, in addition to those that aren't known to us
even now, should just be silently ignored, not causing an error.

> - making "git init" use repository format v1 by default.  It's been
>   long enough that users can count on Git implementations supporting
>   it.  This way, users are less likely to run into v0+extensions
>   confusion, just because users are less likely to be using v0.

Absolutely.  I would think this is a very good move.

> Does that sound like a good plan to others?  If so, are there any
> steps beyond the two first patches in jn/v0-with-extensions-fix that
> we would want in order to prepare for it in 2.28?
>
> My preference would be to move forward in 2.28 with the first two
> patches in that topic branch (i.e., *not* the third yet), since they
> don't produce any user facing behavior that would create danger for
> users or clash with this plan.

Yup, I agree.  I'd give another name to the third commit and then
rewind jn/v0-with-extensions-fix by one to prevent mistakes from
happening.  Thanks.
Jeff King July 17, 2020, 3:22 p.m. UTC | #13
On Thu, Jul 16, 2020 at 03:37:19PM -0700, Jonathan Nieder wrote:

> For 2.29, I would like to do or see the following:
> 
> - defining the list of repository format v0 supported extensions as
>   "these and no more", futureproofing along the lines suggested in
>   Peff's patch.  I like the general approach taken there since it
>   allows parsing the relevant config in a single pass, so I think
>   it basically takes the right approach.  (That said, it might be
>   possible to simplify a bit with further changes, e.g. by using the
>   configset API.)
> 
>   When doing this for real, we'd want to document the set of
>   supported extensions.  That is especially useful to independent
>   implementers wanting to support Git's formats, since it tells
>   them "this is the minimum set of extensions that you must
>   either handle or error out cleanly on to maintain compatibility
>   with Git's repository format v0".

I think we should still consider people setting v0 along with extensions
to be a bug. It was never documented to work that way and we are being
nice by keeping the existing behavior, but it is still wrong (and
pre-extension versions of Git will silently ignore them). I don't mind
making other implementers aware of the special status, but we should be
careful not to endorse the broken setup.

> - making "git init" use repository format v1 by default.  It's been
>   long enough that users can count on Git implementations supporting
>   it.  This way, users are less likely to run into v0+extensions
>   confusion, just because users are less likely to be using v0.

That's probably reasonable. It will be mildly annoying for people like
me who are often testing old versions of Git, but I'm sure I will
survive.

We should make sure that all major implementations handle v1 reasonably
first, though (and that they did so long enough ago that it's not likely
to cause problems).

> My preference would be to move forward in 2.28 with the first two
> patches in that topic branch (i.e., *not* the third yet), since they
> don't produce any user facing behavior that would create danger for
> users or clash with this plan.  Today, the only extensions we
> recognize are in that set of extensions that we'll want to continue to
> recognize in v0 (except possibly the for-testing extension "noop").
> The steps to take with additional extensions are more subtle so it
> seems reasonable for them to bake in "next" and then "master" for a
> 2.29 release.

I'm OK with the plan to ship the first two patches for 2.28, and leave
my patch for later (or even soften it from "die" to "ignore with a
warning").

I think leaving "noop" in that set of special extensions makes sense,
since it lets us test that case easily (and I added a "noop-v1" in my
patch to test the other one; clearly we could also flip it and have
noop-v0).

-Peff
Jeff King July 17, 2020, 3:27 p.m. UTC | #14
On Thu, Jul 16, 2020 at 04:50:34PM -0700, Junio C Hamano wrote:

> > - improving the behavior when an extension not supported in v0 is
> >   encountered in a v0 repository.  For extensions that are supported
> >   in v1 and not v0, we should presumably error out so the user can
> >   repair the repository, and we can put the "noop" extension in that
> >   category for the sake of easy testing.  We can also include a check
> >   in "git fsck" for repositories that request the undefined behavior
> >   of v0 repositories with non-v0 extensions, for faster diagnosis.
> >
> >   What about unrecognized extensions that are potentially extensions
> >   yet to be defined?  Should these be silently ignored to match the
> >   historical behavior, or should we error out even in repository
> >   format v0?  I lean toward the latter; we'll need to be cautious,
> >   though, e.g. by making this a separate patch so we can easily tweak
> >   it if this ends up being disruptive in some unanticipated way.
> 
> I disagree with your first paragraph.  Those that weren't honored by
> mistake back in v0 days, in addition to those that aren't known to us
> even now, should just be silently ignored, not causing an error.

That's very much the opposite of my patch.  As we add new extensions,
those "unknowns" will start to die().

I remain unconvinced that there are a bunch of unknown extension.*
config options hanging around in the wild, but maybe I'm being naive.
It seems to me more likely that users will be helped by warning about
extensions that _should_ have had v1 set than that they will be harmed
because they put random crap in their extensions.* config. But maybe you
know of a specific example?

Anyway, if we move to "v1" as the default for "git init" anyway, then
the number of people being helped would become much smaller.

> > My preference would be to move forward in 2.28 with the first two
> > patches in that topic branch (i.e., *not* the third yet), since they
> > don't produce any user facing behavior that would create danger for
> > users or clash with this plan.
> 
> Yup, I agree.  I'd give another name to the third commit and then
> rewind jn/v0-with-extensions-fix by one to prevent mistakes from
> happening.  Thanks.

OK. I was confused to see it still at the tip in the latest What's
Cooking, but I think we're just crossing emails. :)

-Peff
Junio C Hamano July 17, 2020, 5:07 p.m. UTC | #15
Jeff King <peff@peff.net> writes:

> Anyway, if we move to "v1" as the default for "git init" anyway, then
> the number of people being helped would become much smaller.

Yup. So in that sense I do not think I care too deeply either way.

>> > My preference would be to move forward in 2.28 with the first two
>> > patches in that topic branch (i.e., *not* the third yet), since they
>> > don't produce any user facing behavior that would create danger for
>> > users or clash with this plan.
>> 
>> Yup, I agree.  I'd give another name to the third commit and then
>> rewind jn/v0-with-extensions-fix by one to prevent mistakes from
>> happening.  Thanks.
>
> OK. I was confused to see it still at the tip in the latest What's
> Cooking, but I think we're just crossing emails. :)

Yes.
diff mbox series

Patch

diff --git a/cache.h b/cache.h
index 126ec56c7f3..654426460cc 100644
--- a/cache.h
+++ b/cache.h
@@ -1042,7 +1042,6 @@  struct repository_format {
 	int worktree_config;
 	int is_bare;
 	int hash_algo;
-	int has_extensions;
 	char *work_tree;
 	struct string_list unknown_extensions;
 };
diff --git a/setup.c b/setup.c
index 87bf0112cf3..3a81307602e 100644
--- a/setup.c
+++ b/setup.c
@@ -455,7 +455,6 @@  static int check_repo_format(const char *var, const char *value, void *vdata)
 	if (strcmp(var, "core.repositoryformatversion") == 0)
 		data->version = git_config_int(var, value);
 	else if (skip_prefix(var, "extensions.", &ext)) {
-		data->has_extensions = 1;
 		/*
 		 * record any known extensions here; otherwise,
 		 * we fall through to recording it as unknown, and
@@ -553,13 +552,16 @@  int upgrade_repository_format(int target_version)
 	if (repo_fmt.version >= target_version)
 		return 0;
 
-	if (verify_repository_format(&repo_fmt, &err) < 0 ||
-	    (!repo_fmt.version && repo_fmt.has_extensions)) {
-		warning("unable to upgrade repository format from %d to %d: %s",
-			repo_fmt.version, target_version, err.buf);
+	if (verify_repository_format(&repo_fmt, &err) < 0) {
+		error("cannot upgrade repository format from %d to %d: %s",
+		      repo_fmt.version, target_version, err.buf);
 		strbuf_release(&err);
 		return -1;
 	}
+	if (!repo_fmt.version && repo_fmt.unknown_extensions.nr)
+		return error("cannot upgrade repository format: "
+			     "unknown extension %s",
+			     repo_fmt.unknown_extensions.items[0].string);
 
 	strbuf_addf(&repo_version, "%d", target_version);
 	git_config_set("core.repositoryformatversion", repo_version.buf);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 51d1eba6050..6aa0f313bdd 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -42,7 +42,7 @@  test_expect_success 'convert shallow clone to partial clone' '
 	test_cmp_config -C client 1 core.repositoryformatversion
 '
 
-test_expect_success 'converting to partial clone fails with noop extension' '
+test_expect_success 'convert to partial clone with noop extension' '
 	rm -fr server client &&
 	test_create_repo server &&
 	test_commit -C server my_commit 1 &&
@@ -50,7 +50,7 @@  test_expect_success 'converting to partial clone fails with noop extension' '
 	git clone --depth=1 "file://$(pwd)/server" client &&
 	test_cmp_config -C client 0 core.repositoryformatversion &&
 	git -C client config extensions.noop true &&
-	test_must_fail git -C client fetch --unshallow --filter="blob:none"
+	git -C client fetch --unshallow --filter="blob:none"
 '
 
 test_expect_success 'converting to partial clone fails with unrecognized extension' '