diff mbox series

setup: Only allow extenions.objectFormat to be specified once

Message ID 87h6ngapqb.fsf@gmail.froward.int.ebiederm.org (mailing list archive)
State New, archived
Headers show
Series setup: Only allow extenions.objectFormat to be specified once | expand

Commit Message

Eric W. Biederman Sept. 26, 2023, 4:01 p.m. UTC
Today there is no sanity checking of what happens when
extensions.objectFormat is specified multiple times.  Catch confused git
configurations by only allowing this option to be specified once.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 setup.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Junio C Hamano Sept. 26, 2023, 9:37 p.m. UTC | #1
"Eric W. Biederman" <ebiederm@gmail.com> writes:

> Today there is no sanity checking of what happens when
> extensions.objectFormat is specified multiple times.  Catch confused git
> configurations by only allowing this option to be specified once.

Hmph.  I am not sure if this is worth doing, and especially only for
"objectformat".  Do we intend to apply different rules other than
"you can give it only once" to other extensions, and if so where
will these rules be catalogued?  I do not see particular harm to let
them follow the usual "last one wins".

If the patch were about trying to make sure that extensions, which
are inherentaly per-repository, appear only in $GIT_DIR/config and
complain if the code gets confused and tried to read them from the
system or global configuration files, I would understand, and
strongly support such an effort, ithough.

The real sanity we want to enforce is that what is reported by
running "git config extensions.objectformat" must match the object
format that is used in refs and object database.  Manually futzing
the configuration file and adding an entry with a contradictory
value certainly is one way to break that sanity, and this patch may
catch such a breakage, but once we start worrying about manually
futzing the configuration file, the check added here would easily
miss if the futzing is done by replacing instead of adding, so I am
not sure if this extra code is worth its bits.

But perhaps I am missing something and not seeing why it is worth
insisting on "last one is the first one" for this particular one.

Thanks.

> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  setup.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/setup.c b/setup.c
> index 18927a847b86..ef9f79b8885e 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -580,6 +580,7 @@ static enum extension_result handle_extension(const char *var,
>  	if (!strcmp(ext, "noop-v1")) {
>  		return EXTENSION_OK;
>  	} else if (!strcmp(ext, "objectformat")) {
> +		struct string_list_item *item;
>  		int format;
>  
>  		if (!value)
> @@ -588,6 +589,13 @@ static enum extension_result handle_extension(const char *var,
>  		if (format == GIT_HASH_UNKNOWN)
>  			return error(_("invalid value for '%s': '%s'"),
>  				     "extensions.objectformat", value);
> +		/* Only support objectFormat being specified once. */
> +		for_each_string_list_item(item, &data->v1_only_extensions) {
> +			if (!strcmp(item->string, "objectformat"))
> +				return error(_("'%s' already specified as '%s'"),
> +					"extensions.objectformat",
> +					hash_algos[data->hash_algo].name);
> +		}
>  		data->hash_algo = format;
>  		return EXTENSION_OK;
>  	}
Eric W. Biederman Sept. 27, 2023, 1:11 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>
>> Today there is no sanity checking of what happens when
>> extensions.objectFormat is specified multiple times.  Catch confused git
>> configurations by only allowing this option to be specified once.
>
> Hmph.  I am not sure if this is worth doing, and especially only for
> "objectformat".  Do we intend to apply different rules other than
> "you can give it only once" to other extensions, and if so where
> will these rules be catalogued?  I do not see particular harm to let
> them follow the usual "last one wins".
>
> If the patch were about trying to make sure that extensions, which
> are inherentaly per-repository, appear only in $GIT_DIR/config and
> complain if the code gets confused and tried to read them from the
> system or global configuration files, I would understand, and
> strongly support such an effort, ithough.

Unless I misread something, the code only checks for [extensions]
in $GIT_DIR/config.

> The real sanity we want to enforce is that what is reported by
> running "git config extensions.objectformat" must match the object
> format that is used in refs and object database.

Agreed.  Allowing git config extensions.objectformat to change the
existing value is allowing the repository to be corrupted.

> Manually futzing
> the configuration file and adding an entry with a contradictory
> value certainly is one way to break that sanity, and this patch may
> catch such a breakage, but once we start worrying about manually
> futzing the configuration file, the check added here would easily
> miss if the futzing is done by replacing instead of adding, so I am
> not sure if this extra code is worth its bits.
>
> But perhaps I am missing something and not seeing why it is worth
> insisting on "last one is the first one" for this particular one.

I somewhat have blinders on.  There are 3 configuration options I am
concerned with:

extensions.objectFormat
extensions.compatObjectFormat
core.historicObjectFormat (or whatever name we settle on).

One key concern I heard expressed in earlier reviews is that however we
handle these options we handle them in such a way as to give ourselves
room to rise to challenges in the future.

Whatever we do with parsing we have the following logical
constraints:

For extensions.objectFormat: There can only be a single storage hash.

For extensions.compatObjectFormat: There can be no compatibility hash,
there can be a single compatibility hash, and depending how things go
between now and the next hash function transition we might want multiple
compatibility hashes.

For core.historicObjectFormat: There can be no historic hash function, there
can be a single historic hash function, there can be multiple historic
hash functions.


For the compatibility hash I think it is unlikely we will want to
support more than one compatibility hash in practice but I can imagine
a scenario where we just get into the transition from SHA-1 to SHA-256
and a serious break is discovered that requires switching to FutureHash
ASAP.

For historic object formats like SHA-1 will become post transition there
are references embedded in commit comments, email messages, bug
trackers.  All kinds of places that we can not update so there is
fundamentally a need to be able to find which current objects correspond
to the historic names.  For a project each hash function transition will
create more such objects.


When I looked I saw two ways within current git to specify a list of
values for a single configuration option.
- Give that option multiple times.
- Parse the option value in such a way as to generate a list.

It is my sense just specifying the compatObjectFormat multiple times to
specify multiple compatibility object formats makes the most sense.
Especially as all is needed today is to only allow a single value.

After I had implemented the only allow once logic for compatObjectFormat
I saw that objectFormat had nothing similar, and knowing it is a bug
for multiple objectFormat wrote a patch to enforce only appear once
for objectFormat as well.

For objectFormat I don't care very much.  For compatObjectFormat I truly
care, and for even more for the option that allows finding the current
object from a historic oid (even a truncated one) I care very much.

For me the fundamental question is if we allow multiples compatibility
hashes or historical hashes how do we specify them?  Have the option
appear more than once?  A comma separated list?

Whatever we decided I want to enforce that doesn't appear in current
configurations so we can support for multiples later.

Eric

>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  setup.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/setup.c b/setup.c
>> index 18927a847b86..ef9f79b8885e 100644
>> --- a/setup.c
>> +++ b/setup.c
>> @@ -580,6 +580,7 @@ static enum extension_result handle_extension(const char *var,
>>  	if (!strcmp(ext, "noop-v1")) {
>>  		return EXTENSION_OK;
>>  	} else if (!strcmp(ext, "objectformat")) {
>> +		struct string_list_item *item;
>>  		int format;
>>  
>>  		if (!value)
>> @@ -588,6 +589,13 @@ static enum extension_result handle_extension(const char *var,
>>  		if (format == GIT_HASH_UNKNOWN)
>>  			return error(_("invalid value for '%s': '%s'"),
>>  				     "extensions.objectformat", value);
>> +		/* Only support objectFormat being specified once. */
>> +		for_each_string_list_item(item, &data->v1_only_extensions) {
>> +			if (!strcmp(item->string, "objectformat"))
>> +				return error(_("'%s' already specified as '%s'"),
>> +					"extensions.objectformat",
>> +					hash_algos[data->hash_algo].name);
>> +		}
>>  		data->hash_algo = format;
>>  		return EXTENSION_OK;
>>  	}
Junio C Hamano Sept. 27, 2023, 7:56 p.m. UTC | #3
"Eric W. Biederman" <ebiederm@gmail.com> writes:

> For me the fundamental question is if we allow multiples compatibility
> hashes or historical hashes how do we specify them?  Have the option
> appear more than once?  A comma separated list?

As you found out, we tend to use both, but the former does look more
natural to me.

The "usual" pros and cons [*] involve how easy it is to override the
settings given by more general low-priority configuration files with
more specific high-priority configuration files, and does not apply
to the extensions.* stuff that are by definition repository
specific.


[Footnote]

As I said, this does not apply to the topic of this discussion, but
just for completeness:

 * comma separated list allows overriding everything that was said
   earlier wholesale; there is no ambiguity, which is a plus, but
   there is no incremental updates, which may be a minus when
   flexibility is desired.

 * multi-valued configuration variable allows incremental additions,
   but ad-hoc syntax needs to be invented if incremental
   subtractions or clearing the slate to start from scratch is
   needed.
diff mbox series

Patch

diff --git a/setup.c b/setup.c
index 18927a847b86..ef9f79b8885e 100644
--- a/setup.c
+++ b/setup.c
@@ -580,6 +580,7 @@  static enum extension_result handle_extension(const char *var,
 	if (!strcmp(ext, "noop-v1")) {
 		return EXTENSION_OK;
 	} else if (!strcmp(ext, "objectformat")) {
+		struct string_list_item *item;
 		int format;
 
 		if (!value)
@@ -588,6 +589,13 @@  static enum extension_result handle_extension(const char *var,
 		if (format == GIT_HASH_UNKNOWN)
 			return error(_("invalid value for '%s': '%s'"),
 				     "extensions.objectformat", value);
+		/* Only support objectFormat being specified once. */
+		for_each_string_list_item(item, &data->v1_only_extensions) {
+			if (!strcmp(item->string, "objectformat"))
+				return error(_("'%s' already specified as '%s'"),
+					"extensions.objectformat",
+					hash_algos[data->hash_algo].name);
+		}
 		data->hash_algo = format;
 		return EXTENSION_OK;
 	}