Message ID | 87h6ngapqb.fsf@gmail.froward.int.ebiederm.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | setup: Only allow extenions.objectFormat to be specified once | expand |
"Eric W. Biederman" <ebiederm@gmail.com> writes: > Today there is no sanity checking of what happens when > extensions.objectFormat is specified multiple times. Catch confused git > configurations by only allowing this option to be specified once. Hmph. I am not sure if this is worth doing, and especially only for "objectformat". Do we intend to apply different rules other than "you can give it only once" to other extensions, and if so where will these rules be catalogued? I do not see particular harm to let them follow the usual "last one wins". If the patch were about trying to make sure that extensions, which are inherentaly per-repository, appear only in $GIT_DIR/config and complain if the code gets confused and tried to read them from the system or global configuration files, I would understand, and strongly support such an effort, ithough. The real sanity we want to enforce is that what is reported by running "git config extensions.objectformat" must match the object format that is used in refs and object database. Manually futzing the configuration file and adding an entry with a contradictory value certainly is one way to break that sanity, and this patch may catch such a breakage, but once we start worrying about manually futzing the configuration file, the check added here would easily miss if the futzing is done by replacing instead of adding, so I am not sure if this extra code is worth its bits. But perhaps I am missing something and not seeing why it is worth insisting on "last one is the first one" for this particular one. Thanks. > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > setup.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/setup.c b/setup.c > index 18927a847b86..ef9f79b8885e 100644 > --- a/setup.c > +++ b/setup.c > @@ -580,6 +580,7 @@ static enum extension_result handle_extension(const char *var, > if (!strcmp(ext, "noop-v1")) { > return EXTENSION_OK; > } else if (!strcmp(ext, "objectformat")) { > + struct string_list_item *item; > int format; > > if (!value) > @@ -588,6 +589,13 @@ static enum extension_result handle_extension(const char *var, > if (format == GIT_HASH_UNKNOWN) > return error(_("invalid value for '%s': '%s'"), > "extensions.objectformat", value); > + /* Only support objectFormat being specified once. */ > + for_each_string_list_item(item, &data->v1_only_extensions) { > + if (!strcmp(item->string, "objectformat")) > + return error(_("'%s' already specified as '%s'"), > + "extensions.objectformat", > + hash_algos[data->hash_algo].name); > + } > data->hash_algo = format; > return EXTENSION_OK; > }
Junio C Hamano <gitster@pobox.com> writes: > "Eric W. Biederman" <ebiederm@gmail.com> writes: > >> Today there is no sanity checking of what happens when >> extensions.objectFormat is specified multiple times. Catch confused git >> configurations by only allowing this option to be specified once. > > Hmph. I am not sure if this is worth doing, and especially only for > "objectformat". Do we intend to apply different rules other than > "you can give it only once" to other extensions, and if so where > will these rules be catalogued? I do not see particular harm to let > them follow the usual "last one wins". > > If the patch were about trying to make sure that extensions, which > are inherentaly per-repository, appear only in $GIT_DIR/config and > complain if the code gets confused and tried to read them from the > system or global configuration files, I would understand, and > strongly support such an effort, ithough. Unless I misread something, the code only checks for [extensions] in $GIT_DIR/config. > The real sanity we want to enforce is that what is reported by > running "git config extensions.objectformat" must match the object > format that is used in refs and object database. Agreed. Allowing git config extensions.objectformat to change the existing value is allowing the repository to be corrupted. > Manually futzing > the configuration file and adding an entry with a contradictory > value certainly is one way to break that sanity, and this patch may > catch such a breakage, but once we start worrying about manually > futzing the configuration file, the check added here would easily > miss if the futzing is done by replacing instead of adding, so I am > not sure if this extra code is worth its bits. > > But perhaps I am missing something and not seeing why it is worth > insisting on "last one is the first one" for this particular one. I somewhat have blinders on. There are 3 configuration options I am concerned with: extensions.objectFormat extensions.compatObjectFormat core.historicObjectFormat (or whatever name we settle on). One key concern I heard expressed in earlier reviews is that however we handle these options we handle them in such a way as to give ourselves room to rise to challenges in the future. Whatever we do with parsing we have the following logical constraints: For extensions.objectFormat: There can only be a single storage hash. For extensions.compatObjectFormat: There can be no compatibility hash, there can be a single compatibility hash, and depending how things go between now and the next hash function transition we might want multiple compatibility hashes. For core.historicObjectFormat: There can be no historic hash function, there can be a single historic hash function, there can be multiple historic hash functions. For the compatibility hash I think it is unlikely we will want to support more than one compatibility hash in practice but I can imagine a scenario where we just get into the transition from SHA-1 to SHA-256 and a serious break is discovered that requires switching to FutureHash ASAP. For historic object formats like SHA-1 will become post transition there are references embedded in commit comments, email messages, bug trackers. All kinds of places that we can not update so there is fundamentally a need to be able to find which current objects correspond to the historic names. For a project each hash function transition will create more such objects. When I looked I saw two ways within current git to specify a list of values for a single configuration option. - Give that option multiple times. - Parse the option value in such a way as to generate a list. It is my sense just specifying the compatObjectFormat multiple times to specify multiple compatibility object formats makes the most sense. Especially as all is needed today is to only allow a single value. After I had implemented the only allow once logic for compatObjectFormat I saw that objectFormat had nothing similar, and knowing it is a bug for multiple objectFormat wrote a patch to enforce only appear once for objectFormat as well. For objectFormat I don't care very much. For compatObjectFormat I truly care, and for even more for the option that allows finding the current object from a historic oid (even a truncated one) I care very much. For me the fundamental question is if we allow multiples compatibility hashes or historical hashes how do we specify them? Have the option appear more than once? A comma separated list? Whatever we decided I want to enforce that doesn't appear in current configurations so we can support for multiples later. Eric >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >> --- >> setup.c | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/setup.c b/setup.c >> index 18927a847b86..ef9f79b8885e 100644 >> --- a/setup.c >> +++ b/setup.c >> @@ -580,6 +580,7 @@ static enum extension_result handle_extension(const char *var, >> if (!strcmp(ext, "noop-v1")) { >> return EXTENSION_OK; >> } else if (!strcmp(ext, "objectformat")) { >> + struct string_list_item *item; >> int format; >> >> if (!value) >> @@ -588,6 +589,13 @@ static enum extension_result handle_extension(const char *var, >> if (format == GIT_HASH_UNKNOWN) >> return error(_("invalid value for '%s': '%s'"), >> "extensions.objectformat", value); >> + /* Only support objectFormat being specified once. */ >> + for_each_string_list_item(item, &data->v1_only_extensions) { >> + if (!strcmp(item->string, "objectformat")) >> + return error(_("'%s' already specified as '%s'"), >> + "extensions.objectformat", >> + hash_algos[data->hash_algo].name); >> + } >> data->hash_algo = format; >> return EXTENSION_OK; >> }
"Eric W. Biederman" <ebiederm@gmail.com> writes: > For me the fundamental question is if we allow multiples compatibility > hashes or historical hashes how do we specify them? Have the option > appear more than once? A comma separated list? As you found out, we tend to use both, but the former does look more natural to me. The "usual" pros and cons [*] involve how easy it is to override the settings given by more general low-priority configuration files with more specific high-priority configuration files, and does not apply to the extensions.* stuff that are by definition repository specific. [Footnote] As I said, this does not apply to the topic of this discussion, but just for completeness: * comma separated list allows overriding everything that was said earlier wholesale; there is no ambiguity, which is a plus, but there is no incremental updates, which may be a minus when flexibility is desired. * multi-valued configuration variable allows incremental additions, but ad-hoc syntax needs to be invented if incremental subtractions or clearing the slate to start from scratch is needed.
diff --git a/setup.c b/setup.c index 18927a847b86..ef9f79b8885e 100644 --- a/setup.c +++ b/setup.c @@ -580,6 +580,7 @@ static enum extension_result handle_extension(const char *var, if (!strcmp(ext, "noop-v1")) { return EXTENSION_OK; } else if (!strcmp(ext, "objectformat")) { + struct string_list_item *item; int format; if (!value) @@ -588,6 +589,13 @@ static enum extension_result handle_extension(const char *var, if (format == GIT_HASH_UNKNOWN) return error(_("invalid value for '%s': '%s'"), "extensions.objectformat", value); + /* Only support objectFormat being specified once. */ + for_each_string_list_item(item, &data->v1_only_extensions) { + if (!strcmp(item->string, "objectformat")) + return error(_("'%s' already specified as '%s'"), + "extensions.objectformat", + hash_algos[data->hash_algo].name); + } data->hash_algo = format; return EXTENSION_OK; }
Today there is no sanity checking of what happens when extensions.objectFormat is specified multiple times. Catch confused git configurations by only allowing this option to be specified once. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- setup.c | 8 ++++++++ 1 file changed, 8 insertions(+)