Message ID | X9D5SnXca2rGnJFl@coredump.intra.peff.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] quote: make sq_dequote_step() a public function | expand |
On Wed, Dec 09, 2020 at 11:20:26AM -0500, Jeff King wrote: > One other side effect I just noticed is that we're very aggressive about > trimming leading and trailing whitespace in the old-style format, but > the new one will store values verbatim. IMHO that's better overall, but > we might consider a preparatory patch to remove that trimming > explicitly. Actually, it looks like we just trim either side of the key. Which is...weird. We've never generated any, and I wouldn't expect people to write: git -c ' some.key = value' And even if they did, then "value" would have extra whitespace. So I don't think this is really changing anything important, though I'm still tempted to do something like the patch below to clean up the reading side (and as a bonus, it gets rid of a strbuf_split call, which is a terrible and awkward interface). diff --git a/config.c b/config.c index 04029e45dc..ede33cf3d0 100644 --- a/config.c +++ b/config.c @@ -516,29 +516,21 @@ static int config_parse_pair(const char *key, const char *value, int git_config_parse_parameter(const char *text, config_fn_t fn, void *data) { - const char *value; - struct strbuf **pair; + char *to_free = NULL; + const char *key, *value; int ret; - pair = strbuf_split_str(text, '=', 2); - if (!pair[0]) - return error(_("bogus config parameter: %s"), text); - - if (pair[0]->len && pair[0]->buf[pair[0]->len - 1] == '=') { - strbuf_setlen(pair[0], pair[0]->len - 1); - value = pair[1] ? pair[1]->buf : ""; + value = strchr(text, '='); + if (value) { + key = to_free = xmemdupz(text, value - text); + value++; } else { - value = NULL; + key = text; } - strbuf_trim(pair[0]); - if (!pair[0]->len) { - strbuf_list_free(pair); - return error(_("bogus config parameter: %s"), text); - } + ret = config_parse_pair(key, value, fn, data); - ret = config_parse_pair(pair[0]->buf, value, fn, data); - strbuf_list_free(pair); + free(to_free); return ret; }
On Wed, Dec 09 2020, Jeff King wrote: > The previous commit added a new format for $GIT_CONFIG_PARAMETERS which > is able to robustly handle subsections with "=" in them. Let's start > writing the new format. Unfortunately, this does much less than you'd > hope, because "git -c" itself has the same ambiguity problem! But it's > still worth doing: > > - we've now pushed the problem from the inter-process communication > into the "-c" command-line parser. This would free us up to later > add an unambiguous format there (e.g., separate arguments like "git > --config key value", etc). > > - for --config-env, the parser already disallows "=" in the > environment variable name. So: > > git --config-env section.with=equals.key=ENVVAR > > will robustly set section.with=equals.key to the contents of > $ENVVAR. > > The new test shows the improvement for --config-env. > > Signed-off-by: Jeff King <peff@peff.net> > --- > One other side effect I just noticed is that we're very aggressive about > trimming leading and trailing whitespace in the old-style format, but > the new one will store values verbatim. IMHO that's better overall, but > we might consider a preparatory patch to remove that trimming > explicitly. > > config.c | 52 ++++++++++++++++++++++++++++++++++++++++------- > t/t1300-config.sh | 8 ++++++++ > 2 files changed, 53 insertions(+), 7 deletions(-) > > diff --git a/config.c b/config.c > index fb160c33d2..04029e45dc 100644 > --- a/config.c > +++ b/config.c > @@ -333,38 +333,76 @@ int git_config_include(const char *var, const char *value, void *data) > return ret; > } > > -void git_config_push_parameter(const char *text) > +static void git_config_push_split_parameter(const char *key, const char *value) > { > struct strbuf env = STRBUF_INIT; > const char *old = getenv(CONFIG_DATA_ENVIRONMENT); > if (old && *old) { > strbuf_addstr(&env, old); > strbuf_addch(&env, ' '); > } > - sq_quote_buf(&env, text); > + sq_quote_buf(&env, key); > + strbuf_addch(&env, '='); > + if (value) > + sq_quote_buf(&env, value); > setenv(CONFIG_DATA_ENVIRONMENT, env.buf, 1); > strbuf_release(&env); > } > > +void git_config_push_parameter(const char *text) > +{ > + const char *value; > + > + /* > + * When we see: > + * > + * section.subsection=with=equals.key=value > + * > + * we cannot tell if it means: > + * > + * [section "subsection=with=equals"] > + * key = value > + * > + * or: > + * > + * [section] > + * subsection = with=equals.key=value > + * > + * We parse left-to-right for the first "=", meaning we'll prefer to > + * keep the value intact over the subsection. This is historical, but > + * also sensible since values are more likely to contain odd or > + * untrusted input than a section name. > + * > + * A missing equals is explicitly allowed (as a bool-only entry). > + */ > + value = strchr(text, '='); > + if (value) { > + char *key = xmemdupz(text, value - text); > + git_config_push_split_parameter(key, value + 1); > + free(key); > + } else { > + git_config_push_split_parameter(text, NULL); > + } > +} > + > void git_config_push_env(const char *spec) > { > - struct strbuf buf = STRBUF_INIT; > + char *key; > const char *env_name; > const char *env_value; > > env_name = strrchr(spec, '='); > if (!env_name) > die("invalid config format: %s", spec); > + key = xmemdupz(spec, env_name - spec); > env_name++; > > env_value = getenv(env_name); > if (!env_value) > die("config variable missing for '%s'", env_name); > > - strbuf_add(&buf, spec, env_name - spec); > - strbuf_addstr(&buf, env_value); > - git_config_push_parameter(buf.buf); > - strbuf_release(&buf); > + git_config_push_split_parameter(key, env_value); > + free(key); > } > > static inline int iskeychar(int c) > diff --git a/t/t1300-config.sh b/t/t1300-config.sh > index bd602e7720..e06961767f 100755 > --- a/t/t1300-config.sh > +++ b/t/t1300-config.sh > @@ -1413,6 +1413,14 @@ test_expect_success 'git -c and --config-env override each other' ' > test_cmp expect actual > ' > > +test_expect_success '--config-env handles keys with equals' ' > + echo value=with=equals >expect && > + ENVVAR=value=with=equals git \ > + --config-env=section.subsection=with=equals.key=ENVVAR \ > + config section.subsection=with=equals.key >actual && > + test_cmp expect actual > +' > + Maybe worth adding a test for the strrchr() semantics here with: perl -we '$ENV{"Y=Z"}="why and zed"; system "Z=zed git --config-env=X=Y=Z ..."' Which would show that we can't look up "Y=Z", but will always get "Z". I think that's fine b.t.w., 1/2 of the minor objection I had to --config-env in https://lore.kernel.org/git/87y2i7vvz4.fsf@evledraar.gmail.com/ was mainly about being unable to e.g. support odd token usernames with the "insteadOf" feature. But aside from having a feature meant to improve security being able to be combined with a config variable we have in a way that leaks the password in ps(1) I think these improved semantics make sense. I.e. I can't imagine someone wants an env var with "=" in it, even though POSIX makes such a thing possible (you just can't do it in a shellscript).
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > On Wed, Dec 09 2020, Jeff King wrote: > ... >> +test_expect_success '--config-env handles keys with equals' ' >> + echo value=with=equals >expect && >> + ENVVAR=value=with=equals git \ >> + --config-env=section.subsection=with=equals.key=ENVVAR \ >> + config section.subsection=with=equals.key >actual && >> + test_cmp expect actual >> +' >> + > > Maybe worth adding a test for the strrchr() semantics here with: > > perl -we '$ENV{"Y=Z"}="why and zed"; system "Z=zed git --config-env=X=Y=Z ..."' > > Which would show that we can't look up "Y=Z", but will always get "Z". Yes, that was explained in the cover letter of these three patches in <X9D23LQv34A5Q5DC@coredump.intra.peff.net>. We really should document that <envvar> can't contain an "=" sign, but I do not see much point in casting that limitation in stone with a test. As long as we know things work correctly with environment variables without '=' in their names, we should be happy.
On Thu, Dec 10, 2020 at 09:55:18PM +0100, Ævar Arnfjörð Bjarmason wrote: > > diff --git a/t/t1300-config.sh b/t/t1300-config.sh > > index bd602e7720..e06961767f 100755 > > --- a/t/t1300-config.sh > > +++ b/t/t1300-config.sh > > @@ -1413,6 +1413,14 @@ test_expect_success 'git -c and --config-env override each other' ' > > test_cmp expect actual > > ' > > > > +test_expect_success '--config-env handles keys with equals' ' > > + echo value=with=equals >expect && > > + ENVVAR=value=with=equals git \ > > + --config-env=section.subsection=with=equals.key=ENVVAR \ > > + config section.subsection=with=equals.key >actual && > > + test_cmp expect actual > > +' > > + > > Maybe worth adding a test for the strrchr() semantics here with: > > perl -we '$ENV{"Y=Z"}="why and zed"; system "Z=zed git --config-env=X=Y=Z ..."' > > Which would show that we can't look up "Y=Z", but will always get "Z". We're already testing those here, though. If we used strchr(), then we'd end up setting section.subsection in the above test. I.e., your X=Y=Z can be tested in either of two ways: - check that X=Y is set to $Z - check that X is not set to Y=Z It's sufficient to test only one, because success in one implies the other. And of the two, I think testing the first is much more interesting (because testing "this expected thing happened" is much more robust than "this unexpected thing didn't happen"). > I.e. I can't imagine someone wants an env var with "=" in it, even > though POSIX makes such a thing possible (you just can't do it in a > shellscript). Yeah, I'm definitely OK with that limitation, but we should document it. -Peff
diff --git a/config.c b/config.c index fb160c33d2..04029e45dc 100644 --- a/config.c +++ b/config.c @@ -333,38 +333,76 @@ int git_config_include(const char *var, const char *value, void *data) return ret; } -void git_config_push_parameter(const char *text) +static void git_config_push_split_parameter(const char *key, const char *value) { struct strbuf env = STRBUF_INIT; const char *old = getenv(CONFIG_DATA_ENVIRONMENT); if (old && *old) { strbuf_addstr(&env, old); strbuf_addch(&env, ' '); } - sq_quote_buf(&env, text); + sq_quote_buf(&env, key); + strbuf_addch(&env, '='); + if (value) + sq_quote_buf(&env, value); setenv(CONFIG_DATA_ENVIRONMENT, env.buf, 1); strbuf_release(&env); } +void git_config_push_parameter(const char *text) +{ + const char *value; + + /* + * When we see: + * + * section.subsection=with=equals.key=value + * + * we cannot tell if it means: + * + * [section "subsection=with=equals"] + * key = value + * + * or: + * + * [section] + * subsection = with=equals.key=value + * + * We parse left-to-right for the first "=", meaning we'll prefer to + * keep the value intact over the subsection. This is historical, but + * also sensible since values are more likely to contain odd or + * untrusted input than a section name. + * + * A missing equals is explicitly allowed (as a bool-only entry). + */ + value = strchr(text, '='); + if (value) { + char *key = xmemdupz(text, value - text); + git_config_push_split_parameter(key, value + 1); + free(key); + } else { + git_config_push_split_parameter(text, NULL); + } +} + void git_config_push_env(const char *spec) { - struct strbuf buf = STRBUF_INIT; + char *key; const char *env_name; const char *env_value; env_name = strrchr(spec, '='); if (!env_name) die("invalid config format: %s", spec); + key = xmemdupz(spec, env_name - spec); env_name++; env_value = getenv(env_name); if (!env_value) die("config variable missing for '%s'", env_name); - strbuf_add(&buf, spec, env_name - spec); - strbuf_addstr(&buf, env_value); - git_config_push_parameter(buf.buf); - strbuf_release(&buf); + git_config_push_split_parameter(key, env_value); + free(key); } static inline int iskeychar(int c) diff --git a/t/t1300-config.sh b/t/t1300-config.sh index bd602e7720..e06961767f 100755 --- a/t/t1300-config.sh +++ b/t/t1300-config.sh @@ -1413,6 +1413,14 @@ test_expect_success 'git -c and --config-env override each other' ' test_cmp expect actual ' +test_expect_success '--config-env handles keys with equals' ' + echo value=with=equals >expect && + ENVVAR=value=with=equals git \ + --config-env=section.subsection=with=equals.key=ENVVAR \ + config section.subsection=with=equals.key >actual && + test_cmp expect actual +' + test_expect_success 'git config handles environment config pairs' ' GIT_CONFIG_COUNT=2 \ GIT_CONFIG_KEY_0="pair.one" GIT_CONFIG_VALUE_0="foo" \
The previous commit added a new format for $GIT_CONFIG_PARAMETERS which is able to robustly handle subsections with "=" in them. Let's start writing the new format. Unfortunately, this does much less than you'd hope, because "git -c" itself has the same ambiguity problem! But it's still worth doing: - we've now pushed the problem from the inter-process communication into the "-c" command-line parser. This would free us up to later add an unambiguous format there (e.g., separate arguments like "git --config key value", etc). - for --config-env, the parser already disallows "=" in the environment variable name. So: git --config-env section.with=equals.key=ENVVAR will robustly set section.with=equals.key to the contents of $ENVVAR. The new test shows the improvement for --config-env. Signed-off-by: Jeff King <peff@peff.net> --- One other side effect I just noticed is that we're very aggressive about trimming leading and trailing whitespace in the old-style format, but the new one will store values verbatim. IMHO that's better overall, but we might consider a preparatory patch to remove that trimming explicitly. config.c | 52 ++++++++++++++++++++++++++++++++++++++++------- t/t1300-config.sh | 8 ++++++++ 2 files changed, 53 insertions(+), 7 deletions(-)