diff mbox series

[3/3] config: store "git -c" variables using more robust format

Message ID X9D5SnXca2rGnJFl@coredump.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series [1/3] quote: make sq_dequote_step() a public function | expand

Commit Message

Jeff King Dec. 9, 2020, 4:20 p.m. UTC
The previous commit added a new format for $GIT_CONFIG_PARAMETERS which
is able to robustly handle subsections with "=" in them. Let's start
writing the new format. Unfortunately, this does much less than you'd
hope, because "git -c" itself has the same ambiguity problem! But it's
still worth doing:

  - we've now pushed the problem from the inter-process communication
    into the "-c" command-line parser. This would free us up to later
    add an unambiguous format there (e.g., separate arguments like "git
    --config key value", etc).

  - for --config-env, the parser already disallows "=" in the
    environment variable name. So:

      git --config-env section.with=equals.key=ENVVAR

    will robustly set section.with=equals.key to the contents of
    $ENVVAR.

The new test shows the improvement for --config-env.

Signed-off-by: Jeff King <peff@peff.net>
---
One other side effect I just noticed is that we're very aggressive about
trimming leading and trailing whitespace in the old-style format, but
the new one will store values verbatim. IMHO that's better overall, but
we might consider a preparatory patch to remove that trimming
explicitly.

 config.c          | 52 ++++++++++++++++++++++++++++++++++++++++-------
 t/t1300-config.sh |  8 ++++++++
 2 files changed, 53 insertions(+), 7 deletions(-)

Comments

Jeff King Dec. 9, 2020, 4:34 p.m. UTC | #1
On Wed, Dec 09, 2020 at 11:20:26AM -0500, Jeff King wrote:

> One other side effect I just noticed is that we're very aggressive about
> trimming leading and trailing whitespace in the old-style format, but
> the new one will store values verbatim. IMHO that's better overall, but
> we might consider a preparatory patch to remove that trimming
> explicitly.

Actually, it looks like we just trim either side of the key. Which
is...weird. We've never generated any, and I wouldn't expect people to
write:

  git -c '  some.key = value'

And even if they did, then "value" would have extra whitespace. So I
don't think this is really changing anything important, though I'm still
tempted to do something like the patch below to clean up the reading
side (and as a bonus, it gets rid of a strbuf_split call, which is a
terrible and awkward interface).

diff --git a/config.c b/config.c
index 04029e45dc..ede33cf3d0 100644
--- a/config.c
+++ b/config.c
@@ -516,29 +516,21 @@ static int config_parse_pair(const char *key, const char *value,
 int git_config_parse_parameter(const char *text,
 			       config_fn_t fn, void *data)
 {
-	const char *value;
-	struct strbuf **pair;
+	char *to_free = NULL;
+	const char *key, *value;
 	int ret;
 
-	pair = strbuf_split_str(text, '=', 2);
-	if (!pair[0])
-		return error(_("bogus config parameter: %s"), text);
-
-	if (pair[0]->len && pair[0]->buf[pair[0]->len - 1] == '=') {
-		strbuf_setlen(pair[0], pair[0]->len - 1);
-		value = pair[1] ? pair[1]->buf : "";
+	value = strchr(text, '=');
+	if (value) {
+		key = to_free = xmemdupz(text, value - text);
+		value++;
 	} else {
-		value = NULL;
+		key = text;
 	}
 
-	strbuf_trim(pair[0]);
-	if (!pair[0]->len) {
-		strbuf_list_free(pair);
-		return error(_("bogus config parameter: %s"), text);
-	}
+	ret = config_parse_pair(key, value, fn, data);
 
-	ret = config_parse_pair(pair[0]->buf, value, fn, data);
-	strbuf_list_free(pair);
+	free(to_free);
 	return ret;
 }
Ævar Arnfjörð Bjarmason Dec. 10, 2020, 8:55 p.m. UTC | #2
On Wed, Dec 09 2020, Jeff King wrote:

> The previous commit added a new format for $GIT_CONFIG_PARAMETERS which
> is able to robustly handle subsections with "=" in them. Let's start
> writing the new format. Unfortunately, this does much less than you'd
> hope, because "git -c" itself has the same ambiguity problem! But it's
> still worth doing:
>
>   - we've now pushed the problem from the inter-process communication
>     into the "-c" command-line parser. This would free us up to later
>     add an unambiguous format there (e.g., separate arguments like "git
>     --config key value", etc).
>
>   - for --config-env, the parser already disallows "=" in the
>     environment variable name. So:
>
>       git --config-env section.with=equals.key=ENVVAR
>
>     will robustly set section.with=equals.key to the contents of
>     $ENVVAR.
>
> The new test shows the improvement for --config-env.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> One other side effect I just noticed is that we're very aggressive about
> trimming leading and trailing whitespace in the old-style format, but
> the new one will store values verbatim. IMHO that's better overall, but
> we might consider a preparatory patch to remove that trimming
> explicitly.
>
>  config.c          | 52 ++++++++++++++++++++++++++++++++++++++++-------
>  t/t1300-config.sh |  8 ++++++++
>  2 files changed, 53 insertions(+), 7 deletions(-)
>
> diff --git a/config.c b/config.c
> index fb160c33d2..04029e45dc 100644
> --- a/config.c
> +++ b/config.c
> @@ -333,38 +333,76 @@ int git_config_include(const char *var, const char *value, void *data)
>  	return ret;
>  }
>  
> -void git_config_push_parameter(const char *text)
> +static void git_config_push_split_parameter(const char *key, const char *value)
>  {
>  	struct strbuf env = STRBUF_INIT;
>  	const char *old = getenv(CONFIG_DATA_ENVIRONMENT);
>  	if (old && *old) {
>  		strbuf_addstr(&env, old);
>  		strbuf_addch(&env, ' ');
>  	}
> -	sq_quote_buf(&env, text);
> +	sq_quote_buf(&env, key);
> +	strbuf_addch(&env, '=');
> +	if (value)
> +		sq_quote_buf(&env, value);
>  	setenv(CONFIG_DATA_ENVIRONMENT, env.buf, 1);
>  	strbuf_release(&env);
>  }
>  
> +void git_config_push_parameter(const char *text)
> +{
> +	const char *value;
> +
> +	/*
> +	 * When we see:
> +	 *
> +	 *   section.subsection=with=equals.key=value
> +	 *
> +	 * we cannot tell if it means:
> +	 *
> +	 *   [section "subsection=with=equals"]
> +	 *   key = value
> +	 *
> +	 * or:
> +	 *
> +	 *   [section]
> +	 *   subsection = with=equals.key=value
> +	 *
> +	 * We parse left-to-right for the first "=", meaning we'll prefer to
> +	 * keep the value intact over the subsection. This is historical, but
> +	 * also sensible since values are more likely to contain odd or
> +	 * untrusted input than a section name.
> +	 *
> +	 * A missing equals is explicitly allowed (as a bool-only entry).
> +	 */
> +	value = strchr(text, '=');
> +	if (value) {
> +		char *key = xmemdupz(text, value - text);
> +		git_config_push_split_parameter(key, value + 1);
> +		free(key);
> +	} else {
> +		git_config_push_split_parameter(text, NULL);
> +	}
> +}
> +
>  void git_config_push_env(const char *spec)
>  {
> -	struct strbuf buf = STRBUF_INIT;
> +	char *key;
>  	const char *env_name;
>  	const char *env_value;
>  
>  	env_name = strrchr(spec, '=');
>  	if (!env_name)
>  		die("invalid config format: %s", spec);
> +	key = xmemdupz(spec, env_name - spec);
>  	env_name++;
>  
>  	env_value = getenv(env_name);
>  	if (!env_value)
>  		die("config variable missing for '%s'", env_name);
>  
> -	strbuf_add(&buf, spec, env_name - spec);
> -	strbuf_addstr(&buf, env_value);
> -	git_config_push_parameter(buf.buf);
> -	strbuf_release(&buf);
> +	git_config_push_split_parameter(key, env_value);
> +	free(key);
>  }
>  
>  static inline int iskeychar(int c)
> diff --git a/t/t1300-config.sh b/t/t1300-config.sh
> index bd602e7720..e06961767f 100755
> --- a/t/t1300-config.sh
> +++ b/t/t1300-config.sh
> @@ -1413,6 +1413,14 @@ test_expect_success 'git -c and --config-env override each other' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success '--config-env handles keys with equals' '
> +	echo value=with=equals >expect &&
> +	ENVVAR=value=with=equals git \
> +		--config-env=section.subsection=with=equals.key=ENVVAR \
> +		config section.subsection=with=equals.key >actual &&
> +	test_cmp expect actual
> +'
> +

Maybe worth adding a test for the strrchr() semantics here with:

    perl -we '$ENV{"Y=Z"}="why and zed"; system "Z=zed git --config-env=X=Y=Z ..."'

Which would show that we can't look up "Y=Z", but will always get "Z".

I think that's fine b.t.w., 1/2 of the minor objection I had to
--config-env in
https://lore.kernel.org/git/87y2i7vvz4.fsf@evledraar.gmail.com/ was
mainly about being unable to e.g. support odd token usernames with the
"insteadOf" feature.

But aside from having a feature meant to improve security being able to
be combined with a config variable we have in a way that leaks the
password in ps(1) I think these improved semantics make sense.

I.e. I can't imagine someone wants an env var with "=" in it, even
though POSIX makes such a thing possible (you just can't do it in a
shellscript).
Junio C Hamano Dec. 10, 2020, 9:49 p.m. UTC | #3
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Dec 09 2020, Jeff King wrote:
> ...
>> +test_expect_success '--config-env handles keys with equals' '
>> +	echo value=with=equals >expect &&
>> +	ENVVAR=value=with=equals git \
>> +		--config-env=section.subsection=with=equals.key=ENVVAR \
>> +		config section.subsection=with=equals.key >actual &&
>> +	test_cmp expect actual
>> +'
>> +
>
> Maybe worth adding a test for the strrchr() semantics here with:
>
>     perl -we '$ENV{"Y=Z"}="why and zed"; system "Z=zed git --config-env=X=Y=Z ..."'
>
> Which would show that we can't look up "Y=Z", but will always get "Z".

Yes, that was explained in the cover letter of these three patches
in <X9D23LQv34A5Q5DC@coredump.intra.peff.net>.  

We really should document that <envvar> can't contain an "=" sign,
but I do not see much point in casting that limitation in stone with
a test.  As long as we know things work correctly with environment
variables without '=' in their names, we should be happy.
Jeff King Dec. 11, 2020, 1:21 p.m. UTC | #4
On Thu, Dec 10, 2020 at 09:55:18PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > diff --git a/t/t1300-config.sh b/t/t1300-config.sh
> > index bd602e7720..e06961767f 100755
> > --- a/t/t1300-config.sh
> > +++ b/t/t1300-config.sh
> > @@ -1413,6 +1413,14 @@ test_expect_success 'git -c and --config-env override each other' '
> >  	test_cmp expect actual
> >  '
> >  
> > +test_expect_success '--config-env handles keys with equals' '
> > +	echo value=with=equals >expect &&
> > +	ENVVAR=value=with=equals git \
> > +		--config-env=section.subsection=with=equals.key=ENVVAR \
> > +		config section.subsection=with=equals.key >actual &&
> > +	test_cmp expect actual
> > +'
> > +
> 
> Maybe worth adding a test for the strrchr() semantics here with:
> 
>     perl -we '$ENV{"Y=Z"}="why and zed"; system "Z=zed git --config-env=X=Y=Z ..."'
> 
> Which would show that we can't look up "Y=Z", but will always get "Z".

We're already testing those here, though. If we used strchr(), then we'd
end up setting section.subsection in the above test.

I.e., your X=Y=Z can be tested in either of two ways:

  - check that X=Y is set to $Z

  - check that X is not set to Y=Z

It's sufficient to test only one, because success in one implies the
other. And of the two, I think testing the first is much more
interesting (because testing "this expected thing happened" is much more
robust than "this unexpected thing didn't happen").

> I.e. I can't imagine someone wants an env var with "=" in it, even
> though POSIX makes such a thing possible (you just can't do it in a
> shellscript).

Yeah, I'm definitely OK with that limitation, but we should document it.

-Peff
diff mbox series

Patch

diff --git a/config.c b/config.c
index fb160c33d2..04029e45dc 100644
--- a/config.c
+++ b/config.c
@@ -333,38 +333,76 @@  int git_config_include(const char *var, const char *value, void *data)
 	return ret;
 }
 
-void git_config_push_parameter(const char *text)
+static void git_config_push_split_parameter(const char *key, const char *value)
 {
 	struct strbuf env = STRBUF_INIT;
 	const char *old = getenv(CONFIG_DATA_ENVIRONMENT);
 	if (old && *old) {
 		strbuf_addstr(&env, old);
 		strbuf_addch(&env, ' ');
 	}
-	sq_quote_buf(&env, text);
+	sq_quote_buf(&env, key);
+	strbuf_addch(&env, '=');
+	if (value)
+		sq_quote_buf(&env, value);
 	setenv(CONFIG_DATA_ENVIRONMENT, env.buf, 1);
 	strbuf_release(&env);
 }
 
+void git_config_push_parameter(const char *text)
+{
+	const char *value;
+
+	/*
+	 * When we see:
+	 *
+	 *   section.subsection=with=equals.key=value
+	 *
+	 * we cannot tell if it means:
+	 *
+	 *   [section "subsection=with=equals"]
+	 *   key = value
+	 *
+	 * or:
+	 *
+	 *   [section]
+	 *   subsection = with=equals.key=value
+	 *
+	 * We parse left-to-right for the first "=", meaning we'll prefer to
+	 * keep the value intact over the subsection. This is historical, but
+	 * also sensible since values are more likely to contain odd or
+	 * untrusted input than a section name.
+	 *
+	 * A missing equals is explicitly allowed (as a bool-only entry).
+	 */
+	value = strchr(text, '=');
+	if (value) {
+		char *key = xmemdupz(text, value - text);
+		git_config_push_split_parameter(key, value + 1);
+		free(key);
+	} else {
+		git_config_push_split_parameter(text, NULL);
+	}
+}
+
 void git_config_push_env(const char *spec)
 {
-	struct strbuf buf = STRBUF_INIT;
+	char *key;
 	const char *env_name;
 	const char *env_value;
 
 	env_name = strrchr(spec, '=');
 	if (!env_name)
 		die("invalid config format: %s", spec);
+	key = xmemdupz(spec, env_name - spec);
 	env_name++;
 
 	env_value = getenv(env_name);
 	if (!env_value)
 		die("config variable missing for '%s'", env_name);
 
-	strbuf_add(&buf, spec, env_name - spec);
-	strbuf_addstr(&buf, env_value);
-	git_config_push_parameter(buf.buf);
-	strbuf_release(&buf);
+	git_config_push_split_parameter(key, env_value);
+	free(key);
 }
 
 static inline int iskeychar(int c)
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index bd602e7720..e06961767f 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1413,6 +1413,14 @@  test_expect_success 'git -c and --config-env override each other' '
 	test_cmp expect actual
 '
 
+test_expect_success '--config-env handles keys with equals' '
+	echo value=with=equals >expect &&
+	ENVVAR=value=with=equals git \
+		--config-env=section.subsection=with=equals.key=ENVVAR \
+		config section.subsection=with=equals.key >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'git config handles environment config pairs' '
 	GIT_CONFIG_COUNT=2 \
 		GIT_CONFIG_KEY_0="pair.one" GIT_CONFIG_VALUE_0="foo" \