From: Jeff King
To: Patrick Steinhardt
Cc: git@vger.kernel.org, Ævar Arnfjörð Bjarmason, Junio C Hamano, "brian m. carlson", Philip Oakley
Date: Wed, 9 Dec 2020 11:12:32 -0500
Subject: [PATCH 1/3] quote: make sq_dequote_step() a public function

We provide a function for dequoting an entire string, as well as one for
handling a space-separated list of quoted strings. But there's no way
for a caller to parse a string like 'foo'='bar', even though it is easy
to generate one using sq_quote_buf() or similar.

Let's make the single-step function available to callers outside of
quote.c. Note that we do need to adjust its implementation slightly: it
insists on seeing whitespace between items, and we'd like to be more
flexible than that. Since it only has a single caller, we can move that
check (and slurping up any extra whitespace) into that caller.

Signed-off-by: Jeff King
---
 quote.c | 15 ++++++++++-----
 quote.h | 18 ++++++++++++++++--
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/quote.c b/quote.c
index 69f4ca45da..8a3a5e39eb 100644
--- a/quote.c
+++ b/quote.c
@@ -116,7 +116,7 @@ void sq_append_quote_argv_pretty(struct strbuf *dst, const char **argv)
 	}
 }
 
-static char *sq_dequote_step(char *arg, char **next)
+char *sq_dequote_step(char *arg, char **next)
 {
 	char *dst = arg;
 	char *src = arg;
@@ -153,11 +153,8 @@ static char *sq_dequote_step(char *arg, char **next)
 			}
 		/* Fallthrough */
 		default:
-			if (!next || !isspace(*src))
+			if (!next)
 				return NULL;
-			do {
-				c = *++src;
-			} while (isspace(c));
 			*dst = 0;
 			*next = src;
 			return arg;
@@ -182,6 +179,14 @@ static int sq_dequote_to_argv_internal(char *arg,
 		char *dequoted = sq_dequote_step(next, &next);
 		if (!dequoted)
 			return -1;
+		if (next) {
+			char c;
+			if (!isspace(*next))
+				return -1;
+			do {
+				c = *++next;
+			} while (isspace(c));
+		}
 		if (argv) {
 			ALLOC_GROW(*argv, *nr + 1, *alloc);
 			(*argv)[(*nr)++] = dequoted;
diff --git a/quote.h b/quote.h
index 4b72a583cf..768cc6338e 100644
--- a/quote.h
+++ b/quote.h
@@ -42,12 +42,26 @@ void sq_quote_buf_pretty(struct strbuf *, const char *src);
 void sq_quote_argv_pretty(struct strbuf *, const char **argv);
 void sq_append_quote_argv_pretty(struct strbuf *dst, const char **argv);
 
-/* This unwraps what sq_quote() produces in place, but returns
+/*
+ * This unwraps what sq_quote() produces in place, but returns
  * NULL if the input does not look like what sq_quote would have
- * produced.
+ * produced (the full string must be a single quoted item).
  */
 char *sq_dequote(char *);
 
+/*
+ * Like sq_dequote(), but dequote a single item, and leave "next" pointing to
+ * the next character. E.g., in the string:
+ *
+ * 'one' 'two' 'three'
+ *
+ * after the first call, the return value would be the unquoted string "one",
+ * with "next" pointing to the space between "one" and "two". The caller is
+ * responsible for advancing the pointer to the start of the next item before
+ * calling sq_dequote_step() again.
+ */
+char *sq_dequote_step(char *src, char **next);
+
 /*
  * Same as the above, but can be used to unwrap many arguments in the
  * same string separated by space. Like sq_quote, it works in place,

From: Jeff King
Date: Wed, 9 Dec 2020 11:17:14 -0500
Subject: [PATCH 2/3] config: parse more robust format in GIT_CONFIG_PARAMETERS

When we stuff config options into GIT_CONFIG_PARAMETERS, we shell-quote
each one as a single unit, like:

  'section.one=value1' 'section.two=value2'

On the reading side, we de-quote to get the individual strings, and then
parse them by splitting on the first "=" we find. This format is
ambiguous, because an "=" may appear in a subsection. So the config
represented in a file by both:

  [section "subsection=with=equals"]
  key = value

and:

  [section]
  subsection = with=equals.key=value

ends up in this flattened format like:

  'section.subsection=with=equals.key=value'

and we can't tell which was desired. We have traditionally resolved this
by taking the first "=" we see starting from the left, meaning that we
allowed arbitrary content in the value, but not in the subsection.

Let's make our environment format a bit more robust by separately
quoting the key and value. That turns those examples into:

  'section.subsection=with=equals.key'='value'

and:

  'section.subsection'='with=equals.key=value'

respectively, and we can tell the difference between them.

We can detect which format is in use for any given element of the list
based on the presence of the unquoted "=". That means we can continue to
accept the old format from any callers which generated it manually, and
we can even intermingle the two formats. The old format wasn't
documented, and nobody was supposed to be using it. But it's likely that
such callers exist in the wild, so it's nice if we can avoid breaking
them. Likewise, it may be possible to trigger an older version of
"git -c" that runs a script that calls into a newer version of
"git -c"; that new version would see the intermingled format.
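
As a rough illustration of detecting the format from the unquoted "=" (a minimal standalone sketch, not code from this series): `dequote_step()` below is a hypothetical copy of the post-patch `sq_dequote_step()` behavior described in patch 1, and the hypothetical `classify_entry()` simply looks at the character that follows the first quoted run of one list element.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical standalone copy of the post-patch sq_dequote_step() logic:
 * dequote one '...' item in place and leave *next on the first character
 * after it (NULL at end of string). Whitespace is NOT required after it.
 */
static char *dequote_step(char *arg, char **next)
{
	char *dst = arg, *src = arg;

	if (*src != '\'')
		return NULL;
	for (;;) {
		char c = *++src;

		if (!c)
			return NULL;		/* unterminated quote */
		if (c != '\'') {
			*dst++ = c;
			continue;
		}
		c = *++src;			/* character after the closing quote */
		if (c == '\\' && (src[1] == '\'' || src[1] == '!') && src[2] == '\'') {
			*dst++ = src[1];	/* '\'' or '\!' escape: resume quoting */
			src += 2;
			continue;
		}
		if (c && !next)
			return NULL;		/* trailing text, but nowhere to report it */
		*dst = '\0';
		if (next)
			*next = c ? src : NULL;
		return arg;
	}
}

enum entry_style { STYLE_BOGUS, STYLE_OLD, STYLE_NEW };

/*
 * Hypothetical helper: decide which format one list element uses by looking
 * at the character right after its first quoted run. buf is modified in
 * place and *key receives the dequoted first run.
 */
static enum entry_style classify_entry(char *buf, char **key)
{
	char *rest;

	assert(buf != NULL && key != NULL);
	*key = dequote_step(buf, &rest);
	if (!*key)
		return STYLE_BOGUS;
	if (!rest || isspace((unsigned char)*rest))
		return STYLE_OLD;	/* 'key=value': split later on the first '=' */
	if (*rest == '=')
		return STYLE_NEW;	/* 'key'='value' or 'key'= */
	return STYLE_BOGUS;
}
```

With this sketch, `'section.subsection=with=equals.key=value'` is classified as old-style (its key still has to be split on the first "="), while `'section.subsection=with=equals.key'='value'` is new-style and its key keeps the whole subsection. An arbitrarily shell-quoted word such as `'a'shell'would'be'ok'with'this'.key=value` falls through to the bogus case.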

This does create one complication, which is that the obvious format in
the new scheme for:

  [section]
  some-bool

is:

  'section.some-bool'

with no equals. We'd mistake that for an old-style variable. And it even
has the same meaning in the old style, but:

  [section "with=equals"]
  some-bool

does not. It would be:

  'section.with=equals.some-bool'

which we'd take to mean:

  [section]
  with = equals.some-bool

in the old, ambiguous style. Likewise, we can't use:

  'section.some-bool'=''

because that's ambiguous with an actual empty string. Instead, we'll
again use the shell-quoting to give us a hint, and use:

  'section.some-bool'=

to show that we have no value.

Note that this commit just expands the reading side. We'll start writing
the new format via "git -c" in a future patch. In the meantime, the
existing "git -c" tests will make sure we didn't break reading the old
format. But we'll also add some explicit coverage of the two formats to
make sure we continue to handle the old one after we move the writing
side over.

And one final note: since we're now using the shell-quoting as a
semantically meaningful hint, this closes the door to us ever allowing
arbitrary shell quoting, like:

  'a'shell'would'be'ok'with'this'.key=value

But we have never supported that (only what sq_quote() would produce),
and we are probably better off keeping things simple, robust, and
backwards-compatible than trying to make it easier for humans. We'll
continue not to advertise the format of the variable to users, and
instead keep "git -c" as the recommended mechanism for setting config
(even if we are trying to be kind not to break users who may be relying
on the current undocumented format).
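
The value half of a new-style entry therefore has exactly three shapes: a quoted value (possibly empty), nothing at all, or something invalid. A minimal standalone sketch, assuming hypothetical names: `dequote_step()` is a copy of the post-patch `sq_dequote_step()` behavior, and `parse_new_style_value()` mirrors the branch of `parse_config_env_list()` that runs after the unquoted "=".

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical standalone copy of the post-patch sq_dequote_step() logic. */
static char *dequote_step(char *arg, char **next)
{
	char *dst = arg, *src = arg;

	if (*src != '\'')
		return NULL;
	for (;;) {
		char c = *++src;

		if (!c)
			return NULL;		/* unterminated quote */
		if (c != '\'') {
			*dst++ = c;
			continue;
		}
		c = *++src;			/* character after the closing quote */
		if (c == '\\' && (src[1] == '\'' || src[1] == '!') && src[2] == '\'') {
			*dst++ = src[1];	/* '\'' or '\!' escape */
			src += 2;
			continue;
		}
		if (c && !next)
			return NULL;
		*dst = '\0';
		if (next)
			*next = c ? src : NULL;
		return arg;
	}
}

/*
 * Hypothetical helper for the text right after the unquoted '=' of a
 * new-style entry. On success returns 0, sets *value (NULL means "no value",
 * i.e. an implicit bool) and leaves *next where parsing stopped; returns -1
 * if the text is in none of the recognized shapes.
 */
static int parse_new_style_value(char *cur, const char **value, char **next)
{
	assert(cur != NULL && value != NULL && next != NULL);
	if (*cur == '\'') {
		/* 'key'='value', including the empty string 'key'='' */
		*value = dequote_step(cur, next);
		if (!*value || (*next && !isspace((unsigned char)**next)))
			return -1;
		return 0;
	}
	if (!*cur || isspace((unsigned char)*cur)) {
		/* 'key'= with nothing after it: no value at all */
		*value = NULL;
		*next = cur;
		return 0;
	}
	return -1;	/* unquoted text such as 'key'=value */
}
```

In this sketch `''` yields an empty but non-NULL string, an empty remainder yields NULL, and an unquoted `value` is rejected. That is the distinction between `'section.some-bool'=''` and `'section.some-bool'=`.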

Signed-off-by: Jeff King
---
 config.c          | 66 +++++++++++++++++++++++++++++++++++++----------
 t/t1300-config.sh | 52 +++++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+), 14 deletions(-)

diff --git a/config.c b/config.c
index 779487bc2d..fb160c33d2 100644
--- a/config.c
+++ b/config.c
@@ -504,6 +504,57 @@ int git_config_parse_parameter(const char *text,
 	return ret;
 }
 
+static int parse_config_env_list(char *env, config_fn_t fn, void *data)
+{
+	char *cur = env;
+	while (cur && *cur) {
+		const char *key = sq_dequote_step(cur, &cur);
+		if (!key)
+			return error(_("bogus format in %s"),
+				     CONFIG_DATA_ENVIRONMENT);
+
+		if (!cur || isspace(*cur)) {
+			/* old-style 'key=value' */
+			if (git_config_parse_parameter(key, fn, data) < 0)
+				return -1;
+		}
+		else if (*cur == '=') {
+			/* new-style 'key'='value' */
+			const char *value;
+
+			cur++;
+			if (*cur == '\'') {
+				/* quoted value */
+				value = sq_dequote_step(cur, &cur);
+				if (!value || (cur && !isspace(*cur))) {
+					return error(_("bogus format in %s"),
+						     CONFIG_DATA_ENVIRONMENT);
+				}
+			} else if (!*cur || isspace(*cur)) {
+				/* implicit bool: 'key'= */
+				value = NULL;
+			} else {
+				return error(_("bogus format in %s"),
+					     CONFIG_DATA_ENVIRONMENT);
+			}
+
+			if (config_parse_pair(key, value, fn, data) < 0)
+				return -1;
+		}
+		else {
+			/* unknown format */
+			return error(_("bogus format in %s"),
+				     CONFIG_DATA_ENVIRONMENT);
+		}
+
+		if (cur) {
+			while (isspace(*cur))
+				cur++;
+		}
+	}
+	return 0;
+}
+
 int git_config_from_parameters(config_fn_t fn, void *data)
 {
 	const char *env;
@@ -563,22 +614,9 @@ int git_config_from_parameters(config_fn_t fn, void *data)
 
 	env = getenv(CONFIG_DATA_ENVIRONMENT);
 	if (env) {
-		int nr = 0, alloc = 0;
-
 		/* sq_dequote will write over it */
 		envw = xstrdup(env);
-
-		if (sq_dequote_to_argv(envw, &argv, &nr, &alloc) < 0) {
-			ret = error(_("bogus format in %s"), CONFIG_DATA_ENVIRONMENT);
-			goto out;
-		}
-
-		for (i = 0; i < nr; i++) {
-			if (git_config_parse_parameter(argv[i], fn, data) < 0) {
-				ret = -1;
-				goto out;
-			}
-		}
+		ret = parse_config_env_list(envw, fn, data);
 	}
 
 out:
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index f157cd217e..bd602e7720 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1294,6 +1294,58 @@ test_expect_success 'git -c is not confused by empty environment' '
 	GIT_CONFIG_PARAMETERS="" git -c x.one=1 config --list
 '
 
+test_expect_success 'GIT_CONFIG_PARAMETERS handles old-style entries' '
+	v="${SQ}key.one=foo${SQ}" &&
+	v="$v ${SQ}key.two=bar${SQ}" &&
+	v="$v ${SQ}key.ambiguous=section.whatever=value${SQ}" &&
+	GIT_CONFIG_PARAMETERS=$v git config --get-regexp "key.*" >actual &&
+	cat >expect <<-EOF &&
+	key.one foo
+	key.two bar
+	key.ambiguous section.whatever=value
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'GIT_CONFIG_PARAMETERS handles new-style entries' '
+	v="${SQ}key.one${SQ}=${SQ}foo${SQ}" &&
+	v="$v ${SQ}key.two${SQ}=${SQ}bar${SQ}" &&
+	v="$v ${SQ}key.ambiguous=section.whatever${SQ}=${SQ}value${SQ}" &&
+	GIT_CONFIG_PARAMETERS=$v git config --get-regexp "key.*" >actual &&
+	cat >expect <<-EOF &&
+	key.one foo
+	key.two bar
+	key.ambiguous=section.whatever value
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'old and new-style entries can mix' '
+	v="${SQ}key.oldone=oldfoo${SQ}" &&
+	v="$v ${SQ}key.newone${SQ}=${SQ}newfoo${SQ}" &&
+	v="$v ${SQ}key.oldtwo=oldbar${SQ}" &&
+	v="$v ${SQ}key.newtwo${SQ}=${SQ}newbar${SQ}" &&
+	GIT_CONFIG_PARAMETERS=$v git config --get-regexp "key.*" >actual &&
+	cat >expect <<-EOF &&
+	key.oldone oldfoo
+	key.newone newfoo
+	key.oldtwo oldbar
+	key.newtwo newbar
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'old and new bools with ambiguous subsection' '
+	v="${SQ}key.with=equals.oldbool${SQ}" &&
+	v="$v ${SQ}key.with=equals.newbool${SQ}=" &&
+	GIT_CONFIG_PARAMETERS=$v git config --get-regexp "key.*" >actual &&
+	cat >expect <<-EOF &&
+	key.with equals.oldbool
+	key.with=equals.newbool
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'detect bogus GIT_CONFIG_PARAMETERS' '
 	cat >expect <<-\EOF &&
 	env.one one

From: Jeff King
Date: Wed, 9 Dec 2020 11:20:26 -0500
Subject: [PATCH 3/3] config: store "git -c" variables using more robust format

The previous commit added a new format for $GIT_CONFIG_PARAMETERS which
is able to robustly handle subsections with "=" in them. Let's start
writing the new format. Unfortunately, this does much less than you'd
hope, because "git -c" itself has the same ambiguity problem! But it's
still worth doing:

  - we've now pushed the problem from the inter-process communication
    into the "-c" command-line parser. This would free us up to later
    add an unambiguous format there (e.g., separate arguments like
    "git --config key value", etc.).

  - for --config-env, the parser already disallows "=" in the
    environment variable name. So:

      git --config-env section.with=equals.key=ENVVAR

    will robustly set section.with=equals.key to the contents of
    $ENVVAR.

The new test shows the improvement for --config-env.

Signed-off-by: Jeff King
---
One other side effect I just noticed is that we're very aggressive about
trimming leading and trailing whitespace in the old-style format, but
the new one will store values verbatim. IMHO that's better overall, but
we might consider a preparatory patch to remove that trimming
explicitly.
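
The writing side can be sketched the same way. In this minimal standalone sketch (not code from the series), the hypothetical `push_parameter()` and `push_env()` use the same split points as `git_config_push_parameter()` (first "=") and `git_config_push_env()` (last "="), but append to a fixed-size buffer instead of calling setenv(). `struct buf` is a hypothetical stand-in for strbuf, the quoting mimics sq_quote_buf()'s escaping of `'` and `!`, and the environment value is passed in directly rather than read with getenv().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fixed-size stand-in for git's strbuf, to keep the sketch self-contained. */
struct buf {
	char data[256];
	size_t len;
};

static void buf_addch(struct buf *b, char c)
{
	if (b->len + 1 >= sizeof(b->data))
		abort();		/* sketch: no growth, just stop */
	b->data[b->len++] = c;
	b->data[b->len] = '\0';
}

/* Quote n bytes of src as one shell word, turning ' into '\'' and ! into '\!'. */
static void buf_sq_quote(struct buf *b, const char *src, size_t n)
{
	size_t i;

	assert(src != NULL);
	buf_addch(b, '\'');
	for (i = 0; i < n; i++) {
		if (src[i] == '\'' || src[i] == '!') {
			/* close the quote, emit a backslash escape, reopen */
			buf_addch(b, '\'');
			buf_addch(b, '\\');
			buf_addch(b, src[i]);
			buf_addch(b, '\'');
		} else {
			buf_addch(b, src[i]);
		}
	}
	buf_addch(b, '\'');
}

/* Append 'key'='value', or 'key'= when there is no value (implicit bool). */
static void push_split(struct buf *env, const char *key, size_t keylen,
		       const char *value)
{
	if (env->len)
		buf_addch(env, ' ');
	buf_sq_quote(env, key, keylen);
	buf_addch(env, '=');
	if (value)
		buf_sq_quote(env, value, strlen(value));
}

/* "git -c" text: the FIRST '=' ends the key, so any later '=' stays in the value. */
static void push_parameter(struct buf *env, const char *text)
{
	const char *eq = strchr(text, '=');

	if (eq)
		push_split(env, text, (size_t)(eq - text), eq + 1);
	else
		push_split(env, text, strlen(text), NULL);
}

/* --config-env spec: the LAST '=' ends the key, since the variable name has no '='. */
static void push_env(struct buf *env, const char *spec, const char *env_value)
{
	const char *eq = strrchr(spec, '=');

	if (!eq)
		abort();		/* git dies with "invalid config format" here */
	push_split(env, spec, (size_t)(eq - spec), env_value);
}
```

Pushing `section.subsection=with=equals.key=value` and then `core.bare` yields `'section.subsection'='with=equals.key=value' 'core.bare'=`, showing that "git -c" still resolves the ambiguity in favor of the value. By contrast, `push_env()` with the spec `section.with=equals.key=ENVVAR` and the value `it's` yields `'section.with=equals.key'='it'\''s'`, keeping the subsection intact.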

 config.c          | 52 ++++++++++++++++++++++++++++++++++++++++-------
 t/t1300-config.sh |  8 ++++++++
 2 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/config.c b/config.c
index fb160c33d2..04029e45dc 100644
--- a/config.c
+++ b/config.c
@@ -333,38 +333,76 @@ int git_config_include(const char *var, const char *value, void *data)
 	return ret;
 }
 
-void git_config_push_parameter(const char *text)
+static void git_config_push_split_parameter(const char *key, const char *value)
 {
 	struct strbuf env = STRBUF_INIT;
 	const char *old = getenv(CONFIG_DATA_ENVIRONMENT);
 	if (old && *old) {
 		strbuf_addstr(&env, old);
 		strbuf_addch(&env, ' ');
 	}
-	sq_quote_buf(&env, text);
+	sq_quote_buf(&env, key);
+	strbuf_addch(&env, '=');
+	if (value)
+		sq_quote_buf(&env, value);
 	setenv(CONFIG_DATA_ENVIRONMENT, env.buf, 1);
 	strbuf_release(&env);
 }
 
+void git_config_push_parameter(const char *text)
+{
+	const char *value;
+
+	/*
+	 * When we see:
+	 *
+	 *   section.subsection=with=equals.key=value
+	 *
+	 * we cannot tell if it means:
+	 *
+	 *   [section "subsection=with=equals"]
+	 *   key = value
+	 *
+	 * or:
+	 *
+	 *   [section]
+	 *   subsection = with=equals.key=value
+	 *
+	 * We parse left-to-right for the first "=", meaning we'll prefer to
+	 * keep the value intact over the subsection. This is historical, but
+	 * also sensible since values are more likely to contain odd or
+	 * untrusted input than a section name.
+	 *
+	 * A missing equals is explicitly allowed (as a bool-only entry).
+	 */
+	value = strchr(text, '=');
+	if (value) {
+		char *key = xmemdupz(text, value - text);
+		git_config_push_split_parameter(key, value + 1);
+		free(key);
+	} else {
+		git_config_push_split_parameter(text, NULL);
+	}
+}
+
 void git_config_push_env(const char *spec)
 {
-	struct strbuf buf = STRBUF_INIT;
+	char *key;
 	const char *env_name;
 	const char *env_value;
 
 	env_name = strrchr(spec, '=');
 	if (!env_name)
 		die("invalid config format: %s", spec);
+	key = xmemdupz(spec, env_name - spec);
 	env_name++;
 
 	env_value = getenv(env_name);
 	if (!env_value)
 		die("config variable missing for '%s'", env_name);
 
-	strbuf_add(&buf, spec, env_name - spec);
-	strbuf_addstr(&buf, env_value);
-	git_config_push_parameter(buf.buf);
-	strbuf_release(&buf);
+	git_config_push_split_parameter(key, env_value);
+	free(key);
 }
 
 static inline int iskeychar(int c)
diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index bd602e7720..e06961767f 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1413,6 +1413,14 @@ test_expect_success 'git -c and --config-env override each other' '
 	test_cmp expect actual
 '
 
+test_expect_success '--config-env handles keys with equals' '
+	echo value=with=equals >expect &&
+	ENVVAR=value=with=equals git \
+		--config-env=section.subsection=with=equals.key=ENVVAR \
+		config section.subsection=with=equals.key >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'git config handles environment config pairs' '
 	GIT_CONFIG_COUNT=2 \
 		GIT_CONFIG_KEY_0="pair.one" GIT_CONFIG_VALUE_0="foo" \
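
Putting the reading side together, here is a minimal standalone sketch of a loop over a mixed list, modeled on `parse_config_env_list()` from patch 2 but collecting pairs into an array instead of calling a `config_fn_t` callback. `dequote_step()`, `struct pair` and `parse_env_list()` are hypothetical names; unlike git, the sketch does not trim or canonicalize keys. It is exercised on a new-style entry of the kind the patch 3 writer would be expected to emit for `--config-env=section.subsection=with=equals.key=ENVVAR`.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical standalone copy of the post-patch sq_dequote_step() logic. */
static char *dequote_step(char *arg, char **next)
{
	char *dst = arg, *src = arg;

	if (*src != '\'')
		return NULL;
	for (;;) {
		char c = *++src;

		if (!c)
			return NULL;		/* unterminated quote */
		if (c != '\'') {
			*dst++ = c;
			continue;
		}
		c = *++src;			/* character after the closing quote */
		if (c == '\\' && (src[1] == '\'' || src[1] == '!') && src[2] == '\'') {
			*dst++ = src[1];	/* '\'' or '\!' escape */
			src += 2;
			continue;
		}
		if (c && !next)
			return NULL;
		*dst = '\0';
		if (next)
			*next = c ? src : NULL;
		return arg;
	}
}

struct pair {
	const char *key;
	const char *value;	/* NULL means "no value" (implicit bool) */
};

/*
 * Hypothetical sketch of the reading loop: old-style 'key=value' and
 * new-style 'key'='value' / 'key'= entries may be mixed freely. Parses cur
 * in place, stores up to max pairs, and returns how many were stored, or
 * -1 on a bogus format.
 */
static int parse_env_list(char *cur, struct pair *out, int max)
{
	int nr = 0;

	assert(out != NULL || max == 0);
	while (cur && *cur) {
		char *key = dequote_step(cur, &cur);
		const char *value;

		if (!key || nr >= max)
			return -1;
		if (!cur || isspace((unsigned char)*cur)) {
			/* old style: the first '=' ends the key */
			char *eq = strchr(key, '=');

			value = NULL;
			if (eq) {
				*eq = '\0';
				value = eq + 1;
			}
		} else if (*cur == '=') {
			/* new style: the value is quoted separately, or absent */
			cur++;
			if (*cur == '\'') {
				value = dequote_step(cur, &cur);
				if (!value || (cur && !isspace((unsigned char)*cur)))
					return -1;
			} else if (!*cur || isspace((unsigned char)*cur)) {
				value = NULL;
			} else {
				return -1;
			}
		} else {
			return -1;
		}
		out[nr].key = key;
		out[nr].value = value;
		nr++;
		while (cur && isspace((unsigned char)*cur))
			cur++;
	}
	return nr;
}
```

For example, parsing `'core.one=1' 'section.subsection=with=equals.key'='value=with=equals' 'x.flag'=` yields three pairs: `core.one` with `1` (old style), `section.subsection=with=equals.key` with `value=with=equals`, and `x.flag` with no value. An input such as `'k'x` is rejected.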