[v2,1/1] config: learn the --stdin option for instructions

Message ID	20200127100933.10765-2-git@zjvandeweg.nl (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=ksp+=3Q=vger.kernel.org=git-owner@kernel.org> From: Zeger-Jan van de Weg <git@zjvandeweg.nl> To: git@vger.kernel.org Cc: Zeger-Jan van de Weg <git@zjvandeweg.nl> Subject: [PATCH v2 1/1] config: learn the --stdin option for instructions Date: Mon, 27 Jan 2020 11:09:33 +0100 Message-Id: <20200127100933.10765-2-git@zjvandeweg.nl> In-Reply-To: <20200127100933.10765-1-git@zjvandeweg.nl> References: <20200127100933.10765-1-git@zjvandeweg.nl> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: git-owner@vger.kernel.org Precedence: bulk
Series	config: read instructions from stdin \| expand [v2,0/1] config: read instructions from stdin [v2,1/1] config: learn the --stdin option for instructions

Zeger-Jan van de Weg Jan. 27, 2020, 10:09 a.m. UTC

When setting values in the git config, the value is part of the
arguments for execution. This potentially leaks the value through
logging, or other programs like `ps`.

Add the `--stdin` option that reads from stdin for instructions to set
and unset values to hide them from prying eyes. The instructions are based
on the `update-ref` DSL, and accept the set and unset commands.

Signed-off-by: Zeger-Jan van de Weg <git@zjvandeweg.nl>
---
 Documentation/git-config.txt |  29 ++++++++++
 builtin/config.c             | 104 +++++++++++++++++++++++++++++++++++
 t/t1300-config.sh            |  60 ++++++++++++++++++++
 3 files changed, 193 insertions(+)

Christian Couder Jan. 27, 2020, 11:44 a.m. UTC | #1

On Mon, Jan 27, 2020 at 11:21 AM Zeger-Jan van de Weg <git@zjvandeweg.nl> wrote:

> +static void update_config_stdin()
> +{
> +       struct strbuf input = STRBUF_INIT;
> +       const char *next;
> +       const char *max_char;
> +
> +       if (strbuf_read(&input, 0, 1000) < 0)
> +               die_errno(_("could not read from stdin"));
> +       next = input.buf;
> +
> +       max_char = input.buf + input.len;
> +       while(next < max_char) {

Nit: space missing between "while" and "(".

> +               if (*next == term)
> +                       die(_("empty command in input"));
> +               else if (isspace(*next))
> +                       die(_("whitespace before command"));
> +               else if (skip_prefix(next, "set ", &next))
> +                       next = parse_cmd_set(next, max_char);
> +               else if (skip_prefix(next, "unset ", &next))
> +                       next = parse_cmd_unset(next, max_char);
> +               else
> +                       die(_("unknown command %s"), next);
> +       }
> +
> +       strbuf_release(&input);
> +}

Thanks,
Christian.

Eric Sunshine Jan. 27, 2020, 4:59 p.m. UTC | #2

On Mon, Jan 27, 2020 at 5:17 AM Zeger-Jan van de Weg <git@zjvandeweg.nl> wrote:
> When setting values in the git config, the value is part of the
> arguments for execution. This potentially leaks the value through
> logging, or other programs like `ps`.
>
> Add the `--stdin` option that reads from stdin for instructions to set
> and unset values to hide them from prying eyes. The instructions are based
> on the `update-ref` DSL, and accept the set and unset commands.
>
> Signed-off-by: Zeger-Jan van de Weg <git@zjvandeweg.nl>
> ---
> diff --git a/Documentation/git-config.txt b/Documentation/git-config.txt
> @@ -259,6 +264,30 @@ Valid `<type>`'s include:
> +STDIN
> +-----
> +
> +With `--stdin`, config reads instructions from standard input and performs
> +all modifications in sequence.
> +
> +Specify commands of the form:
> +
> +    set SP <key> SP <newvalue>
> +    unset SP <key>

If you follow the precedent of the git-update-ref documentation, these
should be:

    set SP <key> SP <newvalue> LF
    unset SP <key> LF

I'm not sure we really need to be calling the value "newvalue" (I
guess you picked that up from git-update-ref.txt). "value" should be
fine, thus:

    set SP <key> SP <value> LF
    unset SP <key> LF

> +Alternatively, use `-z` or `--null` to specify in NUL-terminated format, without
> +quoting:
> +
> +    set SP <key> NULL <newvalue>
> +    unset SP <key>

A few comments:

First, this is talking about the NUL character, not a NULL pointer,
so: s/NULL/NUL/

Second, this doesn't give any indication about how the lines should be
terminated. It should instead be written as:

    set SP <key> NUL <value> NUL
    unset SP <key> NUL

Third, importantly, unlike git-update-ref from which this DSL takes
inspiration and in which "refs" might have oddball names for which
NUL-termination makes sense, it's hard to imagine a case in which a
configuration key would be so strange as to warrant NUL-termination.
This observation suggests a simpler DSL in which only <value> is
NUL-terminated:

    set SP <key> SP <value> NUL
    unset SP <key> LF

(The proposed code changes in config.c would need adjustment, as well,
to implement this revised DSL.)

> diff --git a/t/t1300-config.sh b/t/t1300-config.sh
> @@ -380,6 +380,66 @@ test_expect_success '--add' '
> +test_expect_success '--stdin works on no input' '
> +    echo -n "" | git config --stdin
> +'

`echo -n` is not portable. If you want no input at all, either grab it
from /dev/null:

    git config --stdin </dev/null

or use `printf` to suppress printing of the line-terminator:

    printf "" | git config --stdin

> +test_expect_success '--stdin with --null flag' '
> +    echo -ne "set bar.baz\0false" | git config --stdin --null &&
> +    Git config --get bar.baz >output &&
> +    echo false >expect &&
> +    test_cmp expect output
> +'

Likewise, non-portable use of `echo -n` and `echo "...\0...". Use
`printf` instead:

    printf "set bar.baz\0false" | git config --stdin --null &&

(But note that this isn't even testing NUL-termination of <value>.)

Jeff King Jan. 28, 2020, 9:19 a.m. UTC | #3

On Mon, Jan 27, 2020 at 11:09:33AM +0100, Zeger-Jan van de Weg wrote:

> diff --git a/Documentation/git-config.txt b/Documentation/git-config.txt
> index 899e92a1c9..9f7462284d 100644
> --- a/Documentation/git-config.txt
> +++ b/Documentation/git-config.txt
> @@ -23,6 +23,7 @@ SYNOPSIS
>  'git config' [<file-option>] [--show-origin] [-z|--null] [--name-only] -l | --list
>  'git config' [<file-option>] --get-color name [default]
>  'git config' [<file-option>] --get-colorbool name [stdout-is-tty]
> +'git config' [<file-option>] [-z|--null] --stdin

Instead of a stdin mode just for setting, you have a general --stdin
mode. That would later allow "get" commands, too. I'm not sure if that's
a good idea, though. Here are two complications which came to mind
immediately:

  - the permissions models for getting and setting might be different.
    Might we ever feed untrusted input to something like "git config
    --get --stdin", but not want it to be able to set any values?

  - what happens when get/set commands are intermixed? When are set
    commands flushed? E.g., if I send:

     set foo.bar baz
     get foo.bar

    then do I see "baz", or whatever was there before?

I think those are solvable, but unless we have a compelling use case for
a process that will both get and set values, it seems like it might be
simpler to keep the major modes separate.

> +STDIN
> +-----
> +
> +With `--stdin`, config reads instructions from standard input and performs
> +all modifications in sequence.
> +
> +Specify commands of the form:
> +
> +	set SP <key> SP <newvalue>
> +	unset SP <key>

This protocol doesn't leave much room for giving more details about each
set operation. There are several things you can do on the command line
now that you might want to specify on a per-key basis:

  - adding keys versus replacing

  - likewise, replace-all versus a single replace

  - likewise, unset versus unset-all

  - type interpretation / normalization

  - value regex matching

Even if we don't implement all of that immediately, it would be nice to
have a plan for how they'd be added to stdin mode eventually.

> +Alternatively, use `-z` or `--null` to specify in NUL-terminated format, without
> +quoting:

This mentions quoting, but the earlier form doesn't discuss it at all. I
can see from the quote that it does unquote_c_style(). It might be worth
saying so explicitly; since we're dealing with config, people may assume
that the syntax is like the one in config files (which is similar, but
not quite the same).

> +static const char *parse_cmd_set(const char *next, const char *max_char)
> +{
> +	char *key, *value;
> +	int ret;
> +
> +	key = parse_key_or_value(&next, max_char);
> +	if (!key)
> +		die(_("set: missing key"));
> +
> +	value = parse_key_or_value(&next, max_char);
> +	if (!value)
> +		die(_("set: missing value"));
> +
> +	ret = git_config_set_in_file_gently(given_config_source.file, key, value);
> +	if (ret)
> +		die(_("cannot set key value pair: %d"), ret);
> +
> +	free(key);
> +	free(value);
> +	return next;
> +}

I mentioned flushing earlier. It looks like this will flush each
individual "set" command by rewriting the whole file. But another reason
one might want to use --stdin is to set multiple keys in one go, either
for efficiency or because you'd like readers to see the result
atomically.

I don't think we actually have a function to set multiple keys in one
pass yet. And I'd be OK starting with a less-efficient implementation.
But we probably should document that the current flushing behavior is
not something people should count on, to leave us room to change it in
the future.

The code itself looked cleanly done, though I admit I didn't look too
closely. If I've convinced you on any of the design considerations
above, I think it would need rewritten a bit.

-Peff

Jeff King Jan. 28, 2020, 9:24 a.m. UTC | #4

On Mon, Jan 27, 2020 at 11:59:03AM -0500, Eric Sunshine wrote:

> Second, this doesn't give any indication about how the lines should be
> terminated. It should instead be written as:
> 
>     set SP <key> NUL <value> NUL
>     unset SP <key> NUL
> 
> Third, importantly, unlike git-update-ref from which this DSL takes
> inspiration and in which "refs" might have oddball names for which
> NUL-termination makes sense, it's hard to imagine a case in which a
> configuration key would be so strange as to warrant NUL-termination.
> This observation suggests a simpler DSL in which only <value> is
> NUL-terminated:
> 
>     set SP <key> SP <value> NUL
>     unset SP <key> LF
> 
> (The proposed code changes in config.c would need adjustment, as well,
> to implement this revised DSL.)

The section and key parts of a config key are pretty restricted, but the
subsection portion can contain anything except newline and NUL. So in
particular, it would be valid to have a space, which would make the
input ambiguous.

I agree it would probably be rare, but isn't the whole point of "-z" to
be able to represent anything without worrying about quoting?

-Peff

Eric Sunshine Jan. 28, 2020, 1:42 p.m. UTC | #5

On Tue, Jan 28, 2020 at 4:24 AM Jeff King <peff@peff.net> wrote:
> On Mon, Jan 27, 2020 at 11:59:03AM -0500, Eric Sunshine wrote:
> >     set SP <key> SP <value> NUL
> >     unset SP <key> LF
>
> The section and key parts of a config key are pretty restricted, but the
> subsection portion can contain anything except newline and NUL. So in
> particular, it would be valid to have a space, which would make the
> input ambiguous.

I was wondering if that might be the case (particularly, if space was
allowed). Thanks for the education.

Junio C Hamano Jan. 28, 2020, 7:28 p.m. UTC | #6

Jeff King <peff@peff.net> writes:

> On Mon, Jan 27, 2020 at 11:59:03AM -0500, Eric Sunshine wrote:
>
>> Second, this doesn't give any indication about how the lines should be
>> terminated. It should instead be written as:
>> 
>>     set SP <key> NUL <value> NUL
>>     unset SP <key> NUL
>> 
>> Third, importantly, unlike git-update-ref from which this DSL takes
>> inspiration and in which "refs" might have oddball names for which
>> NUL-termination makes sense, it's hard to imagine a case in which a
>> configuration key would be so strange as to warrant NUL-termination.
>> This observation suggests a simpler DSL in which only <value> is
>> NUL-terminated:
>> 
>>     set SP <key> SP <value> NUL
>>     unset SP <key> LF
>> 
>> (The proposed code changes in config.c would need adjustment, as well,
>> to implement this revised DSL.)
>
> The section and key parts of a config key are pretty restricted, but the
> subsection portion can contain anything except newline and NUL. So in
> particular, it would be valid to have a space, which would make the
> input ambiguous.
>
> I agree it would probably be rare, but isn't the whole point of "-z" to
> be able to represent anything without worrying about quoting?

Yup.  I was tempted to say, in addition to "without worrying about
quoting", "without worrying about the syntax".

But unfortunately that would not work.  If we were to add a new
"frotz" subcommand in a future version of Git that takes N args,

	frotz SP <arg1> NUL <arg2> NUL ... <argN> NUL

may how you'd express it, and it would be wonderful if the current
version of Git that does not know "frotz" subcommand can at least
parse and ignore it.  That cannot however be done, though, because
there is no syntactic difference between the argument terminator
(after each arg) and subcommand terminator (after each subcommand).

Jeff King Jan. 29, 2020, 2:37 a.m. UTC | #7

On Tue, Jan 28, 2020 at 11:28:36AM -0800, Junio C Hamano wrote:

> > I agree it would probably be rare, but isn't the whole point of "-z" to
> > be able to represent anything without worrying about quoting?
> 
> Yup.  I was tempted to say, in addition to "without worrying about
> quoting", "without worrying about the syntax".
> 
> But unfortunately that would not work.  If we were to add a new
> "frotz" subcommand in a future version of Git that takes N args,
> 
> 	frotz SP <arg1> NUL <arg2> NUL ... <argN> NUL
> 
> may how you'd express it, and it would be wonderful if the current
> version of Git that does not know "frotz" subcommand can at least
> parse and ignore it.  That cannot however be done, though, because
> there is no syntactic difference between the argument terminator
> (after each arg) and subcommand terminator (after each subcommand).

I do agree it would be nice if we could separate syntax from semantics,
just because it lets us reuse and modify parsers more easily. We have
the same problem on output (e.g., anything parsing "diff --name-status
-z" has to know there are two name fields for an R or C entry), and it
has tripped people up in the past.

But:

  - I don't think we'd want a current version of Git to ignore an
    unknown frotz anyway. It would probably say "I don't understand
    frotz, so I don't know if it's safe to ignore it or not".

  - Given only the tool of NUL bytes, I'm not sure how to easily
    design a syntax that isn't awful. I.e., clearly something like:

      frotz SP 2 NUL <arg1> NUL <arg2> NUL

    works, but have we really made the world a better place? It would
    make more sense to me to use a standard serialization format like
    JSON, but that generally comes with its own inherited baggage (like
    non-UTF8 strings).

-Peff

[v2,1/1] config: learn the --stdin option for instructions

Commit Message

Comments

Patch