mbox series

[0/8] builtin/config: canonicalize "value_regex" with `--type=bool[-or-int]`

Message ID cover.1573670565.git.martin.agren@gmail.com (mailing list archive)
Headers show
Series builtin/config: canonicalize "value_regex" with `--type=bool[-or-int]` | expand

Message

Martin Ågren Nov. 13, 2019, 6:54 p.m. UTC
To find all config items "foo.*" that are configured as the Boolean
value "true", you can try executing

  git config --type=bool --name-only --get-regexp '^foo\.' true

... and hope that you didn't spell "true" as "on", "True" or "1" back
when you populated your config. This shortcoming has been mentioned as a
left-over bit [1] [2].

This patch series teaches `git config` to canonicalize the incoming
"value_regex" ("true" in the example above), then canonicalize candidate
values as we go through the config. Or if you will, `git config` learns
a brand new type of regex, corresponding to the different ways there are
of spelling "true" and "false", respectively.

`--type=bool-or-int` gets the same treatment, except we need to to be
able to handle the ints and regexes matching particular ints that we
must expect. That said, even with `--type=bool` we can't move too
aggressively towards *requiring* that the incoming "value_regex"
canonializes as a Boolean value. The penultimate patch starts to warn on
non-canonicalizing values; the final patch makes us bail out entirely.

The last patch is not meant for immediate inclusion, but I post it
anyway. I can re-submit it at an appropriate time, or maybe it could
slumber on pu until the time is ripe for completing the switch.

[1] https://git-blame.blogspot.com/p/leftover-bits.html
[2] https://public-inbox.org/git/xmqq7frsh4tw.fsf@gitster.dls.corp.google.com/

Martin Ågren (8):
  config: make `git_parse_maybe_bool_text()` public
  t1300: modernize part of script
  builtin/config: extract `handle_value_regex()` from `get_value()`
  builtin/config: collect "value_regexp" data in a struct
  builtin/config: canonicalize "value_regex" with `--type=bool`
  builtin/config: canonicalize "value_regex" with `--type=bool-or-int`
  builtin/config: warn if "value_regex" doesn't canonicalize as boolean
  builtin/config: die if "value_regex" doesn't canonicalize as boolean

 Documentation/git-config.txt |   5 +
 builtin/config.c             |  84 +++++++++++----
 config.c                     |   2 +-
 config.h                     |   7 ++
 t/t1300-config.sh            | 199 ++++++++++++++++++++++-------------
 5 files changed, 203 insertions(+), 94 deletions(-)

Comments

Junio C Hamano Nov. 14, 2019, 2:18 a.m. UTC | #1
Martin Ågren <martin.agren@gmail.com> writes:

>   git config --type=bool --name-only --get-regexp '^foo\.' true
> ...
> This patch series teaches `git config` to canonicalize the incoming
> "value_regex" ("true" in the example above), then canonicalize candidate
> values as we go through the config. Or if you will, `git config` learns
> a brand new type of regex, corresponding to the different ways there are
> of spelling "true" and "false", respectively.

Nice ;-)

> `--type=bool-or-int` gets the same treatment, except we need to to be
> able to handle the ints and regexes matching particular ints that we
> must expect.

Hmm, so I can say 1024k or 1m and that would match 1048576?  

Doubly nice.

Looking forward to reading it thru.
Jeff King Nov. 14, 2019, 6:29 a.m. UTC | #2
On Wed, Nov 13, 2019 at 07:54:59PM +0100, Martin Ågren wrote:

> To find all config items "foo.*" that are configured as the Boolean
> value "true", you can try executing
> 
>   git config --type=bool --name-only --get-regexp '^foo\.' true
> 
> ... and hope that you didn't spell "true" as "on", "True" or "1" back
> when you populated your config. This shortcoming has been mentioned as a
> left-over bit [1] [2].

This seems like a good idea, but I wonder why we'd limit it to bools?
It seems like any type would probably be better off matching a
normalized version.

We already have two functions in builtin/config.c which handle types:

  - format_config() is for actually interpreting an existing value and
    writing it to stdout

  - normalize_value() is used when writing a new value into a config
    file, and normalizing what the user provided on the command-line

I don't think you'd want to use format_config() here. For example, if I
say:

  git config --type=color --get-regexp ^foo red

I want to match the original "red" color, but _output_ the ANSI code.
But normalize_value(), by contrast, leaves colors intact. So maybe it's
a good match?

OTOH, I'd probably expect to match expanded pathnames or expiration
dates there, too, and it doesn't expand those. Those ones are less clear
to me. The whole premise of this series is making an assumption that
"--type" is about how you want to match, and not just about what you
want to output. In your example above it's clear that you don't care
about the output type (because you're using --name-only), but for:

  git config --type=path --get-regexp ^foo /home/peff

it's unclear if you want to match a value of "~peff/foo", or if you
simply want to output the expansion.

I wonder if we'd want to allow specifying the output type and the
matching type separately? Maybe that just makes it more awkward to use
for little benefit (I admit that it's hard to imagine somebody wanting
to normalize bools on output but _not_ for matching).

Anyway. It would be nice if we could build on the existing logic in some
way, rather than making a third place that has to handle every type we
know about.

> `--type=bool-or-int` gets the same treatment, except we need to to be
> able to handle the ints and regexes matching particular ints that we
> must expect. That said, even with `--type=bool` we can't move too
> aggressively towards *requiring* that the incoming "value_regex"
> canonializes as a Boolean value. The penultimate patch starts to warn on
> non-canonicalizing values; the final patch makes us bail out entirely.
> 
> The last patch is not meant for immediate inclusion, but I post it
> anyway. I can re-submit it at an appropriate time, or maybe it could
> slumber on pu until the time is ripe for completing the switch.

I think bailing on values that can't be converted is normal for other
code paths. E.g., just trying to print:

  $ git -c foo.bar=abc config --type=bool foo.abr
  fatal: bad numeric config value 'abc' for 'foo.bar': invalid unit

-Peff
Martin Ågren Nov. 14, 2019, 6:40 a.m. UTC | #3
Hi Junio

On Thu, 14 Nov 2019 at 03:19, Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin Ågren <martin.agren@gmail.com> writes:
>
> > `--type=bool-or-int` gets the same treatment, except we need to to be
> > able to handle the ints and regexes matching particular ints that we
> > must expect.
>
> Hmm, so I can say 1024k or 1m and that would match 1048576?
>
> Doubly nice.
>
> Looking forward to reading it thru.

Maybe you already noticed, but no, I didn't get to canonicalizing
integers like that. What I meant was, type=bool-or-int learns to handle
booleans similar to what I did to type=bool.

I don't feel entirely satisfied by some of my commit message oneliners.
They could make that a bit clearer, I think.

Not directly related to your question, but I realize now that with
type=bool-or-int, I only added the first of these example usages below
as a test. The second one is perhaps just as interesting.

$ ./git -c an.int=1 config --get --type=bool-or-int an.int 1
1
$ ./git -c an.int=1 config --get --type=bool-or-int an.int on
1

This last one emits "1". That's because by the time we've decided to
output the value, `format_config()` has some logic around
type=bool-or-int, but doesn't know about why exactly we selected this
an.int=1 in the first place
  (git_parse_maybe_bool("1") == git_parse_maybe_bool_TEXT("on")).

Just after thinking about this for a short while, I can't immediately
say whether this second one should emit "1" or "true". My added
documentation is actually vague enough to allow both of these to
happen... I'll ponder this.


Martin
Martin Ågren Nov. 14, 2019, 6:54 a.m. UTC | #4
On Thu, 14 Nov 2019 at 07:29, Jeff King <peff@peff.net> wrote:
> This seems like a good idea, but I wonder why we'd limit it to bools?

Basically because that's what the existing left-over-bits mentioned and
it felt like a good place to start. But you're right in asking about the
bigger picture up front.

> It seems like any type would probably be better off matching a
> normalized version.
>
> We already have two functions in builtin/config.c which handle types:
>
>   - format_config() is for actually interpreting an existing value and
>     writing it to stdout
>
>   - normalize_value() is used when writing a new value into a config
>     file, and normalizing what the user provided on the command-line
>
> I don't think you'd want to use format_config() here.

I just speculated a little around format_config() in a reply to Junio.
Already with my patch series and with type=bool-or-int, there's a
visible funny-funky corner case where one hand doesn't know what the
other is doing.

> For example, if I
> say:
>
>   git config --type=color --get-regexp ^foo red
>
> I want to match the original "red" color, but _output_ the ANSI code.
> But normalize_value(), by contrast, leaves colors intact. So maybe it's
> a good match?
>
> OTOH, I'd probably expect to match expanded pathnames or expiration
> dates there, too, and it doesn't expand those. Those ones are less clear
> to me. The whole premise of this series is making an assumption that
> "--type" is about how you want to match,

Right.

> and not just about what you
> want to output. In your example above it's clear that you don't care
> about the output type (because you're using --name-only), but for:
>
>   git config --type=path --get-regexp ^foo /home/peff
>
> it's unclear if you want to match a value of "~peff/foo", or if you
> simply want to output the expansion.

Hmm, I feel like we're opening a can of worms here.

> I wonder if we'd want to allow specifying the output type and the
> matching type separately? Maybe that just makes it more awkward to use
> for little benefit (I admit that it's hard to imagine somebody wanting
> to normalize bools on output but _not_ for matching).
>
> Anyway. It would be nice if we could build on the existing logic in some
> way, rather than making a third place that has to handle every type we
> know about.

Understood. Thanks a lot for sharing your thoughts.

> > The last patch is not meant for immediate inclusion, but I post it
> > anyway. I can re-submit it at an appropriate time, or maybe it could
> > slumber on pu until the time is ripe for completing the switch.
>
> I think bailing on values that can't be converted is normal for other
> code paths. E.g., just trying to print:
>
>   $ git -c foo.bar=abc config --type=bool foo.abr
>   fatal: bad numeric config value 'abc' for 'foo.bar': invalid unit

I'm not sure if you mean "... so we could be a lot more aggressive
here"?

I'm running now and I feel like I'll need to read your mail again
tonight and get back to you in more detail.

Thanks
Martin
Jeff King Nov. 14, 2019, 7:37 a.m. UTC | #5
On Thu, Nov 14, 2019 at 07:54:35AM +0100, Martin Ågren wrote:

> > > The last patch is not meant for immediate inclusion, but I post it
> > > anyway. I can re-submit it at an appropriate time, or maybe it could
> > > slumber on pu until the time is ripe for completing the switch.
> >
> > I think bailing on values that can't be converted is normal for other
> > code paths. E.g., just trying to print:
> >
> >   $ git -c foo.bar=abc config --type=bool foo.abr
> >   fatal: bad numeric config value 'abc' for 'foo.bar': invalid unit
> 
> I'm not sure if you mean "... so we could be a lot more aggressive
> here"?

Yeah, I think it's OK to be aggressive with bailing when the user gave
us a --type, but the value doesn't match it.

> I'm running now and I feel like I'll need to read your mail again
> tonight and get back to you in more detail.

Sure, no rush and thanks for working on it. :)

-Peff