diff mbox series

pretty: find pretty formats case-insensitively

Message ID 20240324214316.917513-1-brianmlyles@gmail.com (mailing list archive)
State New
Headers show
Series pretty: find pretty formats case-insensitively | expand

Commit Message

Brian Lyles March 24, 2024, 9:43 p.m. UTC
User-defined pretty formats are stored in config, which is meant to use
case-insensitive matching for names as noted in config.txt's 'Syntax'
section:

    All the other lines [...] are recognized as setting variables, in
    the form 'name = value' [...]. The variable names are
    case-insensitive, [...].

When a user specifies one of their format aliases with an uppercase in
it, however, it is not found.

    $ git config pretty.testAlias %h
    $ git config --list | grep pretty
    pretty.testalias=%h
    $ git log --format=testAlias -1
    fatal: invalid --pretty format: testAlias
    $ git log --format=testalias -1
    3c2a3fdc38

This is true whether the name in the config file uses any uppercase
characters or not.

Normalize the format name specified via `--format` to lowercase so that
format aliases are found case-insensitively. The format aliases loaded
from config against which this name is compared are already normalized
to lowercase since they are loaded through `git_config()`.

`xstrdup_tolower` is used instead of modifying the string in-place to
ensure that the error shown to the user when the format is not found has
the same casing that the user entered. Otherwise, the mismatch may be
confusing to the user.

Signed-off-by: Brian Lyles <brianmlyles@gmail.com>
---
 pretty.c                      | 12 +++++++++++-
 t/t4205-log-pretty-formats.sh |  7 +++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

Comments

Jeff King March 25, 2024, 6:14 a.m. UTC | #1
On Sun, Mar 24, 2024 at 04:43:09PM -0500, Brian Lyles wrote:

> User-defined pretty formats are stored in config, which is meant to use
> case-insensitive matching for names as noted in config.txt's 'Syntax'
> section:
> 
>     All the other lines [...] are recognized as setting variables, in
>     the form 'name = value' [...]. The variable names are
>     case-insensitive, [...].
> 
> When a user specifies one of their format aliases with an uppercase in
> it, however, it is not found.
> 
>     $ git config pretty.testAlias %h
>     $ git config --list | grep pretty
>     pretty.testalias=%h
>     $ git log --format=testAlias -1
>     fatal: invalid --pretty format: testAlias
>     $ git log --format=testalias -1
>     3c2a3fdc38

Yeah, I agree that case-insensitive matching makes more sense here due
to the nature of config keys, especially given this:

> This is true whether the name in the config file uses any uppercase
> characters or not.

I.e., the config code is going to normalize the variable names already,
so we must match (even if the user consistently specifies camelCase).

But...

>  static struct cmt_fmt_map *find_commit_format(const char *sought)
>  {
> +	struct cmt_fmt_map *result;
> +	char *sought_lower;
> +
>  	if (!commit_formats)
>  		setup_commit_formats();
>  
> -	return find_commit_format_recursive(sought, sought, 0);
> +	/*
> +	 * The sought name will be compared to config names that have already
> +	 * been normalized to lowercase.
> +	 */
> +	sought_lower = xstrdup_tolower(sought);
> +	result = find_commit_format_recursive(sought_lower, sought_lower, 0);
> +	free(sought_lower);
> +	return result;
>  }

The mention of "recursive" in the function we call made me what wonder
if we'd need more normalization. And I think we do. Try this
modification to your test:

diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
index 321e305979..be549b1d4b 100755
--- a/t/t4205-log-pretty-formats.sh
+++ b/t/t4205-log-pretty-formats.sh
@@ -61,8 +61,9 @@ test_expect_success 'alias user-defined format' '
 
 test_expect_success 'alias user-defined format is matched case-insensitively' '
 	git log --pretty="format:%h" >expected &&
-	git config pretty.testalias "format:%h" &&
-	git log --pretty=testAlias >actual &&
+	git config pretty.testone "format:%h" &&
+	git config pretty.testtwo testOne &&
+	git log --pretty=testTwo >actual &&
 	test_cmp expected actual
 '
 

which fails because looking up "testOne" in the recursion won't work. So
I think we'd want to simply match case-insensitively inside the
function, like:

diff --git a/pretty.c b/pretty.c
index 50825c9d25..10f71ee004 100644
--- a/pretty.c
+++ b/pretty.c
@@ -147,7 +147,7 @@ static struct cmt_fmt_map *find_commit_format_recursive(const char *sought,
 	for (i = 0; i < commit_formats_len; i++) {
 		size_t match_len;
 
-		if (!starts_with(commit_formats[i].name, sought))
+		if (!istarts_with(commit_formats[i].name, sought))
 			continue;
 
 		match_len = strlen(commit_formats[i].name);

And then you would not even need to normalize it in
find_commit_format().

> +test_expect_success 'alias user-defined format is matched case-insensitively' '
> +	git log --pretty="format:%h" >expected &&
> +	git config pretty.testalias "format:%h" &&
> +	git log --pretty=testAlias >actual &&
> +	test_cmp expected actual
> +'

Modern style would be to use "test_config" here (or just "git -c"), but
I see the surrounding tests are too old to do so. So I'd be OK with
matching them (but cleaning up all of the surrounding ones would be
nice, too).

-Peff

PS The matching rules in find_commit_format_recursive() seem weird
   to me. We do a prefix match, and then return the entry whose name is
   the shortest? And break ties based on which came first? So:

     git -c pretty.abcd=format:one \
         -c pretty.abc=format:two \
         -c pretty.abd=format:three \
	 log -1 --format=ab

   quietly chooses "two". I guess the "shortest wins" is meant to allow
   "foo" to be chosen over "foobar" if you specify the whole name. But
   the fact that we don't flag an ambiguity between "abc" and "abd"
   seems strange.

   That is all orthogonal to your patch, of course, but just a
   head-scratcher I noticed while looking at the code.
Brian Lyles March 25, 2024, 7:08 a.m. UTC | #2
Hi Peff

Thanks for the review.

On Mon, Mar 25, 2024 at 1:14 AM Jeff King <peff@peff.net> wrote:

> The mention of "recursive" in the function we call made me what wonder
> if we'd need more normalization. And I think we do. Try this
> modification to your test:
> 
> diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
> index 321e305979..be549b1d4b 100755
> --- a/t/t4205-log-pretty-formats.sh
> +++ b/t/t4205-log-pretty-formats.sh
> @@ -61,8 +61,9 @@ test_expect_success 'alias user-defined format' '
>  
>  test_expect_success 'alias user-defined format is matched case-insensitively' '
>  	git log --pretty="format:%h" >expected &&
> -	git config pretty.testalias "format:%h" &&
> -	git log --pretty=testAlias >actual &&
> +	git config pretty.testone "format:%h" &&
> +	git config pretty.testtwo testOne &&
> +	git log --pretty=testTwo >actual &&
>  	test_cmp expected actual
>  '
>  
> 
> which fails because looking up "testOne" in the recursion won't work. So
> I think we'd want to simply match case-insensitively inside the
> function, like:
> 
> diff --git a/pretty.c b/pretty.c
> index 50825c9d25..10f71ee004 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -147,7 +147,7 @@ static struct cmt_fmt_map *find_commit_format_recursive(const char *sought,
>  	for (i = 0; i < commit_formats_len; i++) {
>  		size_t match_len;
>  
> -		if (!starts_with(commit_formats[i].name, sought))
> +		if (!istarts_with(commit_formats[i].name, sought))
>  			continue;
>  
>  		match_len = strlen(commit_formats[i].name);
> 
> And then you would not even need to normalize it in
> find_commit_format().

Good catch -- you're absolutely right, and simply switching to
`istarts_with` is a more elegant solution than my initial patch. I'll
switch to this approach in a v2 re-roll.

>> +test_expect_success 'alias user-defined format is matched case-insensitively' '
>> +	git log --pretty="format:%h" >expected &&
>> +	git config pretty.testalias "format:%h" &&
>> +	git log --pretty=testAlias >actual &&
>> +	test_cmp expected actual
>> +'
> 
> Modern style would be to use "test_config" here (or just "git -c"), but
> I see the surrounding tests are too old to do so. So I'd be OK with
> matching them (but cleaning up all of the surrounding ones would be
> nice, too).

Thanks for the tip. Updating the existing tests in this file to use
`test_config` looks to be fairly trivial, so I will start v2 with a
patch that does that as well. I'm opting for `test_config` over `git -c`
for no real reason other than they seem roughly equivalent, but
`test_config` still ends up calling `git config` which seems slightly
more realistic to how pretty formats would be defined normally.

> PS The matching rules in find_commit_format_recursive() seem weird
>    to me. We do a prefix match, and then return the entry whose name is
>    the shortest? And break ties based on which came first? So:
> 
>      git -c pretty.abcd=format:one \
>          -c pretty.abc=format:two \
>          -c pretty.abd=format:three \
> 	 log -1 --format=ab
> 
>    quietly chooses "two". I guess the "shortest wins" is meant to allow
>    "foo" to be chosen over "foobar" if you specify the whole name. But
>    the fact that we don't flag an ambiguity between "abc" and "abd"
>    seems strange.
> 
>    That is all orthogonal to your patch, of course, but just a
>    head-scratcher I noticed while looking at the code.

I agree that this behavior is somewhat odd. I'm not sure what we would
want to do about it at this point -- any change would technically be
breaking, I assume. Regardless, not something I'd scope into this patch,
but good observation.
Junio C Hamano March 25, 2024, 6:12 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> The mention of "recursive" in the function we call made me what wonder
> if we'd need more normalization. And I think we do. Try this
> modification to your test:
>
> diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
> index 321e305979..be549b1d4b 100755
> --- a/t/t4205-log-pretty-formats.sh
> +++ b/t/t4205-log-pretty-formats.sh
> @@ -61,8 +61,9 @@ test_expect_success 'alias user-defined format' '
>  
>  test_expect_success 'alias user-defined format is matched case-insensitively' '
>  	git log --pretty="format:%h" >expected &&
> -	git config pretty.testalias "format:%h" &&
> -	git log --pretty=testAlias >actual &&
> +	git config pretty.testone "format:%h" &&
> +	git config pretty.testtwo testOne &&
> +	git log --pretty=testTwo >actual &&
>  	test_cmp expected actual
>  '

Very good thinking.  I totally missed the short-cut to another
short-cut while reading the patch.

>> +test_expect_success 'alias user-defined format is matched case-insensitively' '
>> +	git log --pretty="format:%h" >expected &&
>> +	git config pretty.testalias "format:%h" &&
>> +	git log --pretty=testAlias >actual &&
>> +	test_cmp expected actual
>> +'
>
> Modern style would be to use "test_config" here (or just "git -c"), but
> I see the surrounding tests are too old to do so. So I'd be OK with
> matching them (but cleaning up all of the surrounding ones would be
> nice, too).

Yup.  I do not mind seeing it done either way, as a preliminary
clean-up before the main fix, just a fix with more modern style
while leaving the clean-up as #leftoverbits to be done after the
dust settles.

> PS The matching rules in find_commit_format_recursive() seem weird
>    to me. We do a prefix match, and then return the entry whose name is
>    the shortest? And break ties based on which came first? So:
>
>      git -c pretty.abcd=format:one \
>          -c pretty.abc=format:two \
>          -c pretty.abd=format:three \
> 	 log -1 --format=ab
>
>    quietly chooses "two". I guess the "shortest wins" is meant to allow
>    "foo" to be chosen over "foobar" if you specify the whole name. But
>    the fact that we don't flag an ambiguity between "abc" and "abd"
>    seems strange.
>    That is all orthogonal to your patch, of course, but just a
>    head-scratcher I noticed while looking at the code.

I think it is not just strange but outright wrong.  I agree that it
is orthogonal to this fix.

Thanks, both.
diff mbox series

Patch

diff --git a/pretty.c b/pretty.c
index cf964b060c..78ec7a75ff 100644
--- a/pretty.c
+++ b/pretty.c
@@ -168,10 +168,20 @@  static struct cmt_fmt_map *find_commit_format_recursive(const char *sought,
 
 static struct cmt_fmt_map *find_commit_format(const char *sought)
 {
+	struct cmt_fmt_map *result;
+	char *sought_lower;
+
 	if (!commit_formats)
 		setup_commit_formats();
 
-	return find_commit_format_recursive(sought, sought, 0);
+	/*
+	 * The sought name will be compared to config names that have already
+	 * been normalized to lowercase.
+	 */
+	sought_lower = xstrdup_tolower(sought);
+	result = find_commit_format_recursive(sought_lower, sought_lower, 0);
+	free(sought_lower);
+	return result;
 }
 
 void get_commit_format(const char *arg, struct rev_info *rev)
diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
index e3d655e6b8..321e305979 100755
--- a/t/t4205-log-pretty-formats.sh
+++ b/t/t4205-log-pretty-formats.sh
@@ -59,6 +59,13 @@  test_expect_success 'alias user-defined format' '
 	test_cmp expected actual
 '
 
+test_expect_success 'alias user-defined format is matched case-insensitively' '
+	git log --pretty="format:%h" >expected &&
+	git config pretty.testalias "format:%h" &&
+	git log --pretty=testAlias >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'alias user-defined tformat with %s (ISO8859-1 encoding)' '
 	git config i18n.logOutputEncoding $test_encoding &&
 	git log --oneline >expected-s &&