diff mbox series

Questions about trailer configuration semantics

Message ID 87blk0rjob.fsf@0x63.nu (mailing list archive)
State New, archived
Headers show
Series Questions about trailer configuration semantics | expand

Commit Message

Anders Waldenborg July 27, 2020, 4:45 p.m. UTC
I noticed some undocumented and (at least to me) surprising behavior in
trailers.c.

When configuring a value in trailer.<token>.key it causes the trailer to
be normalized to that in "git interpret-trailers --parse".
E.g:
 $ printf '\naCKed: Zz\n' | \
   git -c 'trailer.Acked.key=Acked' interpret-trailers --parse
 will emit: "Acked: Zz"

but only if "key" is used, other config options doesn't cause it to be
normalized.
E.g:
 $ printf '\naCKed: Zz\n' | \
   git -c 'trailer.Acked.ifmissing=doNothing' interpret-trailers --parse
 will emit: "aCKed: Zz" (still lowercase a and uppercase CK)


Then there is the replacement by config "trailer.fix.key=Fixes" which
expands "fix" to "Fixes". This happens when using "--trailer 'fix = 123'"
which seems to be expected and useful behavior (albeit a bit unclear in
documentation). But it also happens when parsing incoming trailers, e.g
with that config
 $ printf "\nFix: 1\n" | git interpret-trailers --parse
 will emit: "Fixes: 1"

(token_from_item prefers order .key, incoming token, .name)


The most surprising thing is that it uses prefix matching when finding
they key in configuration. If I have "trailer.reviewed.key=Reviewed-By"
it is possible to just '--trailer r=XYZ' and it will find the
reviewed-by trailer as "r" is a prefix of reviewedby. This also applies
to the "--parse". This in makes it impossible to have trailer keys that
are prefix of each other (e.g: "Acked", "Acked-Tests", "Acked-Docs") if
there is multiple matching in configuration it will just pick the one
that happens to come first.

(token_matches_item uses strncasecmp with token length)


I guess these are the questions for the above observations:

* Should normalization of spelling happen at all?

* If so should it only happen when there is a .key config?

* Should replacement to what is in .key happen also in --parse mode, or
  only for "--trailer"

* The prefix matching gotta be a bug, right?



Here is a patch to the tests showing these things.



From 49a4bb64a7ebf1f2d50897a024deb86b4f8056b1 Mon Sep 17 00:00:00 2001
From: Anders Waldenborg <anders@0x63.nu>
Date: Mon, 27 Jul 2020 18:34:37 +0200
Subject: [PATCH] trailers: add tests for unclear/undocumented behavior

---
 t/t7513-interpret-trailers.sh | 70 +++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

--
2.25.1

Comments

Junio C Hamano July 27, 2020, 5:18 p.m. UTC | #1
[Redirecting it to the resident expert of the trailers]

Anders Waldenborg <anders@0x63.nu> writes:

> I noticed some undocumented and (at least to me) surprising behavior in
> trailers.c.
>
> When configuring a value in trailer.<token>.key it causes the trailer to
> be normalized to that in "git interpret-trailers --parse".
> E.g:
>  $ printf '\naCKed: Zz\n' | \
>    git -c 'trailer.Acked.key=Acked' interpret-trailers --parse
>  will emit: "Acked: Zz"
>
> but only if "key" is used, other config options doesn't cause it to be
> normalized.
> E.g:
>  $ printf '\naCKed: Zz\n' | \
>    git -c 'trailer.Acked.ifmissing=doNothing' interpret-trailers --parse
>  will emit: "aCKed: Zz" (still lowercase a and uppercase CK)
>
>
> Then there is the replacement by config "trailer.fix.key=Fixes" which
> expands "fix" to "Fixes". This happens when using "--trailer 'fix = 123'"
> which seems to be expected and useful behavior (albeit a bit unclear in
> documentation). But it also happens when parsing incoming trailers, e.g
> with that config
>  $ printf "\nFix: 1\n" | git interpret-trailers --parse
>  will emit: "Fixes: 1"
>
> (token_from_item prefers order .key, incoming token, .name)
>
>
> The most surprising thing is that it uses prefix matching when finding
> they key in configuration. If I have "trailer.reviewed.key=Reviewed-By"
> it is possible to just '--trailer r=XYZ' and it will find the
> reviewed-by trailer as "r" is a prefix of reviewedby. This also applies
> to the "--parse". This in makes it impossible to have trailer keys that
> are prefix of each other (e.g: "Acked", "Acked-Tests", "Acked-Docs") if
> there is multiple matching in configuration it will just pick the one
> that happens to come first.
>
> (token_matches_item uses strncasecmp with token length)
>
>
> I guess these are the questions for the above observations:
>
> * Should normalization of spelling happen at all?
>
> * If so should it only happen when there is a .key config?
>
> * Should replacement to what is in .key happen also in --parse mode, or
>   only for "--trailer"
>
> * The prefix matching gotta be a bug, right?
>
>
>
> Here is a patch to the tests showing these things.
>
>
>
> From 49a4bb64a7ebf1f2d50897a024deb86b4f8056b1 Mon Sep 17 00:00:00 2001
> From: Anders Waldenborg <anders@0x63.nu>
> Date: Mon, 27 Jul 2020 18:34:37 +0200
> Subject: [PATCH] trailers: add tests for unclear/undocumented behavior
>
> ---
>  t/t7513-interpret-trailers.sh | 70 +++++++++++++++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
>
> diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
> index 2e6d406edf..d5d19cf89b 100755
> --- a/t/t7513-interpret-trailers.sh
> +++ b/t/t7513-interpret-trailers.sh
> @@ -99,6 +99,64 @@ test_expect_success 'with config option on the command line' '
>  	test_cmp expected actual
>  '
>
> +test_expect_success 'parse normalizes spelling and separators from configs with key' '
> +	cat >patch <<-\EOF &&
> +		non-trailer-line
> +
> +		ReviEweD-bY :abc
> +		ReviEwEd-bY) rst
> +		ReviEweD-BY ; xyz
> +		aCked-bY: not normalized
> +	EOF
> +	cat >expected <<-\EOF &&
> +		Reviewed-By: abc
> +		Reviewed-By: rst
> +		Reviewed-By: xyz
> +		aCked-bY: not normalized
> +	EOF
> +	git \
> +		-c "trailer.separators=:);" \
> +		-c "trailer.rb.key=Reviewed-By" \
> +		-c "trailer.Acked-By.ifmissing=doNothing" \
> +		interpret-trailers --parse patch >actual &&
> +	test_cmp expected actual
> +'
> +
> +# Matching currently is prefix matching, causing "This-trailer" to be normalized too
> +test_expect_failure 'config option matches exact only' '
> +	cat >patch <<-\EOF &&
> +
> +		This-trailer: a
> +		 b
> +		This-trailer-exact: b
> +		 c
> +		This-trailer-exact-plus-some: c
> +		 d
> +	EOF
> +	cat >expected <<-\EOF &&
> +		This-trailer: a b
> +		THIS-TRAILER-EXACT: b c
> +		This-trailer-exact-plus-some: c d
> +	EOF
> +	git -c "trailer.tte.key=THIS-TRAILER-EXACT" interpret-trailers --parse patch >actual &&
> +	test_cmp expected actual
> +'
> +
> +# Matching currently uses the config key even if key value is different
> +test_expect_failure 'config option matches exact only' '
> +	cat >patch <<-\EOF &&
> +
> +		Ticket: 1234
> +		Reference-ticket: 99
> +	EOF
> +	cat >expected <<-\EOF &&
> +		Ticket: 1234
> +		Reference-Ticket: 99
> +	EOF
> +	git -c "trailer.ticket.key=Reference-Ticket" interpret-trailers --parse patch >actual &&
> +	test_cmp expected actual
> +'
> +
>  test_expect_success 'with only a title in the message' '
>  	cat >expected <<-\EOF &&
>  		area: change
> @@ -473,6 +531,18 @@ test_expect_success 'with config setup' '
>  	test_cmp expected actual
>  '
>
> +# currently this matches the "Acked-by: " value in ack key set by previous test
> +test_expect_failure 'with config setup matches key exactly' '
> +	cat >expected <<-\EOF &&
> +
> +		A: B
> +	EOF
> +	git interpret-trailers --trailer "A=10" empty >actual &&
> +	test_cmp expected actual
> +'
> +
> +
> +
>  test_expect_success 'with config setup and ":=" as separators' '
>  	git config trailer.separators ":=" &&
>  	git config trailer.ack.key "Acked-by= " &&
> --
> 2.25.1
Christian Couder July 27, 2020, 6:37 p.m. UTC | #2
[Adding Peff and Jonathan in Cc as they know also about this area of the code]

On Mon, Jul 27, 2020 at 7:18 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> [Redirecting it to the resident expert of the trailers]

Thanks!

> Anders Waldenborg <anders@0x63.nu> writes:
>
> > I noticed some undocumented and (at least to me) surprising behavior in
> > trailers.c.
> >
> > When configuring a value in trailer.<token>.key it causes the trailer to
> > be normalized to that in "git interpret-trailers --parse".
> > E.g:
> >  $ printf '\naCKed: Zz\n' | \
> >    git -c 'trailer.Acked.key=Acked' interpret-trailers --parse
> >  will emit: "Acked: Zz"

Yeah, I think that's nice, as it can make sure that the key appears in
the same way. It's true that it would be better if it would be
documented.

> > but only if "key" is used, other config options doesn't cause it to be
> > normalized.
> > E.g:
> >  $ printf '\naCKed: Zz\n' | \
> >    git -c 'trailer.Acked.ifmissing=doNothing' interpret-trailers --parse
> >  will emit: "aCKed: Zz" (still lowercase a and uppercase CK)

Yeah, in this case we are not sure if "Acked" or "aCKed" is the right
way to spell it.

> > Then there is the replacement by config "trailer.fix.key=Fixes" which
> > expands "fix" to "Fixes". This happens when using "--trailer 'fix = 123'"
> > which seems to be expected and useful behavior (albeit a bit unclear in
> > documentation). But it also happens when parsing incoming trailers, e.g
> > with that config
> >  $ printf "\nFix: 1\n" | git interpret-trailers --parse
> >  will emit: "Fixes: 1"

Yeah, I think it allows for shortcuts and can help with standardizing
the keys in commit messages.

> > (token_from_item prefers order .key, incoming token, .name)
> >
> >
> > The most surprising thing is that it uses prefix matching when finding
> > they key in configuration. If I have "trailer.reviewed.key=Reviewed-By"
> > it is possible to just '--trailer r=XYZ' and it will find the
> > reviewed-by trailer as "r" is a prefix of reviewedby. This also applies
> > to the "--parse".

Yeah, that's also for shortcuts and standardization.

> > This in makes it impossible to have trailer keys that
> > are prefix of each other (e.g: "Acked", "Acked-Tests", "Acked-Docs") if
> > there is multiple matching in configuration it will just pick the one
> > that happens to come first.

That's a downside of the above. I agree that it might seem strange or
bad. Perhaps an option could be added to implement a strict matching,
if people really want it.

Also if you configure trailers in the "Acked", "Acked-Tests",
"Acked-Docs" order, then any common prefix will pick "Acked" which
could be considered ok in my opinion.

> > (token_matches_item uses strncasecmp with token length)
> >
> >
> > I guess these are the questions for the above observations:
> >
> > * Should normalization of spelling happen at all?

Yes, I think it can help.

> > * If so should it only happen when there is a .key config?

Yes, it can help too if that only happens when there is a .key config.

> > * Should replacement to what is in .key happen also in --parse mode, or
> >   only for "--trailer"

I think it's more consistent if it happens in both --parse and
--trailer mode. I didn't implement --parse though.

> > * The prefix matching gotta be a bug, right?

No, it's a feature ;-) Seriously I agree that this could be seen as a
downside, but I think it can be understood that the convenience is
worth it. And in case someone is really annoyed by this, then adding
an option for strict matching should not be very difficult.

> > Here is a patch to the tests showing these things.

Thanks for the patch! I would be ok to add such a patch to the test
suite if it was sent like a regular patch (so with a commit message, a
Signed-off-by: and so on) to the mailing list. While at it some
documentation of the related behavior would also be very nice.
Jeff King July 27, 2020, 7:40 p.m. UTC | #3
On Mon, Jul 27, 2020 at 08:37:26PM +0200, Christian Couder wrote:

> > > I noticed some undocumented and (at least to me) surprising behavior in
> > > trailers.c.
> > >
> > > When configuring a value in trailer.<token>.key it causes the trailer to
> > > be normalized to that in "git interpret-trailers --parse".
> > > E.g:
> > >  $ printf '\naCKed: Zz\n' | \
> > >    git -c 'trailer.Acked.key=Acked' interpret-trailers --parse
> > >  will emit: "Acked: Zz"
> 
> Yeah, I think that's nice, as it can make sure that the key appears in
> the same way. It's true that it would be better if it would be
> documented.

I'd note that this also happens without --parse.

> > > Then there is the replacement by config "trailer.fix.key=Fixes" which
> > > expands "fix" to "Fixes". This happens when using "--trailer 'fix = 123'"
> > > which seems to be expected and useful behavior (albeit a bit unclear in
> > > documentation). But it also happens when parsing incoming trailers, e.g
> > > with that config
> > >  $ printf "\nFix: 1\n" | git interpret-trailers --parse
> > >  will emit: "Fixes: 1"
> [...]
> > > * Should replacement to what is in .key happen also in --parse mode, or
> > >   only for "--trailer"
> 
> I think it's more consistent if it happens in both --parse and
> --trailer mode. I didn't implement --parse though.

I don't recall being aware of this prefix matching until this thread, so
I doubt that the current behavior of --parse was something I tried for
intentionally. It's mostly just using the existing code, plus a few
extra options (listed in the docs). I'm not opposed to adding an option
to do strict matching and/or avoid rewriting, and then possibly adding
that into --parse by default.

I don't have much of an opinion on which behavior would be preferred.
I've never actually had a use case for configuring trailer.*.key, as I
usually am only looking at reading existing trailers to collect stats,
etc.

-Peff
Junio C Hamano July 27, 2020, 8:11 p.m. UTC | #4
Christian Couder <christian.couder@gmail.com> writes:

>> >  $ printf '\naCKed: Zz\n' | \
>> >    git -c 'trailer.Acked.key=Acked' interpret-trailers --parse
>> >  will emit: "Acked: Zz"
>
> Yeah, I think that's nice, as it can make sure that the key appears in
> the same way. It's true that it would be better if it would be
> documented.
>
>> > but only if "key" is used, other config options doesn't cause it to be
>> > normalized.
>> > E.g:
>> >  $ printf '\naCKed: Zz\n' | \
>> >    git -c 'trailer.Acked.ifmissing=doNothing' interpret-trailers --parse
>> >  will emit: "aCKed: Zz" (still lowercase a and uppercase CK)
>
> Yeah, in this case we are not sure if "Acked" or "aCKed" is the right
> way to spell it.

OK, so in short, the trailer subsystem matches the second level of
the configuration variable name (e.g. "Acked") in a case insensitive
way, and it does *not* normalize the case in the output.  The .key 
request is a mechanism to replace the matched key with the specified
string, so there is *NO* case normalization in what Anders observed.

In other words,

  $ printf '\naCKed: Zz\n' | \
    git -c 'trailer.Acked.key=Rejected' interpret-trailers --parse

would have emitted "Rejected: Zz".
Anders Waldenborg July 27, 2020, 9:41 p.m. UTC | #5
Christian Couder writes:

>> > When configuring a value in trailer.<token>.key it causes the trailer to
>> > be normalized to that in "git interpret-trailers --parse".
>
> Yeah, I think that's nice, as it can make sure that the key appears in
> the same way. It's true that it would be better if it would be
> documented.
>
>> > but only if "key" is used, other config options doesn't cause it to be
>> > normalized.
>
> Yeah, in this case we are not sure if "Acked" or "aCKed" is the right
> way to spell it.

Makes sense.

However I guess one alternative implementation would be that if
trailer.X.something is configured but not trailer.X.key it would work as
if there was an implicit "trailer.X.key=X" configured. The name of the
configuration value would specify the correct spelling.


>> > Then there is the replacement by config "trailer.fix.key=Fixes" which
>> > expands "fix" to "Fixes". This happens when using "--trailer 'fix = 123'"
>> > which seems to be expected and useful behavior (albeit a bit unclear in
>> > documentation). But it also happens when parsing incoming trailers, e.g
>> > with that config
>> >  $ printf "\nFix: 1\n" | git interpret-trailers --parse
>> >  will emit: "Fixes: 1"
>
> Yeah, I think it allows for shortcuts and can help with standardizing
> the keys in commit messages.

Maybe what I'm missing is a clear picture of the different cases that
"git interpret-trailers" is being used in. The "--trailer x=10" option
seems clearly designed for human input trying to be as helpful as
possible (e.g always allowing '=' regardless of separators
configured). But when reading a message with trailers, is same
helpfulness useful? Or is it always only reading proper trailers
previously added by --trailer?



The standardization of trailers is interesting. If I want to standardize
"ReviewedBy", "Reviewer", "Code-Reviewer" trailers to "Reviewed-By" I
would add these configs:

trailer.reviewer.key = Reviewed-By
trailer.ReviewedBy.key = Reviewed-By
trailer.Code-Reviewer.key = Reviewed-By

Seems useful (and works out of the box as a msg-filter to
filter-branch). But the configuration seems a bit backwards, it is
configured a 3 different trailer entries, rather that a single trailer
entry with three aliases (or something like that).



>> > (token_from_item prefers order .key, incoming token, .name)
>> >
>> >
>> > The most surprising thing is that it uses prefix matching when finding
>> > they key in configuration. If I have "trailer.reviewed.key=Reviewed-By"
>> > it is possible to just '--trailer r=XYZ' and it will find the
>> > reviewed-by trailer as "r" is a prefix of reviewedby. This also applies
>> > to the "--parse".
>
> Yeah, that's also for shortcuts and standardization.
>
>> > This in makes it impossible to have trailer keys that
>> > are prefix of each other (e.g: "Acked", "Acked-Tests", "Acked-Docs") if
>> > there is multiple matching in configuration it will just pick the one
>> > that happens to come first.
>
> That's a downside of the above. I agree that it might seem strange or
> bad. Perhaps an option could be added to implement a strict matching,
> if people really want it.
>
> Also if you configure trailers in the "Acked", "Acked-Tests",
> "Acked-Docs" order, then any common prefix will pick "Acked" which
> could be considered ok in my opinion.

Yeah, that works. But that dependency on order of configuration is quite
subtle imho.

E.g consider:

$ cat >msg <<EOF

Acked: acked
Acked-Test: test
Acked-Docs: docs
EOF
$ git -c 'trailer.Acked.key=Acked' \
      -c 'trailer.Acked-Tests.key=Acked-Tests' \
      -c 'trailer.Acked-Docs.key=Acked-Docs' \
      interpret-trailers --parse msg
Acked: acked
Acked-Tests: test
Acked-Docs: docs
$ git -c 'trailer.Acked-Docs.key=Acked-Docs' \
      -c 'trailer.Acked-Tests.key=Acked-Tests' \
      -c 'trailer.Acked.key=Acked' \
      interpret-trailers --parse msg
Acked-Docs: acked
Acked-Tests: test
Acked-Docs: docs
$ git -c 'trailer.Acked-Tests.key=Acked-Tests' \
      -c 'trailer.Acked-Docs.key=Acked-Docs' \
      -c 'trailer.Acked.key=Acked' \
      interpret-trailers --parse msg
Acked-Tests: acked
Acked-Tests: test
Acked-Docs: docs





>
>> > (token_matches_item uses strncasecmp with token length)
>> >
>> >
>> > I guess these are the questions for the above observations:
>> >
>> > * Should normalization of spelling happen at all?
>
> Yes, I think it can help.
>
>> > * If so should it only happen when there is a .key config?
>
> Yes, it can help too if that only happens when there is a .key config.
>
>> > * Should replacement to what is in .key happen also in --parse mode, or
>> >   only for "--trailer"
>
> I think it's more consistent if it happens in both --parse and
> --trailer mode. I didn't implement --parse though.


Keep in mind that the machinery used by interpret-trailers is also used
by pretty '%(trailers)' so whatever normalization and shortcuts
happening also shows up there. Is shortcuts useful in that case too?
There the input is commit history, not some user input.

E.g:
$ git -c 'trailer.signed-off-by.key=Attest' \
      show --pretty='format:%(trailers:key=Attest)' --no-patch \
      0172f7834a31ae7cb9873feaaaa63939352fa3ae
Attest: Christian Couder <chriscool@tuxfamily.org>
Attest: Junio C Hamano <gitster@pobox.com>


There is also some inconsistency here. If one use '%(trailers) the
normalization doesn't happen. Only if using '%(trailers:only)' or some
other option.

(because optimization in format_trailer_info:
 /* If we want the whole block untouched, we can take the fast path. */)

I guess the fact that expansion happens also in these cases can get
confusing if I have configured a trailer "Bug-Introduced-In" locally and
the commit message contains "Bug: 123".

'git log --pretty=format:"%h% (trailers:key=Bug-Introduced-In,valueonly,separator=%x20)"'
would pick up data from the wrong trailer just because I added
configuration for Bug-Introduced-In trailer.



>> > * The prefix matching gotta be a bug, right?
>
> No, it's a feature ;-) Seriously I agree that this could be seen as a
> downside, but I think it can be understood that the convenience is
> worth it. And in case someone is really annoyed by this, then adding
> an option for strict matching should not be very difficult.
>
>> > Here is a patch to the tests showing these things.
>
> Thanks for the patch! I would be ok to add such a patch to the test
> suite if it was sent like a regular patch (so with a commit message, a
> Signed-off-by: and so on) to the mailing list. While at it some
> documentation of the related behavior would also be very nice.

Should I keep the "test_expect_failure" tests, or change into expecting
current behavior and switch them over to "test_except_success"?

I'll see if I can do something about documentation.
Anders Waldenborg July 27, 2020, 10:17 p.m. UTC | #6
Junio C Hamano writes:

> Christian Couder <christian.couder@gmail.com> writes:
>> Yeah, in this case we are not sure if "Acked" or "aCKed" is the right
>> way to spell it.
>
> OK, so in short, the trailer subsystem matches the second level of
> the configuration variable name (e.g. "Acked") in a case insensitive
> way

From what I can understand it tries to match *both* on the second level
AND the value of .key (trailers.c:token_matches_item)

$ printf '\na: 1\nb: 2\nc: 3\n' | \
  git -c 'trailer.A.key=B' interpret-trailers
B: 1
B: 2
c: 3

and then uses the .key value when outputting the result (by calling
trailer.c:token_from_item)

I.e:
it gets "a: 1", tries to find configuration for that, and finds
trailer.A because "a" (case insenitively) matches conf.name, therefore
it outputs value of trailer.A.key + separator + "1"

then it gets "b: 1", and again finds trailer.A because "b" matches
conf.key.

> , and it does *not* normalize the case in the output.  The .key
> request is a mechanism to replace the matched key with the specified
> string, so there is *NO* case normalization in what Anders observed.

Hmm. Maybe the "matching" vs "outputting" makes it clearer.


Given configuration trailer.<NAME>.key=<KEY>

When printing a parsed trailer, e.g from pretty format "%(trailers)",
"git interpret-trailers" passthrough of existing trailers or addition of
a new trailer with --trailer: <KEY> is used in output. If <KEY> is not
configured the trailer token is output the same way as it was in input.

When finding a trailer, e.g for '--trailer x=y' or
trailer.<NAME>.where=before/after: matching is done against both
<NAME> and <KEY>.

When showing a single trailer with pretty format '%(trailers:key=X)' it
is matched against <KEY> only. (I guess one can see this as matching on
the formatted output).


> In other words,
>
>   $ printf '\naCKed: Zz\n' | \
>     git -c 'trailer.Acked.key=Rejected' interpret-trailers --parse
>
> would have emitted "Rejected: Zz".

Indeed.
Junio C Hamano July 27, 2020, 10:53 p.m. UTC | #7
Anders Waldenborg <anders@0x63.nu> writes:

> However I guess one alternative implementation would be that if
> trailer.X.something is configured but not trailer.X.key it would work as
> if there was an implicit "trailer.X.key=X" configured. The name of the
> configuration value would specify the correct spelling.

I had the same thought but then a question struck and stopped me:
what should happen if "trailer.X.something" and
"trailer.x.somethingelse" are both defined?
Anders Waldenborg July 27, 2020, 10:57 p.m. UTC | #8
Jeff King writes:

> On Mon, Jul 27, 2020 at 08:37:26PM +0200, Christian Couder wrote:
>
>> > > I noticed some undocumented and (at least to me) surprising behavior in
>> > > trailers.c.
>> > >
>> > > When configuring a value in trailer.<token>.key it causes the trailer to
>> > > be normalized to that in "git interpret-trailers --parse".
>> > > E.g:
>> > >  $ printf '\naCKed: Zz\n' | \
>> > >    git -c 'trailer.Acked.key=Acked' interpret-trailers --parse
>> > >  will emit: "Acked: Zz"
>>
>> Yeah, I think that's nice, as it can make sure that the key appears in
>> the same way. It's true that it would be better if it would be
>> documented.
>
> I'd note that this also happens without --parse.

Right, and it also happens with "--only-input" (part of "--parse")

"--only-input" is documented as follows:

   Output only trailers that exist in the input; do not add any from the
   command-line or by following configured trailer.* rules.


[]

> I don't recall being aware of this prefix matching until this thread, so
> I doubt that the current behavior of --parse was something I tried for
> intentionally. It's mostly just using the existing code, plus a few
> extra options (listed in the docs). I'm not opposed to adding an option
> to do strict matching and/or avoid rewriting, and then possibly adding
> that into --parse by default.

Would that option also be set for the parsing done by "%(trailers)"
pretty format specifier?

> I don't have much of an opinion on which behavior would be preferred.
> I've never actually had a use case for configuring trailer.*.key, as I
> usually am only looking at reading existing trailers to collect stats,
> etc.

I'm also mainly using in reading trailers (mostly with pretty format
"%(trailers:key=x)") and then these convenience shortcuts doesn't really
seem helpful, they rather add a small risk of mangling my data. Not that
this has caused any problems for me in practice.
Junio C Hamano July 27, 2020, 11:05 p.m. UTC | #9
Anders Waldenborg <anders@0x63.nu> writes:

> From what I can understand it tries to match *both* on the second level
> AND the value of .key (trailers.c:token_matches_item)

Yuck, I do not know what were we thinking to design the behaviour
like *that*.  Or it may be simply buggy.

> $ printf '\na: 1\nb: 2\nc: 3\n' | \
>   git -c 'trailer.A.key=B' interpret-trailers
> B: 1
> B: 2
> c: 3

I can understand the first one (i.e. "trailer.$name.$var" try to
match $name as case insensitively) but not the second one.  There is
not an single rule for "b" trailer, and we should be getting the
same behaviour as the third line, i.e. the key not involved in
rewriting is passed as-is.
Anders Waldenborg July 27, 2020, 11:17 p.m. UTC | #10
Junio C Hamano writes:

> I had the same thought but then a question struck and stopped me:
> what should happen if "trailer.X.something" and
> "trailer.x.somethingelse" are both defined?

Good point.

If we follow the same reasoning as with what happens with prefix
matching (config order matters) the one that happens to be mentioned
first in configuration wins.

The same thing actually happens today with .key:

trailer.foo.key=Hello
trailer.bar.key=HELLO

$ printf "\nHELLo: hi\n" | \
  git -c "trailer.foo.key=Hello" -c "trailer.bar.key=HELLO" interpret-trailers --parse
Hello: hi

but if we swap order of config:

$ printf "\nHELLo: hi\n" | \
  git -c "trailer.bar.key=HELLO" -c "trailer.foo.key=Hello" interpret-trailers --parse
HELLO: hi
Jeff King July 27, 2020, 11:42 p.m. UTC | #11
On Tue, Jul 28, 2020 at 12:57:51AM +0200, Anders Waldenborg wrote:

> >> > > When configuring a value in trailer.<token>.key it causes the trailer to
> >> > > be normalized to that in "git interpret-trailers --parse".
> >> > > E.g:
> >> > >  $ printf '\naCKed: Zz\n' | \
> >> > >    git -c 'trailer.Acked.key=Acked' interpret-trailers --parse
> >> > >  will emit: "Acked: Zz"
> >>
> >> Yeah, I think that's nice, as it can make sure that the key appears in
> >> the same way. It's true that it would be better if it would be
> >> documented.
> >
> > I'd note that this also happens without --parse.
> 
> Right, and it also happens with "--only-input" (part of "--parse")
> 
> "--only-input" is documented as follows:
> 
>    Output only trailers that exist in the input; do not add any from the
>    command-line or by following configured trailer.* rules.

I think I meant there only that we wouldn't add new trailers (e.g., from
"trailers.*.ifMissing"). But I do agree that it might be simpler if we
can just say "we don't look at trailer.<token>.* config at all in
--only-input mode. I _think_ that's true (we probably do look at
trailer.separators, but the rest of the token-specific ones look like
they're all about writing or modifying, not reading).

> > I don't recall being aware of this prefix matching until this thread, so
> > I doubt that the current behavior of --parse was something I tried for
> > intentionally. It's mostly just using the existing code, plus a few
> > extra options (listed in the docs). I'm not opposed to adding an option
> > to do strict matching and/or avoid rewriting, and then possibly adding
> > that into --parse by default.
> 
> Would that option also be set for the parsing done by "%(trailers)"
> pretty format specifier?

I thnk %(trailers) isn't quite the same as "--parse", because you have
to say "only" or "unfold" yourself. But yeah, that option should
certainly be available there, if not the default.

> > I don't have much of an opinion on which behavior would be preferred.
> > I've never actually had a use case for configuring trailer.*.key, as I
> > usually am only looking at reading existing trailers to collect stats,
> > etc.
> 
> I'm also mainly using in reading trailers (mostly with pretty format
> "%(trailers:key=x)") and then these convenience shortcuts doesn't really
> seem helpful, they rather add a small risk of mangling my data. Not that
> this has caused any problems for me in practice.

Yeah, pondering it a bit more, I think trailer.<token>.* doesn't really
make any sense for any reading operation (including %(trailers) or
--parse). I guess it _could_ be useful to normalize names in some
instances, but it's as likely to confuse or break somebody as to help.

-Peff
Anders Waldenborg July 28, 2020, 12:01 a.m. UTC | #12
Junio C Hamano writes:

> Anders Waldenborg <anders@0x63.nu> writes:
>
>> From what I can understand it tries to match *both* on the second level
>> AND the value of .key (trailers.c:token_matches_item)
>
> Yuck, I do not know what were we thinking to design the behaviour
> like *that*.  Or it may be simply buggy.
>
>> $ printf '\na: 1\nb: 2\nc: 3\n' | \
>>   git -c 'trailer.A.key=B' interpret-trailers
>> B: 1
>> B: 2
>> c: 3
>
> I can understand the first one (i.e. "trailer.$name.$var" try to
> match $name as case insensitively) but not the second one.  There is
> not an single rule for "b" trailer, and we should be getting the
> same behaviour as the third line, i.e. the key not involved in
> rewriting is passed as-is.

I'm not so sure about that. Matching sometimes needs to happen on .key
too. If this configuration is supposed to be useful (and as "shortcuts"
has been mentioned before and is what tests does, I think it should be):

trailer.rb.key=Reviewed-By
trailer.rb.ifexists=addifdifferent

then matching must look at key, not name. As the value of .key is what
would have been written previously into the message.

E.g:
$ printf "\nReviewed-By: hi\n" | \
  git -c "trailer.rb.key=Reviewed-By" \
      -c "trailer.rb.ifexists=addifdifferent" \
      interpret-trailers --trailer "rb=hi"
Reviewed-By: hi



The opposite case, matching on name in input message I'm not sure where
it is useful.

 $ printf "\nrb: hi\n" | \
   git -c "trailer.rb.key=Reviewed-By" \
       -c "trailer.rb.ifexists=addifdifferent" \
       interpret-trailers --trailer "rb=hi"
Reviewed-By: hi



The way I have understood it ".key" can be used for some different
things:

 * Freeing up name to be used as a shortcut alias.
 * Defining the canonical capitalization when passing through trailers
 * Allowing specifying non default separator for that key.

I wonder if those uses could be split up.

So instead of this configuration:

 [trailer "rb"]
     key="Reviewed-By> "

that does all three of those. The configuration would be:

 [trailer "Reviewed-By"]
     separator="> "
     canonicalize=true
     alias=rb

that way the "name" part of the config always is the correct spelling
and capitalization. "alias" could easily be a list of multiple alias if
that is wanted. "alias" could be split into "inputalias" (matched
against when reading trailers from stdin or a commit/tag message) and
"optalias" (matched against when reading --trailer cmdline option, and
%(trailers:key))
Christian Couder July 28, 2020, 7:07 a.m. UTC | #13
On Mon, Jul 27, 2020 at 11:41 PM Anders Waldenborg <anders@0x63.nu> wrote:

> Christian Couder writes:

[...]

> >> > Then there is the replacement by config "trailer.fix.key=Fixes" which
> >> > expands "fix" to "Fixes". This happens when using "--trailer 'fix = 123'"
> >> > which seems to be expected and useful behavior (albeit a bit unclear in
> >> > documentation). But it also happens when parsing incoming trailers, e.g
> >> > with that config
> >> >  $ printf "\nFix: 1\n" | git interpret-trailers --parse
> >> >  will emit: "Fixes: 1"
> >
> > Yeah, I think it allows for shortcuts and can help with standardizing
> > the keys in commit messages.
>
> Maybe what I'm missing is a clear picture of the different cases that
> "git interpret-trailers" is being used in. The "--trailer x=10" option
> seems clearly designed for human input trying to be as helpful as
> possible (e.g always allowing '=' regardless of separators
> configured). But when reading a message with trailers, is same
> helpfulness useful? Or is it always only reading proper trailers
> previously added by --trailer?

I guess it depends on the purpose of reading a message with trailers.
If you want to do stats for example, do you really want to consider
"Reviewed", "Reviewer" and "Reviewed-by" as different trailers in your
stats?

> The standardization of trailers is interesting. If I want to standardize
> "ReviewedBy", "Reviewer", "Code-Reviewer" trailers to "Reviewed-By" I
> would add these configs:
>
> trailer.reviewer.key = Reviewed-By
> trailer.ReviewedBy.key = Reviewed-By
> trailer.Code-Reviewer.key = Reviewed-By
>
> Seems useful (and works out of the box as a msg-filter to
> filter-branch). But the configuration seems a bit backwards, it is
> configured a 3 different trailer entries, rather that a single trailer
> entry with three aliases (or something like that).

Maybe taking advantage of the shortcuts and using only the following could work:

trailer.review.key = Reviewed-By
trailer.code-review.key = Reviewed-By

> >> > This in makes it impossible to have trailer keys that
> >> > are prefix of each other (e.g: "Acked", "Acked-Tests", "Acked-Docs") if
> >> > there is multiple matching in configuration it will just pick the one
> >> > that happens to come first.
> >
> > That's a downside of the above. I agree that it might seem strange or
> > bad. Perhaps an option could be added to implement a strict matching,
> > if people really want it.
> >
> > Also if you configure trailers in the "Acked", "Acked-Tests",
> > "Acked-Docs" order, then any common prefix will pick "Acked" which
> > could be considered ok in my opinion.
>
> Yeah, that works. But that dependency on order of configuration is quite
> subtle imho.

[...]

Yeah, I agree that documenting this would be nice.

> >> > * Should replacement to what is in .key happen also in --parse mode, or
> >> >   only for "--trailer"
> >
> > I think it's more consistent if it happens in both --parse and
> > --trailer mode. I didn't implement --parse though.
>
> Keep in mind that the machinery used by interpret-trailers is also used
> by pretty '%(trailers)' so whatever normalization and shortcuts
> happening also shows up there. Is shortcuts useful in that case too?
> There the input is commit history, not some user input.

Pretty %(trailer) was added later by someone else, but I guess it also
depends on the use case. Is it going to be used for stats? Is it
interesting to distinguish between very similar trailers in these
stats?

And again if people are interested in very strict processing, then
adding an option for that could be the right thing to do.

[...]

> There is also some inconsistency here. If one use '%(trailers) the
> normalization doesn't happen. Only if using '%(trailers:only)' or some
> other option.
>
> (because optimization in format_trailer_info:
>  /* If we want the whole block untouched, we can take the fast path. */)

Maybe that's a bug. Peff, it looks like you added the above comment.
Do you think it's a bug?

> I guess the fact that expansion happens also in these cases can get
> confusing if I have configured a trailer "Bug-Introduced-In" locally and
> the commit message contains "Bug: 123".
>
> 'git log --pretty=format:"%h% (trailers:key=Bug-Introduced-In,valueonly,separator=%x20)"'
> would pick up data from the wrong trailer just because I added
> configuration for Bug-Introduced-In trailer.

Yeah, I understand that it could be an issue.

> >> > Here is a patch to the tests showing these things.
> >
> > Thanks for the patch! I would be ok to add such a patch to the test
> > suite if it was sent like a regular patch (so with a commit message, a
> > Signed-off-by: and so on) to the mailing list. While at it some
> > documentation of the related behavior would also be very nice.
>
> Should I keep the "test_expect_failure" tests, or change into expecting
> current behavior and switch them over to "test_except_success"?

I think you should switch them over to "test_except_success".

> I'll see if I can do something about documentation.

Thanks,
Christian.
Jeff King July 28, 2020, 3:41 p.m. UTC | #14
On Tue, Jul 28, 2020 at 09:07:18AM +0200, Christian Couder wrote:

> > Maybe what I'm missing is a clear picture of the different cases that
> > "git interpret-trailers" is being used in. The "--trailer x=10" option
> > seems clearly designed for human input trying to be as helpful as
> > possible (e.g always allowing '=' regardless of separators
> > configured). But when reading a message with trailers, is same
> > helpfulness useful? Or is it always only reading proper trailers
> > previously added by --trailer?
> 
> I guess it depends on the purpose of reading a message with trailers.
> If you want to do stats for example, do you really want to consider
> "Reviewed", "Reviewer" and "Reviewed-by" as different trailers in your
> stats?

My guess would be that yes, you probably would want them to be
different. They _might_ be typos of each other, in which case
normalizing is helpful. But they might well have totally different
meanings. It will really depend on the project's use of the trailers,
and I'd expect any stats-gathering to do that kind of normalization much
later. I.e., use Git to reliably get "foo: bar" output from all of the
commits, and then count them up in a perl script or whatever, lumping
together like fields at that stage.

It's inevitable that you'll have to do some data cleanup like that
anyway. Lumping together prefixes isn't flexible enough to coverall
cases. Using trailer.*.key to manually map names covers more, but again
not all (e.g., if the project used to use "foo" but switched to "bar",
but the syntax of the value fields also changed at the same time, you'd
need to normalize those, too).

So to me it boils down to:

  - returning the data as verbatim as possible when reading existing
    trailers would give the least surprise to most people

  - real data collection is going to involve a separate post-processing
    step which is better done in a full programming language

> > There is also some inconsistency here. If one use '%(trailers) the
> > normalization doesn't happen. Only if using '%(trailers:only)' or some
> > other option.
> >
> > (because optimization in format_trailer_info:
> >  /* If we want the whole block untouched, we can take the fast path. */)
> 
> Maybe that's a bug. Peff, it looks like you added the above comment.
> Do you think it's a bug?

The original %(trailers) was "dump the trailers block verbatim". But
that's not super useful for parsing individual trailers. So "only"
actually starts parsing the individual trailers. I'd argue that the
%(trailers) behavior is correct, but that %(trailers:only) should
probably be doing less (i.e., just parsing and reporting what it finds,
but not changing any names).

-Peff
Junio C Hamano July 28, 2020, 4:40 p.m. UTC | #15
Jeff King <peff@peff.net> writes:

> It's inevitable that you'll have to do some data cleanup like that
> anyway. Lumping together prefixes isn't flexible enough to coverall
> cases. Using trailer.*.key to manually map names covers more, but again
> not all (e.g., if the project used to use "foo" but switched to "bar",
> but the syntax of the value fields also changed at the same time, you'd
> need to normalize those, too).
>
> So to me it boils down to:
>
>   - returning the data as verbatim as possible when reading existing
>     trailers would give the least surprise to most people
>
>   - real data collection is going to involve a separate post-processing
>     step which is better done in a full programming language

Yeah, I agree that these are good guidelines to use when we think
about the feature.

> The original %(trailers) was "dump the trailers block verbatim". But
> that's not super useful for parsing individual trailers. So "only"
> actually starts parsing the individual trailers. I'd argue that the
> %(trailers) behavior is correct, but that %(trailers:only) should
> probably be doing less (i.e., just parsing and reporting what it finds,
> but not changing any names).

Yes, sounds sensible.
diff mbox series

Patch

diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 2e6d406edf..d5d19cf89b 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -99,6 +99,64 @@  test_expect_success 'with config option on the command line' '
 	test_cmp expected actual
 '

+test_expect_success 'parse normalizes spelling and separators from configs with key' '
+	cat >patch <<-\EOF &&
+		non-trailer-line
+
+		ReviEweD-bY :abc
+		ReviEwEd-bY) rst
+		ReviEweD-BY ; xyz
+		aCked-bY: not normalized
+	EOF
+	cat >expected <<-\EOF &&
+		Reviewed-By: abc
+		Reviewed-By: rst
+		Reviewed-By: xyz
+		aCked-bY: not normalized
+	EOF
+	git \
+		-c "trailer.separators=:);" \
+		-c "trailer.rb.key=Reviewed-By" \
+		-c "trailer.Acked-By.ifmissing=doNothing" \
+		interpret-trailers --parse patch >actual &&
+	test_cmp expected actual
+'
+
+# Matching currently is prefix matching, causing "This-trailer" to be normalized too
+test_expect_failure 'config option matches exact only' '
+	cat >patch <<-\EOF &&
+
+		This-trailer: a
+		 b
+		This-trailer-exact: b
+		 c
+		This-trailer-exact-plus-some: c
+		 d
+	EOF
+	cat >expected <<-\EOF &&
+		This-trailer: a b
+		THIS-TRAILER-EXACT: b c
+		This-trailer-exact-plus-some: c d
+	EOF
+	git -c "trailer.tte.key=THIS-TRAILER-EXACT" interpret-trailers --parse patch >actual &&
+	test_cmp expected actual
+'
+
+# Matching currently uses the config key even if key value is different
+test_expect_failure 'config option matches exact only' '
+	cat >patch <<-\EOF &&
+
+		Ticket: 1234
+		Reference-ticket: 99
+	EOF
+	cat >expected <<-\EOF &&
+		Ticket: 1234
+		Reference-Ticket: 99
+	EOF
+	git -c "trailer.ticket.key=Reference-Ticket" interpret-trailers --parse patch >actual &&
+	test_cmp expected actual
+'
+
 test_expect_success 'with only a title in the message' '
 	cat >expected <<-\EOF &&
 		area: change
@@ -473,6 +531,18 @@  test_expect_success 'with config setup' '
 	test_cmp expected actual
 '

+# currently this matches the "Acked-by: " value in ack key set by previous test
+test_expect_failure 'with config setup matches key exactly' '
+	cat >expected <<-\EOF &&
+
+		A: B
+	EOF
+	git interpret-trailers --trailer "A=10" empty >actual &&
+	test_cmp expected actual
+'
+
+
+
 test_expect_success 'with config setup and ":=" as separators' '
 	git config trailer.separators ":=" &&
 	git config trailer.ack.key "Acked-by= " &&