diff mbox series

trailer: allow spaces in tokens

Message ID pull.1309.git.git.1660806376021.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series trailer: allow spaces in tokens | expand

Commit Message

Max Bernstein Aug. 18, 2022, 7:06 a.m. UTC
From: Max Bernstein <max@bernsteinbear.com>

The docs (for example, https://git-scm.com/docs/git-interpret-trailers)
say that whitespace should be allowed inside tokens:

> There can also be whitespaces inside the token and the value.

The code since e4319562bc2834096fade432fd90c985b96476db does not allow
that, so re-enable and add a test.

Signed-off-by: Max Bernstein <max@bernsteinbear.com>
---
    trailer: allow spaces in tokens

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1309%2Ftekknolagi%2Fmaint-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1309/tekknolagi/maint-v1
Pull-Request: https://github.com/git/git/pull/1309

 t/t7513-interpret-trailers.sh | 40 +++++++++++++++++++++++++++++++++++
 trailer.c                     |  7 +-----
 2 files changed, 41 insertions(+), 6 deletions(-)


base-commit: ad60dddad72dfb8367bd695028b5b8dc6c33661b

Comments

Junio C Hamano Aug. 18, 2022, 4:48 p.m. UTC | #1
[jc: summoning an area expert and the developer whose commit is
blamed for the breakage by adding their addresses on Cc: list]

"Max Bernstein via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Max Bernstein <max@bernsteinbear.com>
>
> The docs (for example, https://git-scm.com/docs/git-interpret-trailers)

Refer to "git help interpret-trailers" instead.

> say that whitespace should be allowed inside tokens:
>
>> There can also be whitespaces inside the token and the value.
>
> The code since e4319562bc2834096fade432fd90c985b96476db does not allow

Refer to an existing commit like so

	e4319562 (trailer: be stricter in parsing separators, 2016-11-02)

> that, so re-enable and add a test.

Jonathan, do you remember if the "be stricter" was in a response to
specific real world issues, or is this "we no longer allow whitespace"
an unintended side effect of the change?

If the former, an equally valid "fix" to what Max reports here would
be to add such a real world issue or two as new tests to demonostrate
why allowing whitespaces was a mistake that hurts real-world workflow,
and fix the documentation.

I actually am suspecting it is the latter, only because we would have
added a testcase to show why whitespaces in the trailer label is a
bad idea in e4319562 if this was triggered by a real-world issue.

I would say that it would be a disaster, if we took any random
line with colon : in it in the middle of the commit message and
mistook it as a trailer (like the line above), but since we do not
allow paragraph breaks in the trailer block, as long as the message
has a valid trailer block, it might not be a huge issue.  I dunno.

> diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
> index 97f10905d23..47bf83003ef 100755
> --- a/t/t7513-interpret-trailers.sh
> +++ b/t/t7513-interpret-trailers.sh
> @@ -1481,6 +1481,46 @@ test_expect_success 'only-trailers omits non-trailer in middle of block' '
>  	test_cmp expected actual
>  '
>  
> +test_expect_success 'supports spaces inside token' '
> +	git config --unset trailer.sign.command &&
> +	cat >expected <<-\EOF &&
> +		Signed-off-by: nobody <nobody@nowhere>
> +		some other trailer: a value
> +		Signed-off-by: somebody <somebody@somewhere>
> +	EOF
> +	echo "wrote to expected" 1>&2 &&
> +	git interpret-trailers --only-trailers >actual <<-\EOF &&
> +		subject
> +
> +		it is important that the trailers below are signed-off-by

It also is important that if you add colon at the end of the above
line, it is *not* mistaken as a trailer with whitespace in token,
with an empty value.

> +		so that they meet the "25% trailers Git knows about" heuristic
> +
> +		Signed-off-by: nobody <nobody@nowhere>
> +		some other trailer: a value
> +		Signed-off-by: somebody <somebody@somewhere>
> +	EOF
> +	test_cmp expected actual
> +'
> +
> +test_expect_success 'does not support space at beginning of token' '
> +	cat >expected <<-\EOF &&
> +		Signed-off-by: nobody <nobody@nowhere> not a trailer: thing
> +		Signed-off-by: somebody <somebody@somewhere>
> +	EOF
> +	echo "wrote to expected" 1>&2 &&
> +	git interpret-trailers --only-trailers --unfold >actual <<-\EOF &&
> +		subject
> +
> +		it is important that the trailers below are signed-off-by
> +		so that they meet the "25% trailers Git knows about" heuristic
> +
> +		Signed-off-by: nobody <nobody@nowhere>
> +		 not a trailer: thing
> +		Signed-off-by: somebody <somebody@somewhere>
> +	EOF
> +	test_cmp expected actual
> +'
> +
>  test_expect_success 'only input' '
>  	git config trailer.sign.command "echo config-value" &&
>  	cat >expected <<-\EOF &&
> diff --git a/trailer.c b/trailer.c
> index d419c20735e..d02a9154512 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -618,17 +618,12 @@ static int token_matches_item(const char *tok, struct arg_item *item, size_t tok
>   */
>  static ssize_t find_separator(const char *line, const char *separators)
>  {
> -	int whitespace_found = 0;
>  	const char *c;
>  	for (c = line; *c; c++) {
>  		if (strchr(separators, *c))
>  			return c - line;
> -		if (!whitespace_found && (isalnum(*c) || *c == '-'))
> +		if (isalnum(*c) || *c == '-' || (c != line && (*c == ' ' || *c == '\t')))
>  			continue;
> -		if (c != line && (*c == ' ' || *c == '\t')) {
> -			whitespace_found = 1;
> -			continue;
> -		}
>  		break;
>  	}
>  	return -1;
>
> base-commit: ad60dddad72dfb8367bd695028b5b8dc6c33661b
Jonathan Tan Aug. 18, 2022, 5:52 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:
> Jonathan, do you remember if the "be stricter" was in a response to
> specific real world issues, or is this "we no longer allow whitespace"
> an unintended side effect of the change?

Rereading the relevant mailing list thread, I think it's a bit of both.
The sequence of events was as follows:
 - I made a patch set that changed sequencer.c (which, among other
   things, controls the behavior of "commit -s") to use
   interpret-trailers's functionality. This caused a change in the
   behavior of a "commit -s" unit test (the "I want to mention about
   Signed-off-by: here." one), because "commit -s" previously did not
   support spaces before the separator. [1]
 - You mentioned that perhaps we should tighten it back. [2]
 - I agreed and tightened it, but didn't change the documentation. [3]

[1] https://lore.kernel.org/git/a416ab9b-ff1f-9a71-3e58-60fd4f8a6b8e@google.com/
[2] https://lore.kernel.org/git/xmqqtwbroykt.fsf@gitster.mtv.corp.google.com/
[3] https://lore.kernel.org/git/cover.1478028700.git.jonathantanmy@google.com/

So the "real world issue" was a test in our test suite, and perhaps the
side effect was unintended in that we wanted to preserve the behavior of
"commit -s" but didn't think about other possible uses of
interpret-trailers.

> If the former, an equally valid "fix" to what Max reports here would
> be to add such a real world issue or two as new tests to demonostrate
> why allowing whitespaces was a mistake that hurts real-world workflow,
> and fix the documentation.
>
> I actually am suspecting it is the latter, only because we would have
> added a testcase to show why whitespaces in the trailer label is a
> bad idea in e4319562 if this was triggered by a real-world issue.

The "I want to mention about Signed-off-by: here." test might be a
sufficient real-world issue.

> I would say that it would be a disaster, if we took any random
> line with colon : in it in the middle of the commit message and
> mistook it as a trailer (like the line above), but since we do not
> allow paragraph breaks in the trailer block, as long as the message
> has a valid trailer block, it might not be a huge issue.  I dunno.

Yes, it would be fine if the paragraph was two or more lines long, but
not if it's a single line.
Junio C Hamano Aug. 18, 2022, 5:58 p.m. UTC | #3
Jonathan Tan <jonathantanmy@google.com> writes:

> The "I want to mention about Signed-off-by: here." test might be a
> sufficient real-world issue.
>
>> I would say that it would be a disaster, if we took any random
>> line with colon : in it in the middle of the commit message and
>> mistook it as a trailer (like the line above), but since we do not
>> allow paragraph breaks in the trailer block, as long as the message
>> has a valid trailer block, it might not be a huge issue.  I dunno.
>
> Yes, it would be fine if the paragraph was two or more lines long, but
> not if it's a single line.

OK, so it seems that we want to fix the documentation, and we have
sufficient test coverage (I didn't notice that Max needed to adjust
the tests to accomodate the new "loose" definition when I wrote the
message you were responding to).

THanks.
diff mbox series

Patch

diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 97f10905d23..47bf83003ef 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -1481,6 +1481,46 @@  test_expect_success 'only-trailers omits non-trailer in middle of block' '
 	test_cmp expected actual
 '
 
+test_expect_success 'supports spaces inside token' '
+	git config --unset trailer.sign.command &&
+	cat >expected <<-\EOF &&
+		Signed-off-by: nobody <nobody@nowhere>
+		some other trailer: a value
+		Signed-off-by: somebody <somebody@somewhere>
+	EOF
+	echo "wrote to expected" 1>&2 &&
+	git interpret-trailers --only-trailers >actual <<-\EOF &&
+		subject
+
+		it is important that the trailers below are signed-off-by
+		so that they meet the "25% trailers Git knows about" heuristic
+
+		Signed-off-by: nobody <nobody@nowhere>
+		some other trailer: a value
+		Signed-off-by: somebody <somebody@somewhere>
+	EOF
+	test_cmp expected actual
+'
+
+test_expect_success 'does not support space at beginning of token' '
+	cat >expected <<-\EOF &&
+		Signed-off-by: nobody <nobody@nowhere> not a trailer: thing
+		Signed-off-by: somebody <somebody@somewhere>
+	EOF
+	echo "wrote to expected" 1>&2 &&
+	git interpret-trailers --only-trailers --unfold >actual <<-\EOF &&
+		subject
+
+		it is important that the trailers below are signed-off-by
+		so that they meet the "25% trailers Git knows about" heuristic
+
+		Signed-off-by: nobody <nobody@nowhere>
+		 not a trailer: thing
+		Signed-off-by: somebody <somebody@somewhere>
+	EOF
+	test_cmp expected actual
+'
+
 test_expect_success 'only input' '
 	git config trailer.sign.command "echo config-value" &&
 	cat >expected <<-\EOF &&
diff --git a/trailer.c b/trailer.c
index d419c20735e..d02a9154512 100644
--- a/trailer.c
+++ b/trailer.c
@@ -618,17 +618,12 @@  static int token_matches_item(const char *tok, struct arg_item *item, size_t tok
  */
 static ssize_t find_separator(const char *line, const char *separators)
 {
-	int whitespace_found = 0;
 	const char *c;
 	for (c = line; *c; c++) {
 		if (strchr(separators, *c))
 			return c - line;
-		if (!whitespace_found && (isalnum(*c) || *c == '-'))
+		if (isalnum(*c) || *c == '-' || (c != line && (*c == ' ' || *c == '\t')))
 			continue;
-		if (c != line && (*c == ' ' || *c == '\t')) {
-			whitespace_found = 1;
-			continue;
-		}
 		break;
 	}
 	return -1;