diff mbox series

[v2] trailer: allow spaces in tokens

Message ID pull.1309.v2.git.git.1660809243298.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v2] trailer: allow spaces in tokens | expand

Commit Message

Max Bernstein Aug. 18, 2022, 7:54 a.m. UTC
From: Max Bernstein <max@bernsteinbear.com>

The docs (for example, https://git-scm.com/docs/git-interpret-trailers)
say that whitespace should be allowed inside tokens:

> There can also be whitespaces inside the token and the value.

The code since e4319562bc2834096fade432fd90c985b96476db does not allow
that, so re-enable and add a test.

Signed-off-by: Max Bernstein <max@bernsteinbear.com>
---
    trailer: allow spaces in tokens
    
    Changed since v1:
    
     * "Fixed" a format-patch test. I'm not sure if the new output is what
       everyone might expect for this particular case.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1309%2Ftekknolagi%2Fmaint-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1309/tekknolagi/maint-v2
Pull-Request: https://github.com/git/git/pull/1309

Range-diff vs v1:

 1:  404a6d1b193 ! 1:  4d490851ac2 trailer: allow spaces in tokens
     @@ Commit message
      
          Signed-off-by: Max Bernstein <max@bernsteinbear.com>
      
     + ## t/t4014-format-patch.sh ##
     +@@ t/t4014-format-patch.sh: test_expect_success 'signoff: not really a signoff' '
     + 	4:Subject: [PATCH] subject
     + 	8:
     + 	9:I want to mention about Signed-off-by: here.
     +-	10:
     +-	11:Signed-off-by: C O Mitter <committer@example.com>
     ++	10:Signed-off-by: C O Mitter <committer@example.com>
     + 	EOF
     + 	test_cmp expect actual
     + '
     +
       ## t/t7513-interpret-trailers.sh ##
      @@ t/t7513-interpret-trailers.sh: test_expect_success 'only-trailers omits non-trailer in middle of block' '
       	test_cmp expected actual


 t/t4014-format-patch.sh       |  3 +--
 t/t7513-interpret-trailers.sh | 40 +++++++++++++++++++++++++++++++++++
 trailer.c                     |  7 +-----
 3 files changed, 42 insertions(+), 8 deletions(-)


base-commit: ad60dddad72dfb8367bd695028b5b8dc6c33661b

Comments

Junio C Hamano Aug. 18, 2022, 4:54 p.m. UTC | #1
"Max Bernstein via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Range-diff vs v1:
>
>  1:  404a6d1b193 ! 1:  4d490851ac2 trailer: allow spaces in tokens
>      @@ Commit message
>       
>           Signed-off-by: Max Bernstein <max@bernsteinbear.com>
>       
>      + ## t/t4014-format-patch.sh ##
>      +@@ t/t4014-format-patch.sh: test_expect_success 'signoff: not really a signoff' '
>      + 	4:Subject: [PATCH] subject
>      + 	8:
>      + 	9:I want to mention about Signed-off-by: here.
>      +-	10:
>      +-	11:Signed-off-by: C O Mitter <committer@example.com>
>      ++	10:Signed-off-by: C O Mitter <committer@example.com>
>      + 	EOF
>      + 	test_cmp expect actual
>      + '

So, the updated code mistook the body of the message that is not a
sign-off, because there is a colon on the line, the line does not
begin with the colon, and everything before the colon is an alnum or
a whitespace, so squashed the paragraph break before the real
trailer block and the last line of the body and made it a body-less
commit log message?

This might be a good demonstration of why it is a mistaken design to
allow whitespaces, which may steer us toward fixing the documentation?

I dunno.  What do others think?
Jonathan Tan Aug. 18, 2022, 5:56 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:
> "Max Bernstein via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > Range-diff vs v1:
> >
> >  1:  404a6d1b193 ! 1:  4d490851ac2 trailer: allow spaces in tokens
> >      @@ Commit message
> >       
> >           Signed-off-by: Max Bernstein <max@bernsteinbear.com>
> >       
> >      + ## t/t4014-format-patch.sh ##
> >      +@@ t/t4014-format-patch.sh: test_expect_success 'signoff: not really a signoff' '
> >      + 	4:Subject: [PATCH] subject
> >      + 	8:
> >      + 	9:I want to mention about Signed-off-by: here.
> >      +-	10:
> >      +-	11:Signed-off-by: C O Mitter <committer@example.com>
> >      ++	10:Signed-off-by: C O Mitter <committer@example.com>
> >      + 	EOF
> >      + 	test_cmp expect actual
> >      + '
> 
> So, the updated code mistook the body of the message that is not a
> sign-off, because there is a colon on the line, the line does not
> begin with the colon, and everything before the colon is an alnum or
> a whitespace, so squashed the paragraph break before the real
> trailer block and the last line of the body and made it a body-less
> commit log message?
> 
> This might be a good demonstration of why it is a mistaken design to
> allow whitespaces, which may steer us toward fixing the documentation?
> 
> I dunno.  What do others think?

Yeah, this was the same issue we discussed a few years ago [1]. "git
commit -s" has never supported spaces before the colon (at least, it did
not support it before my patch set in [1] and does not support it now)
so it seems that having no spaces is already the established convention,
and I think that interpret-trailers should follow it. I would agree that
we should fix the documentation to match the current behavior.

[1] https://lore.kernel.org/git/a416ab9b-ff1f-9a71-3e58-60fd4f8a6b8e@google.com/
Maxwell Bernstein Aug. 18, 2022, 7:03 p.m. UTC | #3
On Thu Aug 18, 2022 at 9:54 AM PDT, Junio C Hamano wrote:
> So, the updated code mistook the body of the message that is not a
> sign-off, because there is a colon on the line, the line does not
> begin with the colon, and everything before the colon is an alnum or
> a whitespace, so squashed the paragraph break before the real
> trailer block and the last line of the body and made it a body-less
> commit log message?

I agree that it is not ideal, and I'm not sure how to fix it. This commit
probably shouldn't go in as-is; I modified the test in order to demonstrate
this problem.

> This might be a good demonstration of why it is a mistaken design to
> allow whitespaces, which may steer us toward fixing the documentation?
>
> I dunno.  What do others think?

I think allowing whitespace is good at least for the Phabricator project, which
despite being dead, has a number of existing users and existing commits. It
unfortunately has a "Differential revision" trailer with whitespace.
Christian Couder Aug. 18, 2022, 8:46 p.m. UTC | #4
On Thu, Aug 18, 2022 at 9:03 PM Maxwell Bernstein <tekk.nolagi@gmail.com> wrote:
>
> On Thu Aug 18, 2022 at 9:54 AM PDT, Junio C Hamano wrote:
> > So, the updated code mistook the body of the message that is not a
> > sign-off, because there is a colon on the line, the line does not
> > begin with the colon, and everything before the colon is an alnum or
> > a whitespace, so squashed the paragraph break before the real
> > trailer block and the last line of the body and made it a body-less
> > commit log message?
>
> I agree that it is not ideal, and I'm not sure how to fix it. This commit
> probably shouldn't go in as-is; I modified the test in order to demonstrate
> this problem.
>
> > This might be a good demonstration of why it is a mistaken design to
> > allow whitespaces, which may steer us toward fixing the documentation?
> >
> > I dunno.  What do others think?
>
> I think allowing whitespace is good at least for the Phabricator project, which
> despite being dead, has a number of existing users and existing commits. It
> unfortunately has a "Differential revision" trailer with whitespace.

I think allowing one white space before the separator is a good idea
for the following reasons.

First, if people use something like #, !, ~ or % as a separator, for
example in the case of a trailer like "Bug #43", it is very natural to
have a white space before the # separator.

Note that GitLab for example uses the ~N format, where N is a number,
for issues, !N for merge requests and %N for milestones. I think
Bugzilla and many other bug trackers use a #N format for bug numbers.

Also in some languages, like French, the typographical rules when
writing regular text is to have a space (technically it's supposed to
be a "non breaking space", see:
https://en.wikipedia.org/wiki/Non-breaking_space, but in practice
people use a regular space most of the time) before a colon. So it is
very natural for a number of people in the world to automatically add
a white space before a colon.

https://en.wikipedia.org/wiki/Colon_(punctuation)#Spacing says: "In
print, a thin space was traditionally placed before a colon and a
thick space after it. In modern English-language printing, no space is
placed before a colon and a single space is placed after it. In
French-language typing and printing, the traditional rules are
preserved. "

I think it would be very annoying for users to find out that a number
of otherwise proper trailers would not be taken into account only
because they have a white space before the colon. At least there
should be an option to allow that.
Junio C Hamano Aug. 18, 2022, 9:31 p.m. UTC | #5
Christian Couder <christian.couder@gmail.com> writes:

> I think allowing one white space before the separator is a good idea
> for the following reasons.
> ...
> I think it would be very annoying for users to find out that a number
> of otherwise proper trailers would not be taken into account only
> because they have a white space before the colon. At least there
> should be an option to allow that.

In short, you do not support Max's patch that allows arbitrary
number of white spaces anywhere but at the beginning of line,
but see a room for unrelated improvement from the current code,
namely, to allow exactly one optional space, immediately before
the separator and nowhere else.

It may be a reasonable thing to do that may not break too many
things.  I do not know if that is loose enough to satisify Max's
original wish, though.

Thanks.
Junio C Hamano Aug. 19, 2022, 4:33 a.m. UTC | #6
Junio C Hamano <gitster@pobox.com> writes:

> In short, you do not support Max's patch that allows arbitrary
> number of white spaces anywhere but at the beginning of line,
> but see a room for unrelated improvement from the current code,
> namely, to allow exactly one optional space, immediately before
> the separator and nowhere else.

Ah, no, sorry, I misread the situation.  It's not a room for
improvement.  It is very close to what the current code already
does, i.e. to allow optional spaces immediately before the
separator.  The difference is that the current code allows arbitrary
number of optional spaces, not zero or exactly one.

So the only thing we need to do is to update the documentation that
Max misread as allowing spaces at arbitrary place to reflect what
the code has been doing, i.e. an optional run of spaces is allowed
between the "token" and the "separator"?

Anybody wants to do the honors to produce such a patch, then?

Thanks.
Christian Couder Aug. 19, 2022, 10:29 a.m. UTC | #7
On Fri, Aug 19, 2022 at 6:33 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > In short, you do not support Max's patch that allows arbitrary
> > number of white spaces anywhere but at the beginning of line,

I haven't closely followed the patches that might have tweaked the
original default behavior regarding whitespaces in tokens, but I think
in general it's not a good idea to change the default behavior for no
very good reason, because it could cause regressions in the way others
already use `git interpret-trailers`. It could have been Ok to accept
that during a few years after the original implementation was merged
in 2014, but these days people and tools (like GitLab for example)
rely on trailers and `git interpret-trailers` more and more.

So in general I think it's better to have a more cautious approach
now, like we usually have for other commands, and only allow new
options to tweak the default behavior. If one such option appears to
be very widely used, then we might decide later that it should become
the default behavior.

I understand that having commands with many options might not be
considered by some of us as good design, but if we don't like that,
then we could perhaps just have only one option, maybe called
something like "trailer.tokenFormat", or just "trailer.format", that
would contain a regex that every token, or trailer, should match.

It's a complex subject because there are different and competing
viewpoints. On one hand it can be annoying to have some trailers
ignored just because they don't match the default format in a very
small way. (And rewriting trailers could mean rewriting published
history which is not an easy thing to do.) On the other hand if we
develop something like "trailer.tokenFormat" that allows everything,
then we might consider that we should instead encourage people to do
the right thing and to use as much as possible regular trailers with a
strict format. So allowing new options to only tweak things in a small
way might be the right thing to do after all.

> > but see a room for unrelated improvement from the current code,
> > namely, to allow exactly one optional space, immediately before
> > the separator and nowhere else.
>
> Ah, no, sorry, I misread the situation.  It's not a room for
> improvement.  It is very close to what the current code already
> does, i.e. to allow optional spaces immediately before the
> separator.  The difference is that the current code allows arbitrary
> number of optional spaces, not zero or exactly one.

Yeah, I think it makes sense to not change this default behavior, but
maybe, if people don't like it for some reason, allow options to tweak
it.

> So the only thing we need to do is to update the documentation that
> Max misread as allowing spaces at arbitrary place to reflect what
> the code has been doing, i.e. an optional run of spaces is allowed
> between the "token" and the "separator"?

Yeah, I also think that the documentation should be very clear (and
accurate of course) about where whitespaces and how many of them are
allowed by default, and what can possibly be tweaked by options.

> Anybody wants to do the honors to produce such a patch, then?

Ok, I will give it a try soon.
Johannes Schindelin Aug. 22, 2022, 1:58 p.m. UTC | #8
Hi Christian,

On Fri, 19 Aug 2022, Christian Couder wrote:

> On Fri, Aug 19, 2022 at 6:33 AM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Junio C Hamano <gitster@pobox.com> writes:
> >
> > > but see a room for unrelated improvement from the current code,
> > > namely, to allow exactly one optional space, immediately before the
> > > separator and nowhere else.
> >
> > Ah, no, sorry, I misread the situation.  It's not a room for
> > improvement.  It is very close to what the current code already does,
> > i.e. to allow optional spaces immediately before the separator.  The
> > difference is that the current code allows arbitrary number of
> > optional spaces, not zero or exactly one.
>
> Yeah, I think it makes sense to not change this default behavior, but
> maybe, if people don't like it for some reason, allow options to tweak
> it.

That sounds like a very good compromise to me.

Ciao,
Dscho
diff mbox series

Patch

diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index fbec8ad2ef7..db95cb6ee66 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -1544,8 +1544,7 @@  test_expect_success 'signoff: not really a signoff' '
 	4:Subject: [PATCH] subject
 	8:
 	9:I want to mention about Signed-off-by: here.
-	10:
-	11:Signed-off-by: C O Mitter <committer@example.com>
+	10:Signed-off-by: C O Mitter <committer@example.com>
 	EOF
 	test_cmp expect actual
 '
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 97f10905d23..47bf83003ef 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -1481,6 +1481,46 @@  test_expect_success 'only-trailers omits non-trailer in middle of block' '
 	test_cmp expected actual
 '
 
+test_expect_success 'supports spaces inside token' '
+	git config --unset trailer.sign.command &&
+	cat >expected <<-\EOF &&
+		Signed-off-by: nobody <nobody@nowhere>
+		some other trailer: a value
+		Signed-off-by: somebody <somebody@somewhere>
+	EOF
+	echo "wrote to expected" 1>&2 &&
+	git interpret-trailers --only-trailers >actual <<-\EOF &&
+		subject
+
+		it is important that the trailers below are signed-off-by
+		so that they meet the "25% trailers Git knows about" heuristic
+
+		Signed-off-by: nobody <nobody@nowhere>
+		some other trailer: a value
+		Signed-off-by: somebody <somebody@somewhere>
+	EOF
+	test_cmp expected actual
+'
+
+test_expect_success 'does not support space at beginning of token' '
+	cat >expected <<-\EOF &&
+		Signed-off-by: nobody <nobody@nowhere> not a trailer: thing
+		Signed-off-by: somebody <somebody@somewhere>
+	EOF
+	echo "wrote to expected" 1>&2 &&
+	git interpret-trailers --only-trailers --unfold >actual <<-\EOF &&
+		subject
+
+		it is important that the trailers below are signed-off-by
+		so that they meet the "25% trailers Git knows about" heuristic
+
+		Signed-off-by: nobody <nobody@nowhere>
+		 not a trailer: thing
+		Signed-off-by: somebody <somebody@somewhere>
+	EOF
+	test_cmp expected actual
+'
+
 test_expect_success 'only input' '
 	git config trailer.sign.command "echo config-value" &&
 	cat >expected <<-\EOF &&
diff --git a/trailer.c b/trailer.c
index d419c20735e..d02a9154512 100644
--- a/trailer.c
+++ b/trailer.c
@@ -618,17 +618,12 @@  static int token_matches_item(const char *tok, struct arg_item *item, size_t tok
  */
 static ssize_t find_separator(const char *line, const char *separators)
 {
-	int whitespace_found = 0;
 	const char *c;
 	for (c = line; *c; c++) {
 		if (strchr(separators, *c))
 			return c - line;
-		if (!whitespace_found && (isalnum(*c) || *c == '-'))
+		if (isalnum(*c) || *c == '-' || (c != line && (*c == ' ' || *c == '\t')))
 			continue;
-		if (c != line && (*c == ' ' || *c == '\t')) {
-			whitespace_found = 1;
-			continue;
-		}
 		break;
 	}
 	return -1;