Message ID | pull.1309.v2.git.git.1660809243298.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] trailer: allow spaces in tokens | expand |
"Max Bernstein via GitGitGadget" <gitgitgadget@gmail.com> writes: > Range-diff vs v1: > > 1: 404a6d1b193 ! 1: 4d490851ac2 trailer: allow spaces in tokens > @@ Commit message > > Signed-off-by: Max Bernstein <max@bernsteinbear.com> > > + ## t/t4014-format-patch.sh ## > +@@ t/t4014-format-patch.sh: test_expect_success 'signoff: not really a signoff' ' > + 4:Subject: [PATCH] subject > + 8: > + 9:I want to mention about Signed-off-by: here. > +- 10: > +- 11:Signed-off-by: C O Mitter <committer@example.com> > ++ 10:Signed-off-by: C O Mitter <committer@example.com> > + EOF > + test_cmp expect actual > + ' So, the updated code mistook the body of the message that is not a sign-off, because there is a colon on the line, the line does not begin with the colon, and everything before the colon is an alnum or a whitespace, so squashed the paragraph break before the real trailer block and the last line of the body and made it a body-less commit log message? This might be a good demonstration of why it is a mistaken design to allow whitespaces, which may steer us toward fixing the documentation? I dunno. What do others think?
Junio C Hamano <gitster@pobox.com> writes: > "Max Bernstein via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > Range-diff vs v1: > > > > 1: 404a6d1b193 ! 1: 4d490851ac2 trailer: allow spaces in tokens > > @@ Commit message > > > > Signed-off-by: Max Bernstein <max@bernsteinbear.com> > > > > + ## t/t4014-format-patch.sh ## > > +@@ t/t4014-format-patch.sh: test_expect_success 'signoff: not really a signoff' ' > > + 4:Subject: [PATCH] subject > > + 8: > > + 9:I want to mention about Signed-off-by: here. > > +- 10: > > +- 11:Signed-off-by: C O Mitter <committer@example.com> > > ++ 10:Signed-off-by: C O Mitter <committer@example.com> > > + EOF > > + test_cmp expect actual > > + ' > > So, the updated code mistook the body of the message that is not a > sign-off, because there is a colon on the line, the line does not > begin with the colon, and everything before the colon is an alnum or > a whitespace, so squashed the paragraph break before the real > trailer block and the last line of the body and made it a body-less > commit log message? > > This might be a good demonstration of why it is a mistaken design to > allow whitespaces, which may steer us toward fixing the documentation? > > I dunno. What do others think? Yeah, this was the same issue we discussed a few years ago [1]. "git commit -s" has never supported spaces before the colon (at least, it did not support it before my patch set in [1] and does not support it now) so it seems that having no spaces is already the established convention, and I think that interpret-trailers should follow it. I would agree that we should fix the documentation to match the current behavior. [1] https://lore.kernel.org/git/a416ab9b-ff1f-9a71-3e58-60fd4f8a6b8e@google.com/
On Thu Aug 18, 2022 at 9:54 AM PDT, Junio C Hamano wrote: > So, the updated code mistook the body of the message that is not a > sign-off, because there is a colon on the line, the line does not > begin with the colon, and everything before the colon is an alnum or > a whitespace, so squashed the paragraph break before the real > trailer block and the last line of the body and made it a body-less > commit log message? I agree that it is not ideal, and I'm not sure how to fix it. This commit probably shouldn't go in as-is; I modified the test in order to demonstrate this problem. > This might be a good demonstration of why it is a mistaken design to > allow whitespaces, which may steer us toward fixing the documentation? > > I dunno. What do others think? I think allowing whitespace is good at least for the Phabricator project, which despite being dead, has a number of existing users and existing commits. It unfortunately has a "Differential revision" trailer with whitespace.
On Thu, Aug 18, 2022 at 9:03 PM Maxwell Bernstein <tekk.nolagi@gmail.com> wrote: > > On Thu Aug 18, 2022 at 9:54 AM PDT, Junio C Hamano wrote: > > So, the updated code mistook the body of the message that is not a > > sign-off, because there is a colon on the line, the line does not > > begin with the colon, and everything before the colon is an alnum or > > a whitespace, so squashed the paragraph break before the real > > trailer block and the last line of the body and made it a body-less > > commit log message? > > I agree that it is not ideal, and I'm not sure how to fix it. This commit > probably shouldn't go in as-is; I modified the test in order to demonstrate > this problem. > > > This might be a good demonstration of why it is a mistaken design to > > allow whitespaces, which may steer us toward fixing the documentation? > > > > I dunno. What do others think? > > I think allowing whitespace is good at least for the Phabricator project, which > despite being dead, has a number of existing users and existing commits. It > unfortunately has a "Differential revision" trailer with whitespace. I think allowing one white space before the separator is a good idea for the following reasons. First, if people use something like #, !, ~ or % as a separator, for example in the case of a trailer like "Bug #43", it is very natural to have a white space before the # separator. Note that GitLab for example uses the ~N format, where N is a number, for issues, !N for merge requests and %N for milestones. I think Bugzilla and many other bug trackers use a #N format for bug numbers. Also in some languages, like French, the typographical rules when writing regular text is to have a space (technically it's supposed to be a "non breaking space", see: https://en.wikipedia.org/wiki/Non-breaking_space, but in practice people use a regular space most of the time) before a colon. So it is very natural for a number of people in the world to automatically add a white space before a colon. https://en.wikipedia.org/wiki/Colon_(punctuation)#Spacing says: "In print, a thin space was traditionally placed before a colon and a thick space after it. In modern English-language printing, no space is placed before a colon and a single space is placed after it. In French-language typing and printing, the traditional rules are preserved. " I think it would be very annoying for users to find out that a number of otherwise proper trailers would not be taken into account only because they have a white space before the colon. At least there should be an option to allow that.
Christian Couder <christian.couder@gmail.com> writes: > I think allowing one white space before the separator is a good idea > for the following reasons. > ... > I think it would be very annoying for users to find out that a number > of otherwise proper trailers would not be taken into account only > because they have a white space before the colon. At least there > should be an option to allow that. In short, you do not support Max's patch that allows arbitrary number of white spaces anywhere but at the beginning of line, but see a room for unrelated improvement from the current code, namely, to allow exactly one optional space, immediately before the separator and nowhere else. It may be a reasonable thing to do that may not break too many things. I do not know if that is loose enough to satisify Max's original wish, though. Thanks.
Junio C Hamano <gitster@pobox.com> writes: > In short, you do not support Max's patch that allows arbitrary > number of white spaces anywhere but at the beginning of line, > but see a room for unrelated improvement from the current code, > namely, to allow exactly one optional space, immediately before > the separator and nowhere else. Ah, no, sorry, I misread the situation. It's not a room for improvement. It is very close to what the current code already does, i.e. to allow optional spaces immediately before the separator. The difference is that the current code allows arbitrary number of optional spaces, not zero or exactly one. So the only thing we need to do is to update the documentation that Max misread as allowing spaces at arbitrary place to reflect what the code has been doing, i.e. an optional run of spaces is allowed between the "token" and the "separator"? Anybody wants to do the honors to produce such a patch, then? Thanks.
On Fri, Aug 19, 2022 at 6:33 AM Junio C Hamano <gitster@pobox.com> wrote: > > Junio C Hamano <gitster@pobox.com> writes: > > > In short, you do not support Max's patch that allows arbitrary > > number of white spaces anywhere but at the beginning of line, I haven't closely followed the patches that might have tweaked the original default behavior regarding whitespaces in tokens, but I think in general it's not a good idea to change the default behavior for no very good reason, because it could cause regressions in the way others already use `git interpret-trailers`. It could have been Ok to accept that during a few years after the original implementation was merged in 2014, but these days people and tools (like GitLab for example) rely on trailers and `git interpret-trailers` more and more. So in general I think it's better to have a more cautious approach now, like we usually have for other commands, and only allow new options to tweak the default behavior. If one such option appears to be very widely used, then we might decide later that it should become the default behavior. I understand that having commands with many options might not be considered by some of us as good design, but if we don't like that, then we could perhaps just have only one option, maybe called something like "trailer.tokenFormat", or just "trailer.format", that would contain a regex that every token, or trailer, should match. It's a complex subject because there are different and competing viewpoints. On one hand it can be annoying to have some trailers ignored just because they don't match the default format in a very small way. (And rewriting trailers could mean rewriting published history which is not an easy thing to do.) On the other hand if we develop something like "trailer.tokenFormat" that allows everything, then we might consider that we should instead encourage people to do the right thing and to use as much as possible regular trailers with a strict format. So allowing new options to only tweak things in a small way might be the right thing to do after all. > > but see a room for unrelated improvement from the current code, > > namely, to allow exactly one optional space, immediately before > > the separator and nowhere else. > > Ah, no, sorry, I misread the situation. It's not a room for > improvement. It is very close to what the current code already > does, i.e. to allow optional spaces immediately before the > separator. The difference is that the current code allows arbitrary > number of optional spaces, not zero or exactly one. Yeah, I think it makes sense to not change this default behavior, but maybe, if people don't like it for some reason, allow options to tweak it. > So the only thing we need to do is to update the documentation that > Max misread as allowing spaces at arbitrary place to reflect what > the code has been doing, i.e. an optional run of spaces is allowed > between the "token" and the "separator"? Yeah, I also think that the documentation should be very clear (and accurate of course) about where whitespaces and how many of them are allowed by default, and what can possibly be tweaked by options. > Anybody wants to do the honors to produce such a patch, then? Ok, I will give it a try soon.
Hi Christian, On Fri, 19 Aug 2022, Christian Couder wrote: > On Fri, Aug 19, 2022 at 6:33 AM Junio C Hamano <gitster@pobox.com> wrote: > > > > Junio C Hamano <gitster@pobox.com> writes: > > > > > but see a room for unrelated improvement from the current code, > > > namely, to allow exactly one optional space, immediately before the > > > separator and nowhere else. > > > > Ah, no, sorry, I misread the situation. It's not a room for > > improvement. It is very close to what the current code already does, > > i.e. to allow optional spaces immediately before the separator. The > > difference is that the current code allows arbitrary number of > > optional spaces, not zero or exactly one. > > Yeah, I think it makes sense to not change this default behavior, but > maybe, if people don't like it for some reason, allow options to tweak > it. That sounds like a very good compromise to me. Ciao, Dscho
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh index fbec8ad2ef7..db95cb6ee66 100755 --- a/t/t4014-format-patch.sh +++ b/t/t4014-format-patch.sh @@ -1544,8 +1544,7 @@ test_expect_success 'signoff: not really a signoff' ' 4:Subject: [PATCH] subject 8: 9:I want to mention about Signed-off-by: here. - 10: - 11:Signed-off-by: C O Mitter <committer@example.com> + 10:Signed-off-by: C O Mitter <committer@example.com> EOF test_cmp expect actual ' diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh index 97f10905d23..47bf83003ef 100755 --- a/t/t7513-interpret-trailers.sh +++ b/t/t7513-interpret-trailers.sh @@ -1481,6 +1481,46 @@ test_expect_success 'only-trailers omits non-trailer in middle of block' ' test_cmp expected actual ' +test_expect_success 'supports spaces inside token' ' + git config --unset trailer.sign.command && + cat >expected <<-\EOF && + Signed-off-by: nobody <nobody@nowhere> + some other trailer: a value + Signed-off-by: somebody <somebody@somewhere> + EOF + echo "wrote to expected" 1>&2 && + git interpret-trailers --only-trailers >actual <<-\EOF && + subject + + it is important that the trailers below are signed-off-by + so that they meet the "25% trailers Git knows about" heuristic + + Signed-off-by: nobody <nobody@nowhere> + some other trailer: a value + Signed-off-by: somebody <somebody@somewhere> + EOF + test_cmp expected actual +' + +test_expect_success 'does not support space at beginning of token' ' + cat >expected <<-\EOF && + Signed-off-by: nobody <nobody@nowhere> not a trailer: thing + Signed-off-by: somebody <somebody@somewhere> + EOF + echo "wrote to expected" 1>&2 && + git interpret-trailers --only-trailers --unfold >actual <<-\EOF && + subject + + it is important that the trailers below are signed-off-by + so that they meet the "25% trailers Git knows about" heuristic + + Signed-off-by: nobody <nobody@nowhere> + not a trailer: thing + Signed-off-by: somebody <somebody@somewhere> + EOF + test_cmp expected actual +' + test_expect_success 'only input' ' git config trailer.sign.command "echo config-value" && cat >expected <<-\EOF && diff --git a/trailer.c b/trailer.c index d419c20735e..d02a9154512 100644 --- a/trailer.c +++ b/trailer.c @@ -618,17 +618,12 @@ static int token_matches_item(const char *tok, struct arg_item *item, size_t tok */ static ssize_t find_separator(const char *line, const char *separators) { - int whitespace_found = 0; const char *c; for (c = line; *c; c++) { if (strchr(separators, *c)) return c - line; - if (!whitespace_found && (isalnum(*c) || *c == '-')) + if (isalnum(*c) || *c == '-' || (c != line && (*c == ' ' || *c == '\t'))) continue; - if (c != line && (*c == ' ' || *c == '\t')) { - whitespace_found = 1; - continue; - } break; } return -1;