diff mbox series

[v2,16/16] config: allow multi-byte core.commentChar

Message ID 20240312091750.GP95609@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 8b311478ad16b2fe9d2f5b5febec9f5e8f7fd52d
Headers show
Series allow multi-byte core.commentChar | expand

Commit Message

Jeff King March 12, 2024, 9:17 a.m. UTC
Now that all of the code handles multi-byte comment characters, it's
safe to allow users to set them.

There is one special case I kept: we still will not allow an empty
string for the commentChar. While it might make sense in some contexts
(e.g., output where you don't want any comment prefix), there are plenty
where it will behave badly (e.g., all of our starts_with() checks will
indicate that every line is a comment!). It might be reasonable to
assign some meaningful semantics, but it would probably involve checking
how each site behaves. In the interim let's forbid it and we can loosen
things later.

Likewise, the "commentChar cannot be a newline" rule is now extended to
"it cannot contain a newline" (for the same reason: it can confuse our
parsing loops).

Since comment_line_str is used in many parts of the code, it's hard to
cover all possibilities with tests. We can convert the existing
double-semicolon prefix test to show that "git status" works. And we'll
give it a more challenging case in t7507, where we confirm that
git-commit strips out the commit template along with any --verbose text
when reading the edited commit message back in. That covers the basics,
though it's possible there could be issues in more exotic spots (e.g.,
the sequencer todo list uses its own code).

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/config/core.txt |  4 +++-
 config.c                      | 10 +++++-----
 t/t0030-stripspace.sh         |  7 ++++++-
 t/t7507-commit-verbose.sh     | 10 ++++++++++
 t/t7508-status.sh             |  4 +++-
 5 files changed, 27 insertions(+), 8 deletions(-)

Comments

Kristoffer Haugsbakk March 13, 2024, 6:23 p.m. UTC | #1
Thanks for your work on this. Now I can use dingbats as my comment char.

On Tue, Mar 12, 2024, at 10:17, Jeff King wrote:
> diff --git a/Documentation/config/core.txt
> b/Documentation/config/core.txt
> index 0e8c2832bf..c86b8c8408 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -523,7 +523,9 @@ core.commentChar::
>  	Commands such as `commit` and `tag` that let you edit
>  	messages consider a line that begins with this character
>  	commented, and removes them after the editor returns
> -	(default '#').
> +	(default '#'). Note that this option can take values larger than
> +	a byte (whether a single multi-byte character, or you
> +	could even go wild with a multi-character sequence).

I don’t know if this expanded description focuses a bit much on the
history of the change[1] or if it is intentionally indirect about this
char-is-really-a-string behavior as a sort of easter egg.[2]

Maybe it could be more directly stated like:

  “ Note that this variable can in fact be a string like `foo`; it
    doesn’t have to be a single character.

(Hopefully UTF-8 is implied by “foo”. Or else “føø”.)

Terms like “a byte” and “multi-byte characters” seem a bit too technical
in this context when you can just say “string”.

† 1: (1) What’s a “char”, is it ASCII? (2) It’s ASCII but could in
    principle be made multi-byte (3) And also a multi-byte *string*,
    right? (4) …
† 2: In five years: (1) How come this Git tutorial’s commit message
    template has `(commit)` as the ignore-these-lines marker? How did he
    abuse “comment char” to make a long string? (2) Actually…

❦ Please enter the email reply. Lines starting with '❦' will be ignored,
❦ and an empty message aborts the sendout.
Junio C Hamano March 13, 2024, 6:39 p.m. UTC | #2
"Kristoffer Haugsbakk" <code@khaugsbakk.name> writes:

>> +	(default '#'). Note that this option can take values larger than
>> +	a byte (whether a single multi-byte character, or you
>> +	could even go wild with a multi-character sequence).
>
> I don’t know if this expanded description focuses a bit much on the
> history of the change[1] or if it is intentionally indirect about this
> char-is-really-a-string behavior as a sort of easter egg.[2]

> Maybe it could be more directly stated like:
>
>   “ Note that this variable can in fact be a string like `foo`; it
>     doesn’t have to be a single character.
>
> (Hopefully UTF-8 is implied by “foo”. Or else “føø”.)

That's definitely an improvement, but I would say that using a
dingbat instad of "foo", and "single character" -> "single ASCII
character" (or "single byte") would make it even clearer.

Thanks.
Jeff King March 15, 2024, 5:59 a.m. UTC | #3
On Wed, Mar 13, 2024 at 07:23:25PM +0100, Kristoffer Haugsbakk wrote:

> Thanks for your work on this. Now I can use dingbats as my comment char.

Truly we have entered a golden age of technology. ;)

> > @@ -523,7 +523,9 @@ core.commentChar::
> >  	Commands such as `commit` and `tag` that let you edit
> >  	messages consider a line that begins with this character
> >  	commented, and removes them after the editor returns
> > -	(default '#').
> > +	(default '#'). Note that this option can take values larger than
> > +	a byte (whether a single multi-byte character, or you
> > +	could even go wild with a multi-character sequence).
> 
> I don’t know if this expanded description focuses a bit much on the
> history of the change[1] or if it is intentionally indirect about this
> char-is-really-a-string behavior as a sort of easter egg.[2]

Mostly I was worried that people would take "char" in the name to assume
it could only be a single byte (I had originally even started the new
sentence with "Despite the word 'char' in the name, this option
can..."). And that is not just history, but a name we are stuck with
forever[1].

Certainly "char" is an ambiguous term, though. I didn't mean to leave
char-is-a-string as an easter egg; that's what I meant by
"multi-character sequence". Certainly "string" is a shorter way of
saying that. ;) But I wasn't sure its meaning would be obvious without
the word "multi-character". Giving an example as you suggested does
help that.

That said...

> Maybe it could be more directly stated like:
> 
>   “ Note that this variable can in fact be a string like `foo`; it
>     doesn’t have to be a single character.

I actually do think the "string" nature is mostly uninteresting, and I'd
be OK leaving it as an easter egg. What your suggestion doesn't say is
that multi-byte characters are OK. But if we think people will just
assume that in a modern UTF-8 world, then maybe we don't need to say
anything at all?

> (Hopefully UTF-8 is implied by “foo”. Or else “føø”.)

It actually does not have to be UTF-8. If you use an alternate encoding
in your editor (and probably set i18n.commitEncoding to match), I think
everything might just work. (Though to be clear, I think anybody using
non-UTF8 in 2024 deserves our pity either for being crazy or for being
stuck working on an antiquated system).

-Peff
Kristoffer Haugsbakk March 15, 2024, 7:16 a.m. UTC | #4
On Fri, Mar 15, 2024, at 06:59, Jeff King wrote:
> On Wed, Mar 13, 2024 at 07:23:25PM +0100, Kristoffer Haugsbakk wrote:
>
>> Thanks for your work on this. Now I can use dingbats as my comment char.
>
> Truly we have entered a golden age of technology. ;)

QoL features can in aggregate have a surprising impact :)

>
>> > @@ -523,7 +523,9 @@ core.commentChar::
>> >  	Commands such as `commit` and `tag` that let you edit
>> >  	messages consider a line that begins with this character
>> >  	commented, and removes them after the editor returns
>> > -	(default '#').
>> > +	(default '#'). Note that this option can take values larger than
>> > +	a byte (whether a single multi-byte character, or you
>> > +	could even go wild with a multi-character sequence).
>>
>> I don’t know if this expanded description focuses a bit much on the
>> history of the change[1] or if it is intentionally indirect about this
>> char-is-really-a-string behavior as a sort of easter egg.[2]
>
> Mostly I was worried that people would take "char" in the name to assume
> it could only be a single byte (I had originally even started the new
> sentence with "Despite the word 'char' in the name, this option
> can..."). And that is not just history, but a name we are stuck with
> forever[1].

Missing footnote or referring to my footnote?

My suggestion was to use a `core.commentString` alias. Which might
matter for new answers to questions about its use. It might not matter
if in practice most people get their config tips from 1500 point
StackOverflow question about how git-commit(1) keeps swallowing their
GitHub issue numbers (due to automatic linewrap) from 2011.

> Certainly "char" is an ambiguous term, though. I didn't mean to leave
> char-is-a-string as an easter egg; that's what I meant by
> "multi-character sequence". Certainly "string" is a shorter way of
> saying that. ;) But I wasn't sure its meaning would be obvious without
> the word "multi-character". Giving an example as you suggested does
> help that.
>
> That said...
>
>> Maybe it could be more directly stated like:
>>
>>   “ Note that this variable can in fact be a string like `foo`; it
>>     doesn’t have to be a single character.
>
> I actually do think the "string" nature is mostly uninteresting, and I'd
> be OK leaving it as an easter egg.

To my mind a string subsumes a char (multi- or not). Like in programming
languages: some might be used to single-char `#`, but I don’t think they
do a double take when they see languages with `//` or `--`.

> What your suggestion doesn't say is that multi-byte characters are
> OK. But if we think people will just assume that in a modern UTF-8
> world, then maybe we don't need to say anything at all?

Given that we’re mostly in the context of a commit message, an
ASCII-only restriction would feel archaic.

I guess it depends on what the *normal* is in the documentation at
large. As a user I’m used to Git handling the text that I give it.

> It actually does not have to be UTF-8.

Good point. Unicode is more appropriate.

> (Though to be clear, I think anybody using non-UTF8 in 2024 deserves
> our pity either for being crazy or for being stuck working on an
> antiquated system).

I honestly feel blessed that I have to worry so little about text
encoding.
Jeff King March 15, 2024, 8:10 a.m. UTC | #5
On Fri, Mar 15, 2024 at 08:16:53AM +0100, Kristoffer Haugsbakk wrote:

> > Mostly I was worried that people would take "char" in the name to assume
> > it could only be a single byte (I had originally even started the new
> > sentence with "Despite the word 'char' in the name, this option
> > can..."). And that is not just history, but a name we are stuck with
> > forever[1].
> 
> Missing footnote or referring to my footnote?
> 
> My suggestion was to use a `core.commentString` alias. Which might
> matter for new answers to questions about its use. It might not matter
> if in practice most people get their config tips from 1500 point
> StackOverflow question about how git-commit(1) keeps swallowing their
> GitHub issue numbers (due to automatic linewrap) from 2011.

Heh, missing footnote. I was going to say "we could introduce
core.commentStr or similar", but after your comment I searched in the
archive and see that you did indeed already suggest it.

I'm not sure if it would make things more or less confusing to have two
related values. One nice side effect is that the new variable would be
ignored by older versions of Git (whereas by extending core.commentChar,
you end up with config that causes older versions to barf). That
probably doesn't matter that much for most users, but as somebody who
works on Git I frequently run old versions for bug testing, bisection,
and so forth.

> > I actually do think the "string" nature is mostly uninteresting, and I'd
> > be OK leaving it as an easter egg.
> 
> To my mind a string subsumes a char (multi- or not). Like in programming
> languages: some might be used to single-char `#`, but I don’t think they
> do a double take when they see languages with `//` or `--`.

Hmm, good point. I was mostly focused on UTF-8 characters, but "//" is
quite a reasonable thing for people to try. It is probably a better
example than "foo".

> > What your suggestion doesn't say is that multi-byte characters are
> > OK. But if we think people will just assume that in a modern UTF-8
> > world, then maybe we don't need to say anything at all?
> 
> Given that we’re mostly in the context of a commit message, an
> ASCII-only restriction would feel archaic.
>
> I guess it depends on what the *normal* is in the documentation at
> large. As a user I’m used to Git handling the text that I give it.

Right, that's what I was asking. To me "character" means an ASCII byte,
but I think I might be archaic myself. ;) If most of our readers would
just assume that multi-byte characters work, perhaps it is confusing
things to even mention it.

> > It actually does not have to be UTF-8.
> 
> Good point. Unicode is more appropriate.

I think other Unicode encodings are likely to have problems (because
they embed NULs). Specifically I was thinking that you could probably
get away with latin1 or other 8-bit encodings. But again, I really hope
nobody is doing that anymore.

So anyway, adapting your original suggestion based on discussion in the
thread, maybe squash in (to the final patch):

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c86b8c8408..c5a8033df9 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -523,9 +523,8 @@ core.commentChar::
 	Commands such as `commit` and `tag` that let you edit
 	messages consider a line that begins with this character
 	commented, and removes them after the editor returns
-	(default '#'). Note that this option can take values larger than
-	a byte (whether a single multi-byte character, or you
-	could even go wild with a multi-character sequence).
+	(default '#'). Note that this variable can be a string like
+	`//` or `⁑⁕⁑`; it doesn't have to be a single ASCII character.
 +
 If set to "auto", `git-commit` would select a character that is not
 the beginning character of any line in existing commit messages.


That's assuming we don't want to go the commentString route, which would
require a bit more re-working of the patch. I'm also open to a more
clever or pretty multi-byte example if we have one. ;)

-Peff
Kristoffer Haugsbakk March 15, 2024, 1:30 p.m. UTC | #6
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index c86b8c8408..c5a8033df9 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -523,9 +523,8 @@ core.commentChar::
>  	Commands such as `commit` and `tag` that let you edit
>  	messages consider a line that begins with this character
>  	commented, and removes them after the editor returns
> -	(default '#'). Note that this option can take values larger than
> -	a byte (whether a single multi-byte character, or you
> -	could even go wild with a multi-character sequence).
> +	(default '#'). Note that this variable can be a string like
> +	`//` or `⁑⁕⁑`; it doesn't have to be a single ASCII character.

This is perfect :)
Junio C Hamano March 15, 2024, 3:40 p.m. UTC | #7
Jeff King <peff@peff.net> writes:

> +	(default '#'). Note that this variable can be a string like
> +	`//` or `⁑⁕⁑`; it doesn't have to be a single ASCII character.

Looking good.

> That's assuming we don't want to go the commentString route, which would
> require a bit more re-working of the patch. I'm also open to a more
> clever or pretty multi-byte example if we have one. ;)

Adding core.commentString can be done long after the dust settles
and I would expect that most of the changes in the patch would not
have to be updated.  The parts that use comment_line_str variable do
not have to change, the documentation needs "core.commentString is a
synonym for core.commentChar, the latter of which is understood by
older versions of Git (but they may use only the first byte of the
string)" or something, but other than that, the existing text after
this patch does not have to be updated.  If we add a proper synonym
support to the config machinery, that would be a sizable project,
but otherwise it would be just another "if (!strcmp()) var = val".

Stepping back a bit, one thing that we do need to mention in this
round is what happens when you use multi-byte sequence and have it
accessed by existing versions of Git.  "use only the first byte" I
wrote above came out of thin air without experimenting or reading
the code, but something like that ought to be part of the "Note
that" paragraph above.

	(default '#'). Note that this variable can be a string like
	`//` or `⁑⁕⁑`; it doesn't have to be a single ASCII character.
	Also note that older versions of Git used only the first byte
	(not necessarily a character) of the value of this variable,
	so you may want to be careful if you plan to use versions of
	Git older than 2.45.

or something like that, perhaps.
Jeff King March 16, 2024, 5:50 a.m. UTC | #8
On Fri, Mar 15, 2024 at 08:40:56AM -0700, Junio C Hamano wrote:

> > That's assuming we don't want to go the commentString route, which would
> > require a bit more re-working of the patch. I'm also open to a more
> > clever or pretty multi-byte example if we have one. ;)
> 
> Adding core.commentString can be done long after the dust settles
> and I would expect that most of the changes in the patch would not
> have to be updated.  The parts that use comment_line_str variable do
> not have to change, the documentation needs "core.commentString is a
> synonym for core.commentChar, the latter of which is understood by
> older versions of Git (but they may use only the first byte of the
> string)" or something, but other than that, the existing text after
> this patch does not have to be updated.  If we add a proper synonym
> support to the config machinery, that would be a sizable project,
> but otherwise it would be just another "if (!strcmp()) var = val".

Yeah, I agree we could add core.commentString on top of what's here, as
long as we're OK with core.commentChar starting to accept strings in the
meantime. Which is probably reasonable, and in which case the code
portion of the patch really is just:

diff --git a/config.c b/config.c
index 92c752ed9f..13fb922bf5 100644
--- a/config.c
+++ b/config.c
@@ -1560,7 +1560,8 @@ static int git_default_core_config(const char *var, const char *value,
 	if (!strcmp(var, "core.editor"))
 		return git_config_string(&editor_program, var, value);
 
-	if (!strcmp(var, "core.commentchar")) {
+	if (!strcmp(var, "core.commentchar") ||
+	    !strcmp(var, "core.commentstring")) {
 		if (!value)
 			return config_error_nonbool(var);
 		else if (!strcasecmp(value, "auto"))

(the real work of course being in docs and tests).

If we wanted to distinguish them more (say, core.commentChar remains
as-is but core.commentString allows strings and takes precedence), then
we'd need to do it now to avoid flip-flopping between versions. I don't
see a huge benefit in restricting commentChar though.

> Stepping back a bit, one thing that we do need to mention in this
> round is what happens when you use multi-byte sequence and have it
> accessed by existing versions of Git.  "use only the first byte" I
> wrote above came out of thin air without experimenting or reading
> the code, but something like that ought to be part of the "Note
> that" paragraph above.
> 
> 	(default '#'). Note that this variable can be a string like
> 	`//` or `⁑⁕⁑`; it doesn't have to be a single ASCII character.
> 	Also note that older versions of Git used only the first byte
> 	(not necessarily a character) of the value of this variable,
> 	so you may want to be careful if you plan to use versions of
> 	Git older than 2.45.

The current code barfs for anything larger than a byte:

  $ git.v2.44.0 -c core.commentchar=foo stripspace -s
  error: core.commentChar should only be one ASCII character
  fatal: unable to parse 'core.commentchar' from command-line config

I'm mixed on these sorts of version-specific notes in the documentation.
For people who aren't mixing versions, the history is useless noise
(whose value decreases as time goes on and 2.45 becomes "old" itself).
For people who do use older versions, they'd quickly get an error like
the one above.

So I dunno. I'm not strictly opposed, but if this is something we think
is worth warning about, then that implies to me that it is worth
providing a more ergonomic solution like core.commentString.

-Peff
Junio C Hamano March 26, 2024, 10:10 p.m. UTC | #9
Jeff King <peff@peff.net> writes:

> So anyway, adapting your original suggestion based on discussion in the
> thread, maybe squash in (to the final patch):
>
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index c86b8c8408..c5a8033df9 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -523,9 +523,8 @@ core.commentChar::
>  	Commands such as `commit` and `tag` that let you edit
>  	messages consider a line that begins with this character
>  	commented, and removes them after the editor returns
> -	(default '#'). Note that this option can take values larger than
> -	a byte (whether a single multi-byte character, or you
> -	could even go wild with a multi-character sequence).
> +	(default '#'). Note that this variable can be a string like
> +	`//` or `⁑⁕⁑`; it doesn't have to be a single ASCII character.
>  +
>  If set to "auto", `git-commit` would select a character that is not
>  the beginning character of any line in existing commit messages.
>
>
> That's assuming we don't want to go the commentString route, which would
> require a bit more re-working of the patch. I'm also open to a more
> clever or pretty multi-byte example if we have one. ;)

It has been 10 days since this discussion petered out.

My preference is to introduce core.commentString to avoid confusion
coming from an older Git using the first-byte of a multi-byte
string, or dying upon reading a configuration file meant for a newer
Git, and then let core.commentString override core.commentChar, but
I would prefer to see the discussion participants to raise their
opinions and reach a conclusion.

Thanks.
Kristoffer Haugsbakk March 26, 2024, 10:12 p.m. UTC | #10
On Tue, Mar 26, 2024, at 23:10, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
>> So anyway, adapting your original suggestion based on discussion in the
>> thread, maybe squash in (to the final patch):
>>
>> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
>> index c86b8c8408..c5a8033df9 100644
>> --- a/Documentation/config/core.txt
>> +++ b/Documentation/config/core.txt
>> @@ -523,9 +523,8 @@ core.commentChar::
>>  	Commands such as `commit` and `tag` that let you edit
>>  	messages consider a line that begins with this character
>>  	commented, and removes them after the editor returns
>> -	(default '#'). Note that this option can take values larger than
>> -	a byte (whether a single multi-byte character, or you
>> -	could even go wild with a multi-character sequence).
>> +	(default '#'). Note that this variable can be a string like
>> +	`//` or `⁑⁕⁑`; it doesn't have to be a single ASCII character.
>>  +
>>  If set to "auto", `git-commit` would select a character that is not
>>  the beginning character of any line in existing commit messages.
>>
>>
>> That's assuming we don't want to go the commentString route, which would
>> require a bit more re-working of the patch. I'm also open to a more
>> clever or pretty multi-byte example if we have one. ;)
>
> It has been 10 days since this discussion petered out.
>
> My preference is to introduce core.commentString to avoid confusion
> coming from an older Git using the first-byte of a multi-byte
> string, or dying upon reading a configuration file meant for a newer
> Git, and then let core.commentString override core.commentChar, but
> I would prefer to see the discussion participants to raise their
> opinions and reach a conclusion.
>
> Thanks.

Sounds good to me.
Jeff King March 27, 2024, 7:46 a.m. UTC | #11
On Tue, Mar 26, 2024 at 03:10:23PM -0700, Junio C Hamano wrote:

> It has been 10 days since this discussion petered out.

I wrote the last message, so I was waiting for you to respond. ;)

  https://lore.kernel.org/git/20240316055013.GA32145@coredump.intra.peff.net/

> My preference is to introduce core.commentString to avoid confusion
> coming from an older Git using the first-byte of a multi-byte
> string, or dying upon reading a configuration file meant for a newer
> Git, and then let core.commentString override core.commentChar, but
> I would prefer to see the discussion participants to raise their
> opinions and reach a conclusion.

OK. I don't have a strong opinion. Are you OK with core.commentString as
a strict synonym (so last-one-wins and either name overwrites previous)?
Or do you want an override (i.e., commentString always overrides
commentChar, regardless of order). I think it's mostly academic, and the
strict synonym version is much easier to implement.

-Peff
Junio C Hamano March 27, 2024, 2:53 p.m. UTC | #12
Jeff King <peff@peff.net> writes:

> OK. I don't have a strong opinion. Are you OK with core.commentString as
> a strict synonym (so last-one-wins and either name overwrites previous)?
> Or do you want an override (i.e., commentString always overrides
> commentChar, regardless of order). I think it's mostly academic, and the
> strict synonym version is much easier to implement.

When I wrote it, I meant "String is a successor of Char, if both
exists that is used regardless of the order", but either is OK.
Older versions of Git would not understand the "String" version, so
it matters only to those who uses mixed versions of Git and they can
control the last-one-wins in their configuration file, I would
guess.

Thanks.
diff mbox series

Patch

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 0e8c2832bf..c86b8c8408 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -523,7 +523,9 @@  core.commentChar::
 	Commands such as `commit` and `tag` that let you edit
 	messages consider a line that begins with this character
 	commented, and removes them after the editor returns
-	(default '#').
+	(default '#'). Note that this option can take values larger than
+	a byte (whether a single multi-byte character, or you
+	could even go wild with a multi-character sequence).
 +
 If set to "auto", `git-commit` would select a character that is not
 the beginning character of any line in existing commit messages.
diff --git a/config.c b/config.c
index 7e5dbca4bd..92c752ed9f 100644
--- a/config.c
+++ b/config.c
@@ -1565,13 +1565,13 @@  static int git_default_core_config(const char *var, const char *value,
 			return config_error_nonbool(var);
 		else if (!strcasecmp(value, "auto"))
 			auto_comment_line_char = 1;
-		else if (value[0] && !value[1]) {
-			if (value[0] == '\n')
-				return error(_("core.commentChar cannot be newline"));
-			comment_line_str = xstrfmt("%c", value[0]);
+		else if (value[0]) {
+			if (strchr(value, '\n'))
+				return error(_("core.commentChar cannot contain newline"));
+			comment_line_str = xstrdup(value);
 			auto_comment_line_char = 0;
 		} else
-			return error(_("core.commentChar should only be one ASCII character"));
+			return error(_("core.commentChar must have at least one character"));
 		return 0;
 	}
 
diff --git a/t/t0030-stripspace.sh b/t/t0030-stripspace.sh
index e399dd9189..a161faf702 100755
--- a/t/t0030-stripspace.sh
+++ b/t/t0030-stripspace.sh
@@ -403,7 +403,12 @@  test_expect_success 'strip comments with changed comment char' '
 
 test_expect_success 'newline as commentchar is forbidden' '
 	test_must_fail git -c core.commentChar="$LF" stripspace -s 2>err &&
-	grep "core.commentChar cannot be newline" err
+	grep "core.commentChar cannot contain newline" err
+'
+
+test_expect_success 'empty commentchar is forbidden' '
+	test_must_fail git -c core.commentchar= stripspace -s 2>err &&
+	grep "core.commentChar must have at least one character" err
 '
 
 test_expect_success '-c with single line' '
diff --git a/t/t7507-commit-verbose.sh b/t/t7507-commit-verbose.sh
index c3281b192e..4c7db19ce7 100755
--- a/t/t7507-commit-verbose.sh
+++ b/t/t7507-commit-verbose.sh
@@ -101,6 +101,16 @@  test_expect_success 'verbose diff is stripped out with set core.commentChar' '
 	test_grep "Aborting commit due to empty commit message." err
 '
 
+test_expect_success 'verbose diff is stripped with multi-byte comment char' '
+	(
+		GIT_EDITOR=cat &&
+		export GIT_EDITOR &&
+		test_must_fail git -c core.commentchar="foo>" commit -a -v >out 2>err
+	) &&
+	grep "^foo> " out &&
+	test_grep "Aborting commit due to empty commit message." err
+'
+
 test_expect_success 'status does not verbose without --verbose' '
 	git status >actual &&
 	! grep "^diff --git" actual
diff --git a/t/t7508-status.sh b/t/t7508-status.sh
index a3c18a4fc2..10ed8b32bc 100755
--- a/t/t7508-status.sh
+++ b/t/t7508-status.sh
@@ -1403,7 +1403,9 @@  test_expect_success "status (core.commentchar with submodule summary)" '
 
 test_expect_success "status (core.commentchar with two chars with submodule summary)" '
 	test_config core.commentchar ";;" &&
-	test_must_fail git -c status.displayCommentPrefix=true status
+	sed "s/^/;/" <expect >expect.double &&
+	git -c status.displayCommentPrefix=true status >output &&
+	test_cmp expect.double output
 '
 
 test_expect_success "--ignore-submodules=all suppresses submodule summary" '