Message ID | 20190401215334.18678-2-rybak.a.v@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2,1/2] mailinfo: use starts_with() for clarity | expand |
On Mon, Apr 01, 2019 at 11:53:34PM +0200, Andrei Rybak wrote: > diff --git a/mailinfo.c b/mailinfo.c > index f4aaa89788..804b07cd8a 100644 > --- a/mailinfo.c > +++ b/mailinfo.c > @@ -701,6 +701,13 @@ static int is_scissors_line(const char *line) > c++; > continue; > } > + if (starts_with(c, "\xE2\x9C\x82" /* U-2702 ✂ in UTF-8 */)) { > + in_perforation = 1; > + perforation += 3; > + scissors += 3; > + c += 2; > + continue; > + } It might be worth using skip_prefix() instead of starts_with() to compute the size automatically. E.g.: if (skip_prefix(c, "\xE2\x9C\x82", &end)) { size_t len = end - c; /* no magic number needed! */ } In fact, I think you could then combine this with the previous conditional and get: if (skip_prefix(c, ">8", &end) || skip_prefix(c, "8<", &end) || skip_prefix(c, ">%", &end) || skip_prefix(c, "%<", &end) || /* U-2702 in UTF-8 */ skip_prefix(c, "\xE2\x9C\x82", &end)) { in_perforation = 1; perforation += end - c; scissors += end - c; c = end - 1; /* minus one to account for loop increment */ } (Though I'm still on the fence regarding the whole idea, so do not take this as an endorsement ;) ). -Peff
Jeff King <peff@peff.net> writes: > In fact, I think you could then combine this with the previous > conditional and get: > > if (skip_prefix(c, ">8", &end) || > skip_prefix(c, "8<", &end) || > skip_prefix(c, ">%", &end) || > skip_prefix(c, "%<", &end) || > /* U-2702 in UTF-8 */ > skip_prefix(c, "\xE2\x9C\x82", &end)) { > in_perforation = 1; > perforation += end - c; > scissors += end - c; > c = end - 1; /* minus one to account for loop increment */ > } > > (Though I'm still on the fence regarding the whole idea, so do not take > this as an endorsement ;) ). I do not think we want to add more, but use of skip_prefix does sound sensible. I was very tempted to suggest static const char *scissors[] = { ">8", "8<", ">%", "%<", NULL, }; const char **s; for (s = scissors; *s; s++) if (skip_prefix, c, *s, &end) { in_perforation = 1; ... break; } } if (!s) ... we are not looking at any of the scissors[] ... but that would encourage adding more random entries to the array, which we would want to avoid in order to help reduce the cognirive load of end-users. In hindsight, addition of an undocumented '%' was already a mistake. I wonder how widely it is in use (yes, I am tempted to deprecate and remove these two to match the code to the docs).
diff --git a/mailinfo.c b/mailinfo.c index f4aaa89788..804b07cd8a 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -701,6 +701,13 @@ static int is_scissors_line(const char *line) c++; continue; } + if (starts_with(c, "\xE2\x9C\x82" /* U-2702 ✂ in UTF-8 */)) { + in_perforation = 1; + perforation += 3; + scissors += 3; + c += 2; + continue; + } in_perforation = 0; } diff --git a/t/t4150-am.sh b/t/t4150-am.sh index 3f7f750cc8..3ea8e8a2cf 100755 --- a/t/t4150-am.sh +++ b/t/t4150-am.sh @@ -77,12 +77,20 @@ test_expect_success 'setup: messages' ' printf "Subject: " >subject-prefix && - cat - subject-prefix msg-without-scissors-line >msg-with-scissors-line <<-\EOF + cat - subject-prefix msg-without-scissors-line >msg-with-scissors-line <<-\EOF && This line should not be included in the commit message with --scissors enabled. - - >8 - - remove everything above this line - - >8 - - EOF + + cat - subject-prefix msg-without-scissors-line >msg-with-unicode-scissors <<-\EOF + Lines above unicode scissors line should not be included in the commit + message with --scissors enabled. + + - - - ✂ - - - ✂ - - - + + EOF ' test_expect_success setup ' @@ -161,6 +169,12 @@ test_expect_success setup ' git format-patch --stdout expected-for-no-scissors^ >patch-with-scissors-line.eml && git reset --hard HEAD^ && + echo file >file && + git add file && + git commit -F msg-with-unicode-scissors && + git format-patch --stdout HEAD^ >patch-with-unicode-scissors.eml && + git reset --hard HEAD^ && + sed -n -e "3,\$p" msg >file && git add file && test_tick && @@ -421,6 +435,16 @@ test_expect_success 'am --scissors cuts the message at the scissors line' ' test_cmp_rev expected-for-scissors HEAD ' +test_expect_success 'am --scissors cuts the message at the unicode scissors line' ' + rm -fr .git/rebase-apply && + git reset --hard && + git checkout second && + git am --scissors patch-with-unicode-scissors.eml && + test_path_is_missing .git/rebase-apply && + git diff --exit-code expected-for-scissors && + test_cmp_rev expected-for-scissors HEAD +' + test_expect_success 'am --no-scissors overrides mailinfo.scissors' ' rm -fr .git/rebase-apply && git reset --hard &&
Thank you all for review. Below is the second version of original patch, addressing comments by Gábor and Peff. While preparing v2 I found out that U+2702 was already suggested on the list eight months before cutting at perforation lines was implemented: https://public-inbox.org/git/200901181656.37813.markus.heidelberg@web.de/T/#m3856d2e5c5f3e1900210b74bf2be8851b92d2271 ---- >8 ---- Subject: [PATCH v2 2/2] mailinfo: support Unicode scissors Date: Mon, 1 Apr 2019 00:00:00 +0000 'git am --scissors' allows cutting a patch from an email at a scissors line. Such a line should contain perforation, i.e. hyphens, and a scissors symbol. Only ASCII graphics scissors '8<' '>8' '%<' '>%' are recognized by 'git am --scissors' command at the moment. Unicode character 'BLACK SCISSORS' (U+2702) has been a part of Unicode since version 1.0.0 [1]. Since then 'BLACK SCISSORS' also became part of character set Emoji 1.0, published in 2015 [2]. With its adoption as an emoji, availability of this character on keyboards has increased. Support UTF-8 encoding of '✂' in function is_scissors_line, for 'git am --scissors' to be able to cut at Unicode perforation lines in emails. Note, that Unicode character '✂' is three bytes in UTF-8 encoding and is spelled out using hexadecimal escape sequence. 1. https://www.unicode.org/versions/Unicode1.0.0/CodeCharts1.pdf https://www.unicode.org/Public/reconstructed/1.0.0/UnicodeData.txt 2. https://unicode.org/Public/emoji/1.0/emoji-data.txt Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> --- mailinfo.c | 7 +++++++ t/t4150-am.sh | 26 +++++++++++++++++++++++++- 2 files changed, 32 insertions(+), 1 deletion(-)