diff mbox series

[v2,2/2] mailinfo: support Unicode scissors

Message ID 20190401215334.18678-2-rybak.a.v@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/2] mailinfo: use starts_with() for clarity | expand

Commit Message

Andrei Rybak April 1, 2019, 9:53 p.m. UTC
Thank you all for review.  Below is the second version of original patch,
addressing comments by Gábor and Peff.

While preparing v2 I found out that U+2702 was already suggested on the list
eight months before cutting at perforation lines was implemented:
https://public-inbox.org/git/200901181656.37813.markus.heidelberg@web.de/T/#m3856d2e5c5f3e1900210b74bf2be8851b92d2271

---- >8 ----
Subject: [PATCH v2 2/2] mailinfo: support Unicode scissors
Date: Mon, 1 Apr 2019 00:00:00 +0000

'git am --scissors' allows cutting a patch from an email at a scissors
line.  Such a line should contain perforation, i.e. hyphens, and a
scissors symbol.  Only ASCII graphics scissors '8<' '>8' '%<' '>%' are
recognized by 'git am --scissors' command at the moment.

Unicode character 'BLACK SCISSORS' (U+2702) has been a part of Unicode
since version 1.0.0 [1].  Since then 'BLACK SCISSORS' also became part
of character set Emoji 1.0, published in 2015 [2].  With its adoption as
an emoji, availability of this character on keyboards has increased.

Support UTF-8 encoding of '✂' in function is_scissors_line, for 'git am
--scissors' to be able to cut at Unicode perforation lines in emails.
Note, that Unicode character '✂' is three bytes in UTF-8 encoding and is
spelled out using hexadecimal escape sequence.

1. https://www.unicode.org/versions/Unicode1.0.0/CodeCharts1.pdf
   https://www.unicode.org/Public/reconstructed/1.0.0/UnicodeData.txt
2. https://unicode.org/Public/emoji/1.0/emoji-data.txt

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
---
 mailinfo.c    |  7 +++++++
 t/t4150-am.sh | 26 +++++++++++++++++++++++++-
 2 files changed, 32 insertions(+), 1 deletion(-)

Comments

Jeff King April 2, 2019, 2:36 p.m. UTC | #1
On Mon, Apr 01, 2019 at 11:53:34PM +0200, Andrei Rybak wrote:

> diff --git a/mailinfo.c b/mailinfo.c
> index f4aaa89788..804b07cd8a 100644
> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -701,6 +701,13 @@ static int is_scissors_line(const char *line)
>  			c++;
>  			continue;
>  		}
> +		if (starts_with(c, "\xE2\x9C\x82" /* U-2702 ✂ in UTF-8 */)) {
> +			in_perforation = 1;
> +			perforation += 3;
> +			scissors += 3;
> +			c += 2;
> +			continue;
> +		}

It might be worth using skip_prefix() instead of starts_with() to
compute the size automatically. E.g.:

  if (skip_prefix(c, "\xE2\x9C\x82", &end)) {
	size_t len = end - c; /* no magic number needed! */
  }

In fact, I think you could then combine this with the previous
conditional and get:

  if (skip_prefix(c, ">8", &end) ||
      skip_prefix(c, "8<", &end) ||
      skip_prefix(c, ">%", &end) ||
      skip_prefix(c, "%<", &end) ||
      /* U-2702 in UTF-8 */
      skip_prefix(c, "\xE2\x9C\x82", &end)) {
          in_perforation = 1;
	  perforation += end - c;
	  scissors += end - c;
	  c = end - 1; /* minus one to account for loop increment */
  }

(Though I'm still on the fence regarding the whole idea, so do not take
this as an endorsement ;) ).

-Peff
Junio C Hamano April 3, 2019, 6:47 a.m. UTC | #2
Jeff King <peff@peff.net> writes:

> In fact, I think you could then combine this with the previous
> conditional and get:
>
>   if (skip_prefix(c, ">8", &end) ||
>       skip_prefix(c, "8<", &end) ||
>       skip_prefix(c, ">%", &end) ||
>       skip_prefix(c, "%<", &end) ||
>       /* U-2702 in UTF-8 */
>       skip_prefix(c, "\xE2\x9C\x82", &end)) {
>           in_perforation = 1;
> 	  perforation += end - c;
> 	  scissors += end - c;
> 	  c = end - 1; /* minus one to account for loop increment */
>   }
>
> (Though I'm still on the fence regarding the whole idea, so do not take
> this as an endorsement ;) ).

I do not think we want to add more, but use of skip_prefix does
sound sensible.  I was very tempted to suggest

	static const char *scissors[] = {
		">8", "8<", ">%", "%<",
                NULL,
	};
        const char **s;

	for (s = scissors; *s; s++)
		if (skip_prefix, c, *s, &end) {
			in_perforation = 1;
			...
			break;
		}
	}
        if (!s)
		... we are not looking at any of the scissors[] ...

but that would encourage adding more random entries to the array,
which we would want to avoid in order to help reduce the cognirive
load of end-users.

In hindsight, addition of an undocumented '%' was already a mistake.
I wonder how widely it is in use (yes, I am tempted to deprecate and
remove these two to match the code to the docs).
diff mbox series

Patch

diff --git a/mailinfo.c b/mailinfo.c
index f4aaa89788..804b07cd8a 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -701,6 +701,13 @@  static int is_scissors_line(const char *line)
 			c++;
 			continue;
 		}
+		if (starts_with(c, "\xE2\x9C\x82" /* U-2702 ✂ in UTF-8 */)) {
+			in_perforation = 1;
+			perforation += 3;
+			scissors += 3;
+			c += 2;
+			continue;
+		}
 		in_perforation = 0;
 	}
 
diff --git a/t/t4150-am.sh b/t/t4150-am.sh
index 3f7f750cc8..3ea8e8a2cf 100755
--- a/t/t4150-am.sh
+++ b/t/t4150-am.sh
@@ -77,12 +77,20 @@  test_expect_success 'setup: messages' '
 
 	printf "Subject: " >subject-prefix &&
 
-	cat - subject-prefix msg-without-scissors-line >msg-with-scissors-line <<-\EOF
+	cat - subject-prefix msg-without-scissors-line >msg-with-scissors-line <<-\EOF &&
 	This line should not be included in the commit message with --scissors enabled.
 
 	 - - >8 - - remove everything above this line - - >8 - -
 
 	EOF
+
+	cat - subject-prefix msg-without-scissors-line >msg-with-unicode-scissors <<-\EOF
+	Lines above unicode scissors line should not be included in the commit
+	message with --scissors enabled.
+
+	- - - ✂ - - - ✂ - - -
+
+	EOF
 '
 
 test_expect_success setup '
@@ -161,6 +169,12 @@  test_expect_success setup '
 	git format-patch --stdout expected-for-no-scissors^ >patch-with-scissors-line.eml &&
 	git reset --hard HEAD^ &&
 
+	echo file >file &&
+	git add file &&
+	git commit -F msg-with-unicode-scissors &&
+	git format-patch --stdout HEAD^ >patch-with-unicode-scissors.eml &&
+	git reset --hard HEAD^ &&
+
 	sed -n -e "3,\$p" msg >file &&
 	git add file &&
 	test_tick &&
@@ -421,6 +435,16 @@  test_expect_success 'am --scissors cuts the message at the scissors line' '
 	test_cmp_rev expected-for-scissors HEAD
 '
 
+test_expect_success 'am --scissors cuts the message at the unicode scissors line' '
+	rm -fr .git/rebase-apply &&
+	git reset --hard &&
+	git checkout second &&
+	git am --scissors patch-with-unicode-scissors.eml &&
+	test_path_is_missing .git/rebase-apply &&
+	git diff --exit-code expected-for-scissors &&
+	test_cmp_rev expected-for-scissors HEAD
+'
+
 test_expect_success 'am --no-scissors overrides mailinfo.scissors' '
 	rm -fr .git/rebase-apply &&
 	git reset --hard &&