[v3,2/3] ref-filter: handle CRLF at end-of-line more gracefully

Message ID 11d044a4f7feccdf20da6364a1f9bbe934e9981f.1602526169.git.gitgitgadget@gmail.com
State Superseded

Commit Message

Philippe Blain Oct. 12, 2020, 6:09 p.m. UTC
From: Philippe Blain <levraiphilippeblain@gmail.com>

The ref-filter code does not correctly handle commit or tag messages that
use CRLF as the line terminator. Such messages can be created with the
`--cleanup=verbatim` option of `git commit` and `git tag`, or by using
`git commit-tree` directly.

The function `find_subpos` in ref-filter.c looks for two consecutive
LFs to find the end of the subject line, a sequence which is absent in
messages using CRLF. This results in the whole message being parsed as
the subject line (`%(contents:subject)`), and the body of the message
(`%(contents:body)`) being empty.
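
To illustrate why the two-LF search misses the blank line in a CRLF
message, here is a minimal stand-alone sketch. `subject_length` is a
hypothetical helper, not a function in ref-filter.c, and it simply takes
the earlier of the two separators, which is not exactly what the patch
does:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration only): return the
 * length of the subject, i.e. everything before the first blank line,
 * accepting either LF or CRLF line endings. If the message has no blank
 * line, the whole message counts as the subject.
 */
static size_t subject_length(const char *msg)
{
	const char *lf = strstr(msg, "\n\n");
	const char *crlf = strstr(msg, "\r\n\r\n");
	const char *end;

	if (lf && crlf)
		end = lf < crlf ? lf : crlf; /* whichever blank line comes first */
	else if (lf)
		end = lf;
	else if (crlf)
		end = crlf;
	else
		end = msg + strlen(msg); /* no blank line at all */
	return (size_t)(end - msg);
}
```

In "Subject\r\n\r\nBody\r\n" the two LFs are separated by a CR, so
`strstr(msg, "\n\n")` alone returns NULL and only the "\r\n\r\n" search
finds the end of the subject.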

Moreover, in `copy_subject`, which wants to return the subject as a
single line, '\n' is replaced by space, but '\r' is
untouched.
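
The intended single-line behaviour can be sketched without Git's strbuf
API as follows. `flatten_subject` is a hypothetical name and plain
`malloc` stands in for Git's allocation helpers, so treat this as an
illustration under those assumptions rather than the code in the patch:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-alone helper: copy `len` bytes of `buf` into a new
 * NUL-terminated string, turning each LF into a space and dropping a CR
 * that is immediately followed by LF. A lone CR is kept as-is.
 * The caller frees the result; NULL is returned if allocation fails.
 */
static char *flatten_subject(const char *buf, size_t len)
{
	char *out = malloc(len + 1);
	size_t i, n = 0;

	if (!out)
		return NULL;
	for (i = 0; i < len; i++) {
		if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
			continue; /* skip the CR of a CRLF pair */
		out[n++] = buf[i] == '\n' ? ' ' : buf[i];
	}
	out[n] = '\0';
	return out;
}
```

With this sketch, the 13 bytes of "first\r\nsecond" come back as
"first second", with no stray CR left in the one-line subject.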

This impacts the output of `git branch`, `git tag` and `git
for-each-ref`.

This bug is a regression for `git branch --verbose`, which
bisects down to 949af0684c (branch: use ref-filter printing APIs,
2017-01-10).

Fix this bug in ref-filter by hardening the logic in `copy_subject` and
`find_subpos` to correctly parse messages containing CRLF.

Add tests for `branch`, `tag` and `for-each-ref` using
lib-crlf-messages.sh.

The 'make commits' test at the beginning of t3203-branch-output.sh does
not use `test_tick` and thus the commit hashes are not reproducible. For
simplicity, use `test_commit` to create the commits, as the content and
name of the files created in this setup test are irrelevant to the rest
of the test script.

Use `test_cleanup_crlf_refs` in t3203-branch-output.sh and t7004-tag.sh
to avoid having to modify the expected output in later tests.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
 ref-filter.c             | 36 +++++++++++++++++++++---------------
 t/t3203-branch-output.sh | 31 ++++++++++++++++++++++++++-----
 t/t6300-for-each-ref.sh  |  5 +++++
 t/t7004-tag.sh           |  7 +++++++
 4 files changed, 59 insertions(+), 20 deletions(-)

Comments

Junio C Hamano Oct. 12, 2020, 10:24 p.m. UTC | #1
"Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:

> -	for (i = 0; i < len; i++)
> -		if (r[i] == '\n')
> -			r[i] = ' ';
> +	for (int i = 0; i < len; i++) {

We do not allow this in our codebase (yet).

cf. Documentation/CodingGuidelines

 - Declaring a variable in the for loop "for (int i = 0; i < 10; i++)"
   is still not allowed in this codebase.
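
For reference, a loop in the style the guideline asks for declares the
index ahead of the `for` statement. A minimal sketch, using a
hypothetical function name and modelled on the loop being removed:

```c
#include <stddef.h>

/* Hypothetical example: replace every LF in `s` with a space, in place. */
static void newlines_to_spaces(char *s, size_t len)
{
	size_t i; /* declared before the loop, not inside "for (...)" */

	for (i = 0; i < len; i++)
		if (s[i] == '\n')
			s[i] = ' ';
}
```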

> diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
> index 71818b90f0..c06eca774f 100755
> --- a/t/t3203-branch-output.sh
> +++ b/t/t3203-branch-output.sh
> @@ -3,13 +3,11 @@
>  test_description='git branch display tests'
>  . ./test-lib.sh
>  . "$TEST_DIRECTORY"/lib-terminal.sh
> +. "$TEST_DIRECTORY"/lib-crlf-messages.sh
>  
>  test_expect_success 'make commits' '
> -	echo content >file &&
> -	git add file &&
> -	git commit -m one &&
> -	echo content >>file &&
> -	git commit -a -m two
> +	test_commit one &&
> +	test_commit two
>  '

What does this change have to do with the topic at hand?
Philippe Blain Oct. 14, 2020, 1:09 p.m. UTC | #2
Hi Junio,

> On Oct. 12, 2020, at 18:24, Junio C Hamano <gitster@pobox.com> wrote:
> 
> "Philippe Blain via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> -	for (i = 0; i < len; i++)
>> -		if (r[i] == '\n')
>> -			r[i] = ' ';
>> +	for (int i = 0; i < len; i++) {
> 
> We do not allow this in our codebase (yet).
> 
> cf. Documentation/CodingGuidelines
> 
> - Declaring a variable in the for loop "for (int i = 0; i < 10; i++)"
>   is still not allowed in this codebase.

Indeed. Will fix (and re-read the guidelines).

> 
>> diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
>> index 71818b90f0..c06eca774f 100755
>> --- a/t/t3203-branch-output.sh
>> +++ b/t/t3203-branch-output.sh
>> @@ -3,13 +3,11 @@
>> test_description='git branch display tests'
>> . ./test-lib.sh
>> . "$TEST_DIRECTORY"/lib-terminal.sh
>> +. "$TEST_DIRECTORY"/lib-crlf-messages.sh
>> 
>> test_expect_success 'make commits' '
>> -	echo content >file &&
>> -	git add file &&
>> -	git commit -m one &&
>> -	echo content >>file &&
>> -	git commit -a -m two
>> +	test_commit one &&
>> +	test_commit two
>> '
> 
> What does this change have to do with the topic at hand?

In the previous iteration, the expected output of one of the tests
I was adding contained commit hashes, so the change above was
necessary to make those hashes reproducible.
However, in this series I removed the hashes from the expected output
because they would break the "linux-clang" CI job, which runs with SHA-256.

Maybe we should add a note in t/README saying that raw commit hashes
should not appear in expected output, and that $(git rev-parse "${ref}")
should be used instead? Could this be a preparatory patch, or would it
deserve a separate topic?

Patch

diff --git a/ref-filter.c b/ref-filter.c
index c62f6b4822..92d8ca5340 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1097,14 +1097,18 @@  static const char *copy_email(const char *buf, struct used_atom *atom)
 
 static char *copy_subject(const char *buf, unsigned long len)
 {
-	char *r = xmemdupz(buf, len);
-	int i;
+	struct strbuf sb = STRBUF_INIT;
 
-	for (i = 0; i < len; i++)
-		if (r[i] == '\n')
-			r[i] = ' ';
+	for (int i = 0; i < len; i++) {
+		if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
+			continue; /* ignore CR in CRLF */
 
-	return r;
+		if (buf[i] == '\n')
+			strbuf_addch(&sb, ' ');
+		else
+			strbuf_addch(&sb, buf[i]);
+	}
+	return strbuf_detach(&sb, NULL);
 }
 
 static void grab_date(const char *buf, struct atom_value *v, const char *atomname)
@@ -1228,20 +1232,22 @@  static void find_subpos(const char *buf,
 
 	/* subject is first non-empty line */
 	*sub = buf;
-	/* subject goes to first empty line */
-	while (buf < *sig && *buf && *buf != '\n') {
-		eol = strchrnul(buf, '\n');
-		if (*eol)
-			eol++;
-		buf = eol;
-	}
+	/* subject goes to first empty line before signature begins */
+	if ((eol = strstr(*sub, "\n\n"))) {
+		eol = eol < *sig ? eol : *sig;
+	/* check if message uses CRLF */
+	} else if (! (eol = strstr(*sub, "\r\n\r\n"))) {
+		/* treat whole message as subject */
+		eol = strrchr(*sub, '\0');
+	}
+	buf = eol;
 	*sublen = buf - *sub;
 	/* drop trailing newline, if present */
-	if (*sublen && (*sub)[*sublen - 1] == '\n')
+	while (*sublen && ((*sub)[*sublen - 1] == '\n' || (*sub)[*sublen - 1] == '\r'))
 		*sublen -= 1;
 
 	/* skip any empty lines */
-	while (*buf == '\n')
+	while (*buf == '\n' || *buf == '\r')
 		buf++;
 	*body = buf;
 	*bodylen = strlen(buf);
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index 71818b90f0..c06eca774f 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -3,13 +3,11 @@ 
 test_description='git branch display tests'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
 
 test_expect_success 'make commits' '
-	echo content >file &&
-	git add file &&
-	git commit -m one &&
-	echo content >>file &&
-	git commit -a -m two
+	test_commit one &&
+	test_commit two
 '
 
 test_expect_success 'make branches' '
@@ -95,6 +93,29 @@  test_expect_success 'git branch --ignore-case --list -v pattern shows branch sum
 	awk "{print \$NF}" <tmp >actual &&
 	test_cmp expect actual
 '
+test_create_crlf_refs
+
+test_expect_success 'git branch -v works with CRLF input' '
+	cat >expect <<-EOF &&
+	  two
+	  one
+	  Subject first line
+	  Subject first line
+	  Subject first line Subject second line
+	  Subject first line Subject second line
+	  Subject first line Subject second line
+	  Subject first line Subject second line
+	EOF
+	git branch -v >tmp &&
+	# Remove first two columns, and the line for the currently checked out branch
+	current=$(git branch --show-current) &&
+	grep -v $current <tmp | awk "{\$1=\$2=\"\"}1"  >actual &&
+	test_cmp expect actual
+'
+
+test_crlf_subject_body_and_contents branch --list crlf*
+
+test_cleanup_crlf_refs
 
 test_expect_success 'git branch -v pattern does not show branch summaries' '
 	test_must_fail git branch -v branch*
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index b359023189..c30940cf7a 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -8,6 +8,7 @@  test_description='for-each-ref test'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-gpg.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
 
 # Mon Jul 3 23:18:43 2006 +0000
 datestamp=1151968723
@@ -1030,4 +1031,8 @@  test_expect_success 'for-each-ref --ignore-case works on multiple sort keys' '
 	test_cmp expect actual
 '
 
+test_create_crlf_refs
+
+test_crlf_subject_body_and_contents for-each-ref refs/heads/crlf*
+
 test_done
diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 05f411c821..cda735dab4 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -10,6 +10,7 @@  Tests for operations with tags.'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-gpg.sh
 . "$TEST_DIRECTORY"/lib-terminal.sh
+. "$TEST_DIRECTORY"/lib-crlf-messages.sh
 
 # creating and listing lightweight tags:
 
@@ -1970,6 +1971,12 @@  test_expect_success '--format should list tags as per format given' '
 	test_cmp expect actual
 '
 
+test_create_crlf_refs
+
+test_crlf_subject_body_and_contents tag --list tag-crlf*
+
+test_cleanup_crlf_refs
+
 test_expect_success "set up color tests" '
 	echo "<RED>v1.0<RESET>" >expect.color &&
 	echo "v1.0" >expect.bare &&