diff mbox series

add-patch: handle splitting hunks with diff.suppressBlankEmpty

Message ID pull.1763.git.1721312619822.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series add-patch: handle splitting hunks with diff.suppressBlankEmpty | expand

Commit Message

Phillip Wood July 18, 2024, 2:23 p.m. UTC
From: Phillip Wood <phillip.wood@dunelm.org.uk>

When "add -p" parses diffs, it looks for context lines starting with a
single space. But when diff.suppressBlankEmpty is in effect, an empty
context line will omit the space, giving us a true empty line. This
confuses the parser, which is unable to split based on such a line.

It's tempting to say that we should just make sure that we generate a

base-commit: 790a17fb19d6eadd16c52e5d284a5c6921744766

Comments

Junio C Hamano July 18, 2024, 4:29 p.m. UTC | #1
"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> When "add -p" parses diffs, it looks for context lines starting with a
> single space. But when diff.suppressBlankEmpty is in effect, an empty
> context line will omit the space, giving us a true empty line. This
> confuses the parser, which is unable to split based on such a line.
>
> It's tempting to say that we should just make sure that we generate a
> diff without that option.  However, although we do not parse hunks that
> the user has manually edited with parse_diff() we do allow the user
> to split such hunks. As POSIX calls the decision of whether to print the
> space here "implementation-defined" we need to handle edited hunks where
> empty context lines omit the space.
>
> So let's handle both cases: a context line either starts with a space or
> consists of a totally empty line by normalizing the first character to a
> space when we parse them. Normalizing the first character rather than
> changing the code to check for a space or newline will hopefully future
> proof against introducing similar bugs if the code is changed.
>
> Reported-by: Ilya Tumaykin <itumaykin@gmail.com>
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---

Well written.  Thanks for a pleasant read.

> @@ -953,7 +960,7 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
>  	context_line_count = 0;
>  
>  	while (splittable_into > 1) {
> -		ch = s->plain.buf[current];
> +		ch = normalize_marker(s->plain.buf + current);

I wondered if &s->plain.buf[current] is easier to grok, but what's
written already is good enough ;-)

There is another explicit mention of ' ' in merge_hunks().  Unless
we are normalizing the buffer here (which I do not think we do),
wouldn't we have to make sure that the code over there also knows
that an empty line in a patch is an unmodified empty line?

                if (plain[overlap_end] != ' ')
                        return error(_("expected context line "
                                       "#%d in\n%.*s"),
Phillip Wood July 19, 2024, 3:17 p.m. UTC | #2
Hi Junio

On 18/07/2024 17:29, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> When "add -p" parses diffs, it looks for context lines starting with a
>> single space. But when diff.suppressBlankEmpty is in effect, an empty
>> context line will omit the space, giving us a true empty line. This
>> confuses the parser, which is unable to split based on such a line.
>>
>> It's tempting to say that we should just make sure that we generate a
>> diff without that option.  However, although we do not parse hunks that
>> the user has manually edited with parse_diff() we do allow the user
>> to split such hunks. As POSIX calls the decision of whether to print the
>> space here "implementation-defined" we need to handle edited hunks where
>> empty context lines omit the space.
>>
>> So let's handle both cases: a context line either starts with a space or
>> consists of a totally empty line by normalizing the first character to a
>> space when we parse them. Normalizing the first character rather than
>> changing the code to check for a space or newline will hopefully future
>> proof against introducing similar bugs if the code is changed.
>>
>> Reported-by: Ilya Tumaykin <itumaykin@gmail.com>
>> Helped-by: Jeff King <peff@peff.net>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
> 
> Well written.  Thanks for a pleasant read.

Thanks to Peff for letting me steal his commit message

>> @@ -953,7 +960,7 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
>>   	context_line_count = 0;
>>   
>>   	while (splittable_into > 1) {
>> -		ch = s->plain.buf[current];
>> +		ch = normalize_marker(s->plain.buf + current);
> 
> I wondered if &s->plain.buf[current] is easier to grok, but what's
> written already is good enough ;-)

Yes I think that would be better

> There is another explicit mention of ' ' in merge_hunks().  Unless
> we are normalizing the buffer here (which I do not think we do),
> wouldn't we have to make sure that the code over there also knows
> that an empty line in a patch is an unmodified empty line?
> 
>                  if (plain[overlap_end] != ' ')
>                          return error(_("expected context line "
>                                         "#%d in\n%.*s"),

Oh dear, I'm not sure how I missed that. I'll fix that and update the 
test to make sure it exercises that code path as well.

Thanks for your comments

Phillip
diff mbox series

Patch

diff without that option.  However, although we do not parse hunks that
the user has manually edited with parse_diff() we do allow the user
to split such hunks. As POSIX calls the decision of whether to print the
space here "implementation-defined" we need to handle edited hunks where
empty context lines omit the space.

So let's handle both cases: a context line either starts with a space or
consists of a totally empty line by normalizing the first character to a
space when we parse them. Normalizing the first character rather than
changing the code to check for a space or newline will hopefully future
proof against introducing similar bugs if the code is changed.

Reported-by: Ilya Tumaykin <itumaykin@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
    add-patch: handle splitting hunks with diff.suppressBlankEmpty
    
    This is an alternative to jk/add-patch-with-suppress-blank-empty which
    was recently discarded from next. I hope that normalizing the context
    marker will simplify any future changes to the code.
    
    While I was writing this I realized that we should be recalculating
    hunk->splittable_into when we re-count the hunk header of it is edited
    but I've left to for a future series.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1763%2Fphillipwood%2Fadd-p-suppress-blank-empty-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1763/phillipwood/add-p-suppress-blank-empty-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1763

 add-patch.c                | 17 ++++++++++++-----
 t/t3701-add-interactive.sh | 11 +++++++++++
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/add-patch.c b/add-patch.c
index d8ea05ff108..13b2607f544 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -400,6 +400,12 @@  static void complete_file(char marker, struct hunk *hunk)
 		hunk->splittable_into++;
 }
 
+/* Empty context lines may omit the leading ' ' */
+static int normalize_marker(char *p)
+{
+	return p[0] == '\n' || (p[0] == '\r' && p[1] == '\n') ? ' ' : p[0];
+}
+
 static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 {
 	struct strvec args = STRVEC_INIT;
@@ -485,6 +491,7 @@  static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 	while (p != pend) {
 		char *eol = memchr(p, '\n', pend - p);
 		const char *deleted = NULL, *mode_change = NULL;
+		char ch = normalize_marker(p);
 
 		if (!eol)
 			eol = pend;
@@ -532,7 +539,7 @@  static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 			 * Start counting into how many hunks this one can be
 			 * split
 			 */
-			marker = *p;
+			marker = ch;
 		} else if (hunk == &file_diff->head &&
 			   starts_with(p, "new file")) {
 			file_diff->added = 1;
@@ -586,10 +593,10 @@  static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 			    (int)(eol - (plain->buf + file_diff->head.start)),
 			    plain->buf + file_diff->head.start);
 
-		if ((marker == '-' || marker == '+') && *p == ' ')
+		if ((marker == '-' || marker == '+') && ch == ' ')
 			hunk->splittable_into++;
-		if (marker && *p != '\\')
-			marker = *p;
+		if (marker && ch != '\\')
+			marker = ch;
 
 		p = eol == pend ? pend : eol + 1;
 		hunk->end = p - plain->buf;
@@ -953,7 +960,7 @@  static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
 	context_line_count = 0;
 
 	while (splittable_into > 1) {
-		ch = s->plain.buf[current];
+		ch = normalize_marker(s->plain.buf + current);
 
 		if (!ch)
 			BUG("buffer overrun while splitting hunks");
diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
index 5d78868ac16..351dd2b4332 100755
--- a/t/t3701-add-interactive.sh
+++ b/t/t3701-add-interactive.sh
@@ -1164,4 +1164,15 @@  test_expect_success 'reset -p with unmerged files' '
 	test_must_be_empty staged
 '
 
+test_expect_success 'hunk splitting works with diff.suppressBlankEmpty' '
+	test_config diff.suppressBlankEmpty true &&
+	test_write_lines a b c "" d e f >file &&
+	git add file &&
+	test_write_lines p q r "" s t u >file &&
+	test_write_lines s n y q | git add -p &&
+	git cat-file blob :file >actual &&
+	test_write_lines a b c "" s t u >expect &&
+	test_cmp expect actual
+'
+
 test_done