[2/2] builtin add -p: fix hunk splitting

Message ID	5d5639c2b0474680850b7adbb7c5ec81d124eb50.1640010777.git.gitgitgadget@gmail.com (mailing list archive)
State	Superseded
Headers	Return-Path: <git-owner@kernel.org>
	Message-Id: <5d5639c2b0474680850b7adbb7c5ec81d124eb50.1640010777.git.gitgitgadget@gmail.com>
	In-Reply-To: <pull.1100.git.1640010777.gitgitgadget@gmail.com>
	References: <pull.1100.git.1640010777.gitgitgadget@gmail.com>
	Date: Mon, 20 Dec 2021 14:32:57 +0000
	Subject: [PATCH 2/2] builtin add -p: fix hunk splitting
	MIME-Version: 1.0
	Content-Type: text/plain; charset=UTF-8
	Content-Transfer-Encoding: 8bit
	Fcc: Sent
	To: git@vger.kernel.org
	Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>, SZEDER Gábor <szeder.dev@gmail.com>, Phillip Wood <phillip.wood@dunelm.org.uk>, Phillip Wood <phillip.wood@dunelm.org.uk>
	Precedence: bulk
	From: Phillip Wood <phillip.wood@dunelm.org.uk>
Series	builtin add -p: fix hunk splitting
	[0/2] builtin add -p: fix hunk splitting
	[1/2] t3701: clean up hunk splitting tests
	[2/2] builtin add -p: fix hunk splitting

Commit Message

Phillip Wood Dec. 20, 2021, 2:32 p.m. UTC

From: Phillip Wood <phillip.wood@dunelm.org.uk>

To determine whether a hunk can be split, a counter is incremented each
time a context line follows an insertion or deletion. If at the end of
the hunk the value of this counter is greater than one, then the hunk
can be split into that number of smaller hunks. If the last hunk in a
file ends with an insertion or deletion, then there is no following
context line and the counter will not be incremented. This case is
already handled at the end of the loop, where the counter is incremented
if the last hunk ended with an insertion or deletion. Unfortunately,
there is no similar check between files (likely because the Perl version
only ever parses one diff at a time). Fix this by checking whether the
last hunk ended with an insertion or deletion when we see the diff
header of a new file, and extend the existing regression test.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 add-patch.c                |  7 ++++++
 t/t3701-add-interactive.sh | 46 ++++++++++++++++++++++++++++++++++----
 2 files changed, 49 insertions(+), 4 deletions(-)
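
The counting rule described in the commit message can be illustrated with a small standalone sketch. This is not the code from add-patch.c: the function name `count_splittable_into`, its array-of-lines interface, and the way "\ No newline" lines are skipped are all assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from add-patch.c: count how many pieces
 * one hunk body could be split into. `lines` holds the hunk's lines
 * without the "@@" header; the first character of each line is its marker.
 */
size_t count_splittable_into(const char *const *lines, size_t nr)
{
	size_t count = 0;
	char prev = ' '; /* marker of the previous line */
	size_t i;

	for (i = 0; i < nr; i++) {
		char marker = lines[i][0];

		if (marker == '\\')
			continue; /* assumed: skip "\ No newline at end of file" */
		/* A context line right after a change closes one piece. */
		if (marker == ' ' && (prev == '-' || prev == '+'))
			count++;
		prev = marker;
	}
	/*
	 * No trailing context line: the final run of changes still forms
	 * a piece. This is the check the commit message says was missing
	 * between files.
	 */
	if (prev == '-' || prev == '+')
		count++;
	return count;
}
```

For the file2 hunk used in the regression test (-A +Z B +Y C -D +X) the sketch counts three pieces, which matches the three split hunks in the expected output. Without the final check it would count only two.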

Comments

Ævar Arnfjörð Bjarmason Dec. 20, 2021, 7:06 p.m. UTC | #1

On Mon, Dec 20 2021, Phillip Wood via GitGitGadget wrote:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> To determine whether a hunk can be split a counter is incremented each
> time a context line follows an insertion or deletion. If at the end of
> the hunk the value of this counter is greater than one then the hunk
> can be split into that number of smaller hunks. If the last hunk in a
> file ends with an insertion or deletion then there is no following
> context line and the counter will not be incremented. This case is
> already handled at the end of the loop where counter is incremented if
> the last hunk ended with an insertion or deletion. Unfortunately there
> is no similar check between files (likely because the perl version
> only ever parses one diff at a time). Fix this by checking if the last
> hunk ended with an insertion or deletion when we see the diff header
> of a new file and extend the existing regression test.
>
> Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  add-patch.c                |  7 ++++++
>  t/t3701-add-interactive.sh | 46 ++++++++++++++++++++++++++++++++++----
>  2 files changed, 49 insertions(+), 4 deletions(-)
>
> diff --git a/add-patch.c b/add-patch.c
> index 8c41cdfe39b..5cea70666e9 100644
> --- a/add-patch.c
> +++ b/add-patch.c
> @@ -472,6 +472,13 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>  			eol = pend;
>  
>  		if (starts_with(p, "diff ")) {
> +			if (marker == '-' || marker == '+')
> +				/*
> +				 * Last hunk ended in non-context line (i.e. it
> +				 * appended lines to the file, so there are no
> +				 * trailing context lines).
> +				 */
> +				hunk->splittable_into++;

I wondered if factoring out these several "marker == '-' || marker ==
'+'" cases in parse_diff() into a "is_plus_minus(marker)" was worth it,
but probably not.
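
For reference, the predicate floated above could look like the following sketch. The name `is_plus_minus` comes from the comment itself. The signature is an assumption, and no such function is claimed to exist in add-patch.c.

```c
/*
 * Hypothetical predicate named after the review comment: true when the
 * diff line marker denotes an insertion or a deletion.
 */
int is_plus_minus(char marker)
{
	return marker == '-' || marker == '+';
}
```

With it, each open-coded `marker == '-' || marker == '+'` test in parse_diff() would read `is_plus_minus(marker)`.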

>  			ALLOC_GROW_BY(s->file_diff, s->file_diff_nr, 1,
>  				   file_diff_alloc);
>  			file_diff = s->file_diff + s->file_diff_nr - 1;
> diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
> index 77de0029ba5..94537a6b40a 100755
> --- a/t/t3701-add-interactive.sh
> +++ b/t/t3701-add-interactive.sh
> @@ -326,7 +326,9 @@ test_expect_success 'correct message when there is nothing to do' '
>  test_expect_success 'setup again' '
>  	git reset --hard &&
>  	test_chmod +x file &&
> -	echo content >>file
> +	echo content >>file &&
> +	test_write_lines A B C D>file2 &&

style nit: "cmd args >file2" not "cmd args>file2"

> @@ -373,8 +411,8 @@ test_expect_success 'setup expected' '
>  test_expect_success 'add first line works' '
>  	git commit -am "clear local changes" &&
>  	git apply patch &&
> -	test_write_lines s y y | git add -p file 2>error >raw-output &&
> -	sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
> +	test_write_lines s y y s y n y | git add -p 2>error >raw-output &&
> +	sed -n -e "s/^([1-9]\/[1-9]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
>  	       -e "/^[-+@ \\\\]"/p raw-output >output &&
>  	test_must_be_empty error &&
>  	git diff --cached >diff &&

style/diff nit: maybe worth doing some version of this in 1/2:

    test_write_lines ... >lines &&
    git ... <lines .. &&
    ...
    sed -n \
    	-e ... \
        -e ... \
        >output

Just to make the diff smaller, i.e. just the "test_write_lines" line
would be modified here.

The changes themselves & this series LGTM.

Junio C Hamano Dec. 20, 2021, 9:30 p.m. UTC | #2

"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> To determine whether a hunk can be split a counter is incremented each
> time a context line follows an insertion or deletion. If at the end of
> the hunk the value of this counter is greater than one then the hunk
> can be split into that number of smaller hunks. If the last hunk in a
> file ends with an insertion or deletion then there is no following
> context line and the counter will not be incremented. This case is
> already handled at the end of the loop where counter is incremented if
> the last hunk ended with an insertion or deletion. Unfortunately there
> is no similar check between files (likely because the perl version
> only ever parses one diff at a time).

In other words, the original laid out the code in such a way that
such a bug will be impossible, and the rewrite broke it because it
rolled both "next file" and "next hunk" into the same loop?

> Fix this by checking if the last
> hunk ended with an insertion or deletion when we see the diff header
> of a new file and extend the existing regression test.

You should be able to explain what end-user visible bug is in a
simple single sentence before all of the above.

"The C reimplementation of 'add -p' fails to split a hunk when the
hunk ends with addition or deletion without post context line." or
something like that.

>  		if (starts_with(p, "diff ")) {
> +			if (marker == '-' || marker == '+')
> +				/*
> +				 * Last hunk ended in non-context line (i.e. it
> +				 * appended lines to the file, so there are no
> +				 * trailing context lines).
> +				 */
> +				hunk->splittable_into++;

This looks correct but unsatisfactory.  We have the same processing
immediately after loop---what is common between them is that this is
a process to "conclude" the hunks for the file we have been reading
the patch for.

Can we at least make a helper function that identifies what it does
clearly by its name, and use it here and after the loop, to clarify
what is going on?  Then you do not need the 5-line comment there.

		if (starts_with(p, "diff ")) {
+			conclude_file(hunk, marker);

or something like that, perhaps.

Thanks.
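
A minimal sketch of what such a helper might look like, assuming the name `conclude_file` from the suggestion above. The one-field `struct hunk` is a stand-in so the example compiles on its own, and the NULL guard is an added assumption rather than something discussed in the thread.

```c
#include <stddef.h>

/* Stand-in for the real hunk record, which has more fields. */
struct hunk {
	size_t splittable_into;
};

/*
 * Hypothetical helper: finish the bookkeeping for the file whose patch
 * has just been read. If its last hunk ended in an insertion or a
 * deletion, no trailing context line bumped the counter, so do it here.
 */
void conclude_file(struct hunk *hunk, char marker)
{
	if (!hunk)
		return; /* assumed guard: no hunk has been parsed yet */
	if (marker == '-' || marker == '+')
		hunk->splittable_into++;
}
```

The same call would then appear both where a new "diff " header is seen and after the parsing loop, so the explanatory comment lives in one place.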

Phillip Wood Jan. 11, 2022, 11:13 a.m. UTC | #3

Hi Ævar

On 20/12/2021 19:06, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 20 2021, Phillip Wood via GitGitGadget wrote:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> To determine whether a hunk can be split a counter is incremented each
>> time a context line follows an insertion or deletion. If at the end of
>> the hunk the value of this counter is greater than one then the hunk
>> can be split into that number of smaller hunks. If the last hunk in a
>> file ends with an insertion or deletion then there is no following
>> context line and the counter will not be incremented. This case is
>> already handled at the end of the loop where counter is incremented if
>> the last hunk ended with an insertion or deletion. Unfortunately there
>> is no similar check between files (likely because the perl version
>> only ever parses one diff at a time). Fix this by checking if the last
>> hunk ended with an insertion or deletion when we see the diff header
>> of a new file and extend the existing regression test.
>>
>> Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>>   add-patch.c                |  7 ++++++
>>   t/t3701-add-interactive.sh | 46 ++++++++++++++++++++++++++++++++++----
>>   2 files changed, 49 insertions(+), 4 deletions(-)
>>
>> diff --git a/add-patch.c b/add-patch.c
>> index 8c41cdfe39b..5cea70666e9 100644
>> --- a/add-patch.c
>> +++ b/add-patch.c
>> @@ -472,6 +472,13 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>>   			eol = pend;
>>   
>>   		if (starts_with(p, "diff ")) {
>> +			if (marker == '-' || marker == '+')
>> +				/*
>> +				 * Last hunk ended in non-context line (i.e. it
>> +				 * appended lines to the file, so there are no
>> +				 * trailing context lines).
>> +				 */
>> +				hunk->splittable_into++;
> 
> I wondered if factoring out these several "marker == '-' || marker ==
> '+'" cases in parse_diff() into a "is_plus_minus(marker)" was worth it,
> but probably not.

Yeah in the end I just factored out this hunk into a new function but I
didn't add a function for "marker == '-' || marker == '+'"

>>   			ALLOC_GROW_BY(s->file_diff, s->file_diff_nr, 1,
>>   				   file_diff_alloc);
>>   			file_diff = s->file_diff + s->file_diff_nr - 1;
>> diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
>> index 77de0029ba5..94537a6b40a 100755
>> --- a/t/t3701-add-interactive.sh
>> +++ b/t/t3701-add-interactive.sh
>> @@ -326,7 +326,9 @@ test_expect_success 'correct message when there is nothing to do' '
>>   test_expect_success 'setup again' '
>>   	git reset --hard &&
>>   	test_chmod +x file &&
>> -	echo content >>file
>> +	echo content >>file &&
>> +	test_write_lines A B C D>file2 &&
> 
> style nit: "cmd args >file2" not "cmd args>file2"
> 
>> @@ -373,8 +411,8 @@ test_expect_success 'setup expected' '
>>   test_expect_success 'add first line works' '
>>   	git commit -am "clear local changes" &&
>>   	git apply patch &&
>> -	test_write_lines s y y | git add -p file 2>error >raw-output &&
>> -	sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
>> +	test_write_lines s y y s y n y | git add -p 2>error >raw-output &&
>> +	sed -n -e "s/^([1-9]\/[1-9]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
>>   	       -e "/^[-+@ \\\\]"/p raw-output >output &&
>>   	test_must_be_empty error &&
>>   	git diff --cached >diff &&
> 
> style/diff nit: maybe worth doing some version of this in 1/2:
> 
>      test_write_lines ... >lines &&
>      git ... <lines .. &&
>      ...
>      sed -n \
>      	-e ... \
>          -e ... \
>          >output
> 
> Just to make the diff smaller, i.e. just the "test_write_lines" line
> would be modified here.

In the end I decided to leave this as is. While refactoring slightly
simplifies this patch, it makes the previous one bigger, which means that
it would need to be reviewed again.


> The changes themselves & this series LGTM.

Thanks

Best Wishes

Phillip

Ævar Arnfjörð Bjarmason Jan. 11, 2022, 11:44 a.m. UTC | #4

On Tue, Jan 11 2022, Phillip Wood wrote:

> Hi Ævar
>
> On 20/12/2021 19:06, Ævar Arnfjörð Bjarmason wrote:
>> On Mon, Dec 20 2021, Phillip Wood via GitGitGadget wrote:
>> 
>>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>>
>>> To determine whether a hunk can be split a counter is incremented each
>>> time a context line follows an insertion or deletion. If at the end of
>>> the hunk the value of this counter is greater than one then the hunk
>>> can be split into that number of smaller hunks. If the last hunk in a
>>> file ends with an insertion or deletion then there is no following
>>> context line and the counter will not be incremented. This case is
>>> already handled at the end of the loop where counter is incremented if
>>> the last hunk ended with an insertion or deletion. Unfortunately there
>>> is no similar check between files (likely because the perl version
>>> only ever parses one diff at a time). Fix this by checking if the last
>>> hunk ended with an insertion or deletion when we see the diff header
>>> of a new file and extend the existing regression test.
>>>
>>> Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
>>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>>> ---
>>>   add-patch.c                |  7 ++++++
>>>   t/t3701-add-interactive.sh | 46 ++++++++++++++++++++++++++++++++++----
>>>   2 files changed, 49 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/add-patch.c b/add-patch.c
>>> index 8c41cdfe39b..5cea70666e9 100644
>>> --- a/add-patch.c
>>> +++ b/add-patch.c
>>> @@ -472,6 +472,13 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
>>>   			eol = pend;
>>>   
>>>   		if (starts_with(p, "diff ")) {
>>> +			if (marker == '-' || marker == '+')
>>> +				/*
>>> +				 * Last hunk ended in non-context line (i.e. it
>>> +				 * appended lines to the file, so there are no
>>> +				 * trailing context lines).
>>> +				 */
>>> +				hunk->splittable_into++;
>> I wondered if factoring out these several "marker == '-' || marker ==
>> '+'" cases in parse_diff() into a "is_plus_minus(marker)" was worth it,
>> but probably not.
>
> Yeah in the end I just factored out this hunk into a new function but
> I didn't add a function for "marker == '-' || marker == '+'"
>
>>>   			ALLOC_GROW_BY(s->file_diff, s->file_diff_nr, 1,
>>>   				   file_diff_alloc);
>>>   			file_diff = s->file_diff + s->file_diff_nr - 1;
>>> diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
>>> index 77de0029ba5..94537a6b40a 100755
>>> --- a/t/t3701-add-interactive.sh
>>> +++ b/t/t3701-add-interactive.sh
>>> @@ -326,7 +326,9 @@ test_expect_success 'correct message when there is nothing to do' '
>>>   test_expect_success 'setup again' '
>>>   	git reset --hard &&
>>>   	test_chmod +x file &&
>>> -	echo content >>file
>>> +	echo content >>file &&
>>> +	test_write_lines A B C D>file2 &&
>> style nit: "cmd args >file2" not "cmd args>file2"
>> 
>>> @@ -373,8 +411,8 @@ test_expect_success 'setup expected' '
>>>   test_expect_success 'add first line works' '
>>>   	git commit -am "clear local changes" &&
>>>   	git apply patch &&
>>> -	test_write_lines s y y | git add -p file 2>error >raw-output &&
>>> -	sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
>>> +	test_write_lines s y y s y n y | git add -p 2>error >raw-output &&
>>> +	sed -n -e "s/^([1-9]\/[1-9]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
>>>   	       -e "/^[-+@ \\\\]"/p raw-output >output &&
>>>   	test_must_be_empty error &&
>>>   	git diff --cached >diff &&
>> style/diff nit: maybe worth doing some version of this in 1/2:
>>      test_write_lines ... >lines &&
>>      git ... <lines .. &&
>>      ...
>>      sed -n \
>>      	-e ... \
>>          -e ... \
>>          >output
>> Just to make the diff smaller, i.e. just the "test_write_lines" line
>> would be modified here.
>
> In the end I decided to leave this as is. While refactoring slightly
> simplifies this patch, it makes the previous one bigger, which means that
> it would need to be reviewed again.

All sounds good to me. Just stuff I thought I'd point out in case you
thought it made sense. Going with it as-is is fine too.

>> The changes themselves & this series LGTM.
>
> Thanks
>
> Best Wishes
>
> Phillip

diff --git a/add-patch.c b/add-patch.c
index 8c41cdfe39b..5cea70666e9 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -472,6 +472,13 @@  static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 			eol = pend;
 
 		if (starts_with(p, "diff ")) {
+			if (marker == '-' || marker == '+')
+				/*
+				 * Last hunk ended in non-context line (i.e. it
+				 * appended lines to the file, so there are no
+				 * trailing context lines).
+				 */
+				hunk->splittable_into++;
 			ALLOC_GROW_BY(s->file_diff, s->file_diff_nr, 1,
 				   file_diff_alloc);
 			file_diff = s->file_diff + s->file_diff_nr - 1;
diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
index 77de0029ba5..94537a6b40a 100755
--- a/t/t3701-add-interactive.sh
+++ b/t/t3701-add-interactive.sh
@@ -326,7 +326,9 @@  test_expect_success 'correct message when there is nothing to do' '
 test_expect_success 'setup again' '
 	git reset --hard &&
 	test_chmod +x file &&
-	echo content >>file
+	echo content >>file &&
+	test_write_lines A B C D>file2 &&
+	git add file2
 '
 
 # Write the patch file with a new line at the top and bottom
@@ -341,13 +343,27 @@  test_expect_success 'setup patch' '
 	 content
 	+lastline
 	\ No newline at end of file
+	diff --git a/file2 b/file2
+	index 8422d40..35b930a 100644
+	--- a/file2
+	+++ b/file2
+	@@ -1,4 +1,5 @@
+	-A
+	+Z
+	 B
+	+Y
+	 C
+	-D
+	+X
 	EOF
 '
 
 # Expected output, diff is similar to the patch but w/ diff at the top
 test_expect_success 'setup expected' '
 	echo diff --git a/file b/file >expected &&
-	sed "/^index/s/ 100644/ 100755/" patch >>expected &&
+	sed -e "/^index 180b47c/s/ 100644/ 100755/" \
+	    -e /1,5/s//1,4/ \
+	    -e /Y/d patch >>expected &&
 	cat >expected-output <<-\EOF
 	--- a/file
 	+++ b/file
@@ -366,6 +382,28 @@  test_expect_success 'setup expected' '
 	 content
 	+lastline
 	\ No newline at end of file
+	--- a/file2
+	+++ b/file2
+	@@ -1,4 +1,5 @@
+	-A
+	+Z
+	 B
+	+Y
+	 C
+	-D
+	+X
+	@@ -1,2 +1,2 @@
+	-A
+	+Z
+	 B
+	@@ -2,2 +2,3 @@
+	 B
+	+Y
+	 C
+	@@ -3,2 +4,2 @@
+	 C
+	-D
+	+X
 	EOF
 '
 
@@ -373,8 +411,8 @@  test_expect_success 'setup expected' '
 test_expect_success 'add first line works' '
 	git commit -am "clear local changes" &&
 	git apply patch &&
-	test_write_lines s y y | git add -p file 2>error >raw-output &&
-	sed -n -e "s/^([1-2]\/[1-2]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
+	test_write_lines s y y s y n y | git add -p 2>error >raw-output &&
+	sed -n -e "s/^([1-9]\/[1-9]) Stage this hunk[^@]*\(@@ .*\)/\1/" \
 	       -e "/^[-+@ \\\\]"/p raw-output >output &&
 	test_must_be_empty error &&
 	git diff --cached >diff &&
