[6/6] t4124: fix test for non-compliance diff

Message ID	285c6830c5182cb602d4fe559525083f69a158e9.1584625896.git.congdanhqx@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=WCE0=5E=vger.kernel.org=git-owner@kernel.org> From: =?utf-8?b?xJBvw6BuIFRy4bqnbiBDw7RuZyBEYW5o?= <congdanhqx@gmail.com> To: git@vger.kernel.org Cc: =?utf-8?b?xJBvw6BuIFRy4bqnbiBDw7RuZyBEYW5o?= <congdanhqx@gmail.com> Subject: [PATCH 6/6] t4124: fix test for non-compliance diff Date: Thu, 19 Mar 2020 21:00:07 +0700 Message-Id: <285c6830c5182cb602d4fe559525083f69a158e9.1584625896.git.congdanhqx@gmail.com> In-Reply-To: <cover.1584625896.git.congdanhqx@gmail.com> References: <cover.1584625896.git.congdanhqx@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: git-owner@vger.kernel.org Precedence: bulk
Series	fix test failure with busybox \| expand [0/6] fix test failure with busybox [1/6] t4061: use POSIX compliance regex(7) [2/6] test-lib-functions: test_cmp: eval $GIT_TEST_CMP [3/6] t5003: skip conversion test if unzip -a is unavailable [4/6] t5616: use rev-parse instead to get HEAD's object_id [5/6] t7063: use POSIX find(1) syntax [6/6] t4124: fix test for non-compliance diff

Đoàn Trần Công Danh March 19, 2020, 2 p.m. UTC

POSIX's diff(1) requires output in normal diff format.
However, busybox's diff's output is written in unified format.

POSIX requires no option for normal-diff format.

A hint in test-lib-functions::test_cmp said `diff -u` isn't available
everywhere.

Workaround this problem by assuming `diff(1)` output is unified
if we couldn't make anything from normal-diff format.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 t/t4124-apply-ws-rule.sh | 6 ++++++
 1 file changed, 6 insertions(+)

Jeff King March 19, 2020, 4:33 p.m. UTC | #1

On Thu, Mar 19, 2020 at 09:00:07PM +0700, Đoàn Trần Công Danh wrote:

> POSIX's diff(1) requires output in normal diff format.
> However, busybox's diff's output is written in unified format.

That's a pretty big difference. I'm surprised this only produces one
problem in the test scripts. ;)

> POSIX requires no option for normal-diff format.
> 
> A hint in test-lib-functions::test_cmp said `diff -u` isn't available
> everywhere.
> 
> Workaround this problem by assuming `diff(1)` output is unified
> if we couldn't make anything from normal-diff format.

I wonder if we could use "git diff" here. We have to be careful about
circular reasoning in our tests (i.e., making sure we're not verifying
output with the same code that we're testing), but I think here we're
checking how "apply --whitespace=fix" works.

But if this is the only spot, then adjusting to handle unified or normal
diff isn't too bad.

> diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
> index 971a5a7512..2a54ce96b5 100755
> --- a/t/t4124-apply-ws-rule.sh
> +++ b/t/t4124-apply-ws-rule.sh
> @@ -52,6 +52,12 @@ test_fix () {
>  
>  	# find touched lines
>  	$DIFF file target | sed -n -e "s/^> //p" >fixed
> +	if ! test -s fixed; then
> +		$DIFF file target |
> +		grep '^+' |
> +		grep -v '^+++' |
> +		sed -e "s/+//" >fixed
> +	fi

I think those greps could be lumped into sed like:

  sed -ne "s/^+[^+]//p"

(at the cost of missing blank lines, but I think that's OK for our
purposes here; it could be fixed with an ERE).

Could we then make a single invocation that covers both diff formats? We
can further observe that the only thing we do with the "fixed" file is
count the lines, so we can leave the markers. Which means we could ditch
sed entirely and use grep. Something like:

diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
index 971a5a7512..15cb0c81b7 100755
--- a/t/t4124-apply-ws-rule.sh
+++ b/t/t4124-apply-ws-rule.sh
@@ -50,8 +50,9 @@ test_fix () {
 	# fix should not barf
 	apply_patch --whitespace=fix || return 1

-	# find touched lines
-	$DIFF file target | sed -n -e "s/^> //p" >fixed
+	# find touched lines; handle either normal or unified
+	# diff, as system diff may generate either
+	$DIFF file target | grep '^[>+][^+]' >fixed

 	# the changed lines are all expected to change
 	fixed_cnt=$(wc -l <fixed)

seems to work for with both busybox diff and GNU diff.

-Peff

Junio C Hamano March 19, 2020, 10:58 p.m. UTC | #2

Jeff King <peff@peff.net> writes:

> On Thu, Mar 19, 2020 at 09:00:07PM +0700, Đoàn Trần Công Danh wrote:
>
>> POSIX's diff(1) requires output in normal diff format.
>> However, busybox's diff's output is written in unified format.
>
> That's a pretty big difference. I'm surprised this only produces one
> problem in the test scripts. ;)
>
>> POSIX requires no option for normal-diff format.
>> 
>> A hint in test-lib-functions::test_cmp said `diff -u` isn't available
>> everywhere.
>> 
>> Workaround this problem by assuming `diff(1)` output is unified
>> if we couldn't make anything from normal-diff format.

I do not mind working it around, but I am a bit disturbed by an
uneven attitude towards POSIX noncompliance this series has.  If
we were willing to break other people's "sed" that does not do BRE
correctly, instead of using '[+]' trick to accomodate them while
making sure that an implementation that does not use nonstandard
extension and does only BRE, we should just similarly be writing
such an implementation of noncompliant diff off as broken, yet we
bend backwards over to make sure we can work with them here.  

IOW, I do not have trouble changing the test so that it works with
noncompliant "diff".  But then in the same series, I would prefer to
see the existing test keeps working with a possibly noncompliant
"sed" implementation that has been working well with the tests.

> But if this is the only spot, then adjusting to handle unified or normal
> diff isn't too bad.

Yup.

> Could we then make a single invocation that covers both diff formats? We
> can further observe that the only thing we do with the "fixed" file is
> count the lines, so we can leave the markers. Which means we could ditch
> sed entirely and use grep. Something like:
>
> diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
> index 971a5a7512..15cb0c81b7 100755
> --- a/t/t4124-apply-ws-rule.sh
> +++ b/t/t4124-apply-ws-rule.sh
> @@ -50,8 +50,9 @@ test_fix () {
>  	# fix should not barf
>  	apply_patch --whitespace=fix || return 1
>  
> -	# find touched lines
> -	$DIFF file target | sed -n -e "s/^> //p" >fixed
> +	# find touched lines; handle either normal or unified
> +	# diff, as system diff may generate either
> +	$DIFF file target | grep '^[>+][^+]' >fixed
>  
>  	# the changed lines are all expected to change
>  	fixed_cnt=$(wc -l <fixed)
>
> seems to work for with both busybox diff and GNU diff.

Sounds OK to me.

Đoàn Trần Công Danh March 20, 2020, 1:52 a.m. UTC | #3

On 2020-03-19 12:33:34-0400, Jeff King <peff@peff.net> wrote:
> On Thu, Mar 19, 2020 at 09:00:07PM +0700, Đoàn Trần Công Danh wrote:
> 
> > POSIX's diff(1) requires output in normal diff format.
> > However, busybox's diff's output is written in unified format.
> 
> That's a pretty big difference. I'm surprised this only produces one
> problem in the test scripts. ;)
> 
> > POSIX requires no option for normal-diff format.
> > 
> > A hint in test-lib-functions::test_cmp said `diff -u` isn't available
> > everywhere.
> > 
> > Workaround this problem by assuming `diff(1)` output is unified
> > if we couldn't make anything from normal-diff format.
> 
> I wonder if we could use "git diff" here. We have to be careful about
> circular reasoning in our tests (i.e., making sure we're not verifying
> output with the same code that we're testing), but I think here we're
> checking how "apply --whitespace=fix" works.
> 
> But if this is the only spot, then adjusting to handle unified or normal
> diff isn't too bad.
> 
> > diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
> > index 971a5a7512..2a54ce96b5 100755
> > --- a/t/t4124-apply-ws-rule.sh
> > +++ b/t/t4124-apply-ws-rule.sh
> > @@ -52,6 +52,12 @@ test_fix () {
> >  
> >  	# find touched lines
> >  	$DIFF file target | sed -n -e "s/^> //p" >fixed
> > +	if ! test -s fixed; then
> > +		$DIFF file target |
> > +		grep '^+' |
> > +		grep -v '^+++' |
> > +		sed -e "s/+//" >fixed
> > +	fi
> 
> I think those greps could be lumped into sed like:
> 
>   sed -ne "s/^+[^+]//p"
> 
> (at the cost of missing blank lines, but I think that's OK for our
> purposes here; it could be fixed with an ERE).
> 
> Could we then make a single invocation that covers both diff formats? We
> can further observe that the only thing we do with the "fixed" file is
> count the lines, so we can leave the markers. Which means we could ditch
> sed entirely and use grep. Something like:
> 
> diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
> index 971a5a7512..15cb0c81b7 100755
> --- a/t/t4124-apply-ws-rule.sh
> +++ b/t/t4124-apply-ws-rule.sh
> @@ -50,8 +50,9 @@ test_fix () {
>  	# fix should not barf
>  	apply_patch --whitespace=fix || return 1
>  
> -	# find touched lines
> -	$DIFF file target | sed -n -e "s/^> //p" >fixed
> +	# find touched lines; handle either normal or unified
> +	# diff, as system diff may generate either
> +	$DIFF file target | grep '^[>+][^+]' >fixed
>  
>  	# the changed lines are all expected to change
>  	fixed_cnt=$(wc -l <fixed)
> 
> seems to work for with both busybox diff and GNU diff.

3 lines after this one:

	?*) expect_cnt=$(grep "[$1]" <fixed | wc -l) ;;

As of now, we could simply replace sed with grep entirely,
because ! "$1" ~"[>+]".

Considering the complicated of:

	test_expect_success "rule=$rule" '
		git config core.whitespace "$rule" &&
		test_fix "$tt$ts$ti$th"
	'

I think it's better to use sed here.

Jeff King March 20, 2020, 5:20 a.m. UTC | #4

On Thu, Mar 19, 2020 at 03:58:51PM -0700, Junio C Hamano wrote:

> >> Workaround this problem by assuming `diff(1)` output is unified
> >> if we couldn't make anything from normal-diff format.
> 
> I do not mind working it around, but I am a bit disturbed by an
> uneven attitude towards POSIX noncompliance this series has.  If
> we were willing to break other people's "sed" that does not do BRE
> correctly, instead of using '[+]' trick to accomodate them while
> making sure that an implementation that does not use nonstandard
> extension and does only BRE, we should just similarly be writing
> such an implementation of noncompliant diff off as broken, yet we
> bend backwards over to make sure we can work with them here.
> 
> IOW, I do not have trouble changing the test so that it works with
> noncompliant "diff".  But then in the same series, I would prefer to
> see the existing test keeps working with a possibly noncompliant
> "sed" implementation that has been working well with the tests.

I don't think it's inconsistent. Real-world experience trumps standards.
We _know_ that there is a real-world diff that generates only unified
diffs, and it is not too hard to work around it. So we should do so.

A sed that uses ERE and requires backslash-escaping pluses is
theoretical at this point. POSIX forbids it, and I would guess that
working around it would be more than just the "[+]" we found, because
other patterns probably need it, too. But we won't know until we find
one to test on.

So I'm not entirely against "[+]" as a defensive measure. But I have a
slight preference to avoid it until we know it's needed, not because
it's hard to do once, but because I don't want to grow too many
defensive superstitions if we don't know they're warranted.

-Peff

Jeff King March 20, 2020, 5:23 a.m. UTC | #5

On Fri, Mar 20, 2020 at 08:52:23AM +0700, Danh Doan wrote:

> > diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
> > index 971a5a7512..15cb0c81b7 100755
> > --- a/t/t4124-apply-ws-rule.sh
> > +++ b/t/t4124-apply-ws-rule.sh
> > @@ -50,8 +50,9 @@ test_fix () {
> >  	# fix should not barf
> >  	apply_patch --whitespace=fix || return 1
> >  
> > -	# find touched lines
> > -	$DIFF file target | sed -n -e "s/^> //p" >fixed
> > +	# find touched lines; handle either normal or unified
> > +	# diff, as system diff may generate either
> > +	$DIFF file target | grep '^[>+][^+]' >fixed
> >  
> >  	# the changed lines are all expected to change
> >  	fixed_cnt=$(wc -l <fixed)
> > 
> > seems to work for with both busybox diff and GNU diff.
> 
> 3 lines after this one:
> 
> 	?*) expect_cnt=$(grep "[$1]" <fixed | wc -l) ;;
> 
> As of now, we could simply replace sed with grep entirely,
> because ! "$1" ~"[>+]".
> 
> Considering the complicated of:
> 
> 	test_expect_success "rule=$rule" '
> 		git config core.whitespace "$rule" &&
> 		test_fix "$tt$ts$ti$th"
> 	'
> 
> I think it's better to use sed here.

Fair enough. I think it's OK now, but I agree that it puts a pretty
subtle assumption into the test_fix function (and that's far removed
from what actual tests will call it with, so it's easy to miss). Using
sed should be more maintainable.

-Peff

[6/6] t4124: fix test for non-compliance diff

Commit Message

Comments

Patch