diff mbox series

[v2,7/8] t4124: fix test for non-compliant diff(1)

Message ID 10f39c3d30d13e9141f081f985a0620954cc7493.1584838148.git.congdanhqx@gmail.com (mailing list archive)
State New, archived
Headers show
Series fix test failure with busybox | expand

Commit Message

Đoàn Trần Công Danh March 22, 2020, 12:55 a.m. UTC
POSIX's diff(1) requires output in normal diff format.
However, busybox's diff's output is written in unified format.

POSIX requires no option for normal-diff format.

A hint in test-lib-functions::test_cmp said `diff -u` isn't available
everywhere.

Workaround this problem by assuming `diff(1)` output is unified
if we couldn't make anything from normal-diff format.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 t/t4124-apply-ws-rule.sh | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Johannes Schindelin March 23, 2020, 1:58 p.m. UTC | #1
Hi,

On Sun, 22 Mar 2020, Đoàn Trần Công Danh wrote:

> POSIX's diff(1) requires output in normal diff format.
> However, busybox's diff's output is written in unified format.
>
> POSIX requires no option for normal-diff format.
>
> A hint in test-lib-functions::test_cmp said `diff -u` isn't available
> everywhere.
>
> Workaround this problem by assuming `diff(1)` output is unified
> if we couldn't make anything from normal-diff format.
>
> Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
> ---
>  t/t4124-apply-ws-rule.sh | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
> index 971a5a7512..075b1912be 100755
> --- a/t/t4124-apply-ws-rule.sh
> +++ b/t/t4124-apply-ws-rule.sh
> @@ -52,6 +52,12 @@ test_fix () {
>
>  	# find touched lines
>  	$DIFF file target | sed -n -e "s/^> //p" >fixed
> +	# busybox's diff(1) output unified format
> +	if ! test -s fixed; then
> +		$DIFF file target |
> +		grep -v '^+++ target' |
> +		sed -e "/^+/s/+//" >fixed
> +	fi

In my patches (which are too unpolished to contribute, I have not found
time to clean them up in several years), I do this differently:

-- snip --
commit cb2f3a28dbf40b92d3d9ca0f3177cd5afb7c4196
Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Date:   Wed Jul 5 22:21:57 2017 +0200

    t4124: avoid using "normal" diff mode

    Everybody and their dogs, cats and other pets settled on using unified
    diffs. It is a really quaint holdover from a long-gone era that GNU diff
    outputs "normal" diff by default.

    Yet, t4124 relied on that mode.

    This mode is so out of fashion in the meantime, though, that e.g.
    BusyBox' diff decided not even to bother to support it. It only supports
    unified diffs.

    So let's just switch away from "normal" diffs and use unified diffs, as
    we really are only interested in the `+` lines.

    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
index 971a5a7512ac..133557b99333 100755
--- a/t/t4124-apply-ws-rule.sh
+++ b/t/t4124-apply-ws-rule.sh
@@ -51,7 +51,7 @@ test_fix () {
        apply_patch --whitespace=fix || return 1

        # find touched lines
-       $DIFF file target | sed -n -e "s/^> //p" >fixed
+       $DIFF -u file target | sed -n -e "3,\$s/^+//p" >fixed

        # the changed lines are all expected to change
        fixed_cnt=$(wc -l <fixed)
-- snap --

Food for thought?

Ciao,
Dscho

>
>  	# the changed lines are all expected to change
>  	fixed_cnt=$(wc -l <fixed)
> --
> 2.26.0.rc2.310.g2932bb562d
>
>
>
Đoàn Trần Công Danh March 23, 2020, 3:04 p.m. UTC | #2
On 2020-03-23 14:58:13+0100, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Hi,
> 
> On Sun, 22 Mar 2020, Đoàn Trần Công Danh wrote:
> 
> > POSIX's diff(1) requires output in normal diff format.
> > However, busybox's diff's output is written in unified format.
> >
> > POSIX requires no option for normal-diff format.
> >
> > A hint in test-lib-functions::test_cmp said `diff -u` isn't available
> > everywhere.
> >
> > Workaround this problem by assuming `diff(1)` output is unified
> > if we couldn't make anything from normal-diff format.
> >
> > Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
> > ---
> >  t/t4124-apply-ws-rule.sh | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
> > index 971a5a7512..075b1912be 100755
> > --- a/t/t4124-apply-ws-rule.sh
> > +++ b/t/t4124-apply-ws-rule.sh
> > @@ -52,6 +52,12 @@ test_fix () {
> >
> >  	# find touched lines
> >  	$DIFF file target | sed -n -e "s/^> //p" >fixed
> > +	# busybox's diff(1) output unified format
> > +	if ! test -s fixed; then
> > +		$DIFF file target |
> > +		grep -v '^+++ target' |
> > +		sed -e "/^+/s/+//" >fixed
> > +	fi
> 
> In my patches (which are too unpolished to contribute, I have not found
> time to clean them up in several years), I do this differently:
> 
> -- snip --
> commit cb2f3a28dbf40b92d3d9ca0f3177cd5afb7c4196
> Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
> Date:   Wed Jul 5 22:21:57 2017 +0200
> 
>     t4124: avoid using "normal" diff mode
> 
>     Everybody and their dogs, cats and other pets settled on using unified
>     diffs. It is a really quaint holdover from a long-gone era that GNU diff
>     outputs "normal" diff by default.
> 
>     Yet, t4124 relied on that mode.
> 
>     This mode is so out of fashion in the meantime, though, that e.g.
>     BusyBox' diff decided not even to bother to support it. It only supports
>     unified diffs.
> 
>     So let's just switch away from "normal" diffs and use unified diffs, as
>     we really are only interested in the `+` lines.
> 
>     Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
> index 971a5a7512ac..133557b99333 100755
> --- a/t/t4124-apply-ws-rule.sh
> +++ b/t/t4124-apply-ws-rule.sh
> @@ -51,7 +51,7 @@ test_fix () {
>         apply_patch --whitespace=fix || return 1
> 
>         # find touched lines
> -       $DIFF file target | sed -n -e "s/^> //p" >fixed
> +       $DIFF -u file target | sed -n -e "3,\$s/^+//p" >fixed
> 
>         # the changed lines are all expected to change
>         fixed_cnt=$(wc -l <fixed)
> -- snap --
> 
> Food for thought?

A comment in test-lib-functions::test_cmp mentions that
there _is_ a diff out there that doesn't understand "-u".
I don't know which one is it.

If we choose to use "diff -u" here, we've made a certain assumption,
should we flip the switch in test_cmp, too?
Junio C Hamano March 23, 2020, 8:50 p.m. UTC | #3
Danh Doan <congdanhqx@gmail.com> writes:

> A comment in test-lib-functions::test_cmp mentions that
> there _is_ a diff out there that doesn't understand "-u".

That came from 82ebb0b6 (add test_cmp function for test scripts,
2008-03-12).

The change history at the end of the page:

  https://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html

suggests that POSIX did not require "diff -u" until Issue 7, so it
is not surprising that the lack of "diff -u" to cause test_cmp to
fail was of real concern back in March 2008, as the application of
the errata have been only an year and half old, according to:

  https://www.opengroup.org/austin/docs/austin_325.txt

I vaguely recall that some open source projects only took the copied
context diffs and not unified ones the last time I checked, but
admittedly (1) that "last time" was a long time ago and (2) our
popularity and the fact that we do not understand copied context
[*1*] may have forced people to move away from "-c" and adopt "-u"
at the same time.  

So it might be OK to write off any system that does not understand
"diff -u" as an unusable trash these days ;-)

IOW, I think I am fine with assuming "diff -u" is available, like
Dscho's patch does.

Thanks.


[Footnote]

*1* It used to be my desire to teach "git apply" and "git diff" to
    also work with copied context format, but procrastination made
    it less and less relevant X-<.
Đoàn Trần Công Danh March 24, 2020, 3:40 a.m. UTC | #4
On 2020-03-23 13:50:48-0700, Junio C Hamano <gitster@pobox.com> wrote:
> Danh Doan <congdanhqx@gmail.com> writes:
> 
> > A comment in test-lib-functions::test_cmp mentions that
> > there _is_ a diff out there that doesn't understand "-u".
> 
> That came from 82ebb0b6 (add test_cmp function for test scripts,
> 2008-03-12).
> 
> The change history at the end of the page:
> 
>   https://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html
> 
> suggests that POSIX did not require "diff -u" until Issue 7, so it
> is not surprising that the lack of "diff -u" to cause test_cmp to
> fail was of real concern back in March 2008, as the application of
> the errata have been only an year and half old, according to:
> 
>   https://www.opengroup.org/austin/docs/austin_325.txt
> 
> I vaguely recall that some open source projects only took the copied
> context diffs and not unified ones the last time I checked, but
> admittedly (1) that "last time" was a long time ago and (2) our
> popularity and the fact that we do not understand copied context
> [*1*] may have forced people to move away from "-c" and adopt "-u"
> at the same time.  
> 
> So it might be OK to write off any system that does not understand
> "diff -u" as an unusable trash these days ;-)

From the setting in "config.mak.uname", it's likely those systems
don't understand "diff -u":

- Solaris 5.6, 5.7, 5.8, 5.9: both of them are un-supported version
  by their vendor
- AIX, only AIX 5.3 doesn't understand "-u",
  the end-of-support AIX 6.1's diff(1) understand "-u" (1)
- HP-UX: From what I can collect, HP-UX still conforms to UNIX-03, and
  its diff(1) doesn't understand "-u"

Hence, if we're going to drop support for system that doesn't
understand "diff -u", we're going to:
- remove support for those variables:
  + GIT_TEST_CMP_USE_COPIED_CONTEXT
  + GIT_TEST_CMP
- drop support for:
  + Solaris 5.{6,7,8,} and AIX 5.3, which will be fine
  + HP-UX: which needs to be discussed

[1]: https://public.dhe.ibm.com/systems/power/docs/aix/61/aixcmds2_pdf.pdf page 133(143)
Junio C Hamano March 24, 2020, 6:47 p.m. UTC | #5
Danh Doan <congdanhqx@gmail.com> writes:

> Hence, if we're going to drop support for system that doesn't
> understand "diff -u", we're going to:
> - remove support for those variables:
>   + GIT_TEST_CMP_USE_COPIED_CONTEXT

Folks who prefer "-c" even their platforms all are capable of "-u"
may miss this, but I think that is a very small minority, and they
would be OK (it is after all only needed when diagnosing test
failures of our test suite, and by definition, those who are working
to improve Git would know how to read "-u" output).

>   + GIT_TEST_CMP

I am not sure why you need to drop this one?  This is more about
switching between "diff" and "cmp", and there are reasons why folks
prefer latter especially when they are not debugging the tests.

> - drop support for:
>   + Solaris 5.{6,7,8,} and AIX 5.3, which will be fine
>   + HP-UX: which needs to be discussed
>
> [1]: https://public.dhe.ibm.com/systems/power/docs/aix/61/aixcmds2_pdf.pdf page 133(143)
Johannes Schindelin March 24, 2020, 10:29 p.m. UTC | #6
Hi,

On Tue, 24 Mar 2020, Danh Doan wrote:

> On 2020-03-23 13:50:48-0700, Junio C Hamano <gitster@pobox.com> wrote:
> > Danh Doan <congdanhqx@gmail.com> writes:
> >
> > > A comment in test-lib-functions::test_cmp mentions that
> > > there _is_ a diff out there that doesn't understand "-u".
> >
> > That came from 82ebb0b6 (add test_cmp function for test scripts,
> > 2008-03-12).
> >
> > The change history at the end of the page:
> >
> >   https://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html
> >
> > suggests that POSIX did not require "diff -u" until Issue 7, so it
> > is not surprising that the lack of "diff -u" to cause test_cmp to
> > fail was of real concern back in March 2008, as the application of
> > the errata have been only an year and half old, according to:
> >
> >   https://www.opengroup.org/austin/docs/austin_325.txt
> >
> > I vaguely recall that some open source projects only took the copied
> > context diffs and not unified ones the last time I checked, but
> > admittedly (1) that "last time" was a long time ago and (2) our
> > popularity and the fact that we do not understand copied context
> > [*1*] may have forced people to move away from "-c" and adopt "-u"
> > at the same time.
> >
> > So it might be OK to write off any system that does not understand
> > "diff -u" as an unusable trash these days ;-)
>
> From the setting in "config.mak.uname", it's likely those systems
> don't understand "diff -u":
>
> - Solaris 5.6, 5.7, 5.8, 5.9: both of them are un-supported version
>   by their vendor
> - AIX, only AIX 5.3 doesn't understand "-u",
>   the end-of-support AIX 6.1's diff(1) understand "-u" (1)
> - HP-UX: From what I can collect, HP-UX still conforms to UNIX-03, and
>   its diff(1) doesn't understand "-u"
>
> Hence, if we're going to drop support for system that doesn't
> understand "diff -u", we're going to:
> - remove support for those variables:
>   + GIT_TEST_CMP_USE_COPIED_CONTEXT
>   + GIT_TEST_CMP
> - drop support for:
>   + Solaris 5.{6,7,8,} and AIX 5.3, which will be fine
>   + HP-UX: which needs to be discussed
>
> [1]: https://public.dhe.ibm.com/systems/power/docs/aix/61/aixcmds2_pdf.pdf page 133(143)

With this explanation, I totally agree that your patch is better than
mine.

Thanks,
Dscho
Junio C Hamano March 24, 2020, 11:37 p.m. UTC | #7
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> With this explanation, I totally agree that your patch is better than
> mine.

Meaning that we'd have a fallback checker that understands "diff -u"
output, because the presense of "diff -u" cannot be relied upon and
we'd need to keep the original that understands "diff -c" anyway?

If that is what you meant, I am 100% fine with that.
Đoàn Trần Công Danh March 25, 2020, 2:24 p.m. UTC | #8
On 2020-03-24 11:47:32-0700, Junio C Hamano <gitster@pobox.com> wrote:
> Danh Doan <congdanhqx@gmail.com> writes:
> 
> > Hence, if we're going to drop support for system that doesn't
> > understand "diff -u", we're going to:
> > - remove support for those variables:
> >   + GIT_TEST_CMP_USE_COPIED_CONTEXT
> 
> Folks who prefer "-c" even their platforms all are capable of "-u"
> may miss this, but I think that is a very small minority, and they
> would be OK (it is after all only needed when diagnosing test
> failures of our test suite, and by definition, those who are working
> to improve Git would know how to read "-u" output).
> 
> >   + GIT_TEST_CMP
> 
> I am not sure why you need to drop this one?  This is more about
> switching between "diff" and "cmp", and there are reasons why folks
> prefer latter especially when they are not debugging the tests.

I was thinking it could simplify the test_cmp code.
There was a problem which needs to be addressed by 2/8.

Yes, people may prefer to use cmp(1) because cmp(1) should be faster
than diff(1).

Anyway, it seems like we've decided to keep using normal-diff because
of HP-UX.
Johannes Schindelin March 25, 2020, 6:23 p.m. UTC | #9
Hi Junio,

On Tue, 24 Mar 2020, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > With this explanation, I totally agree that your patch is better than
> > mine.
>
> Meaning that we'd have a fallback checker that understands "diff -u"
> output, because the presense of "diff -u" cannot be relied upon and
> we'd need to keep the original that understands "diff -c" anyway?
>
> If that is what you meant, I am 100% fine with that.

Yes, that's what I meant. Thanks for clearing this up!

Ciao,
Dscho
diff mbox series

Patch

diff --git a/t/t4124-apply-ws-rule.sh b/t/t4124-apply-ws-rule.sh
index 971a5a7512..075b1912be 100755
--- a/t/t4124-apply-ws-rule.sh
+++ b/t/t4124-apply-ws-rule.sh
@@ -52,6 +52,12 @@  test_fix () {
 
 	# find touched lines
 	$DIFF file target | sed -n -e "s/^> //p" >fixed
+	# busybox's diff(1) output unified format
+	if ! test -s fixed; then
+		$DIFF file target |
+		grep -v '^+++ target' |
+		sed -e "/^+/s/+//" >fixed
+	fi
 
 	# the changed lines are all expected to change
 	fixed_cnt=$(wc -l <fixed)