diff mbox series

[2/1] t3920: support CR-eating grep

Message ID cbe88abc-c1fb-cb50-6057-47ff27f7a12d@web.de (mailing list archive)
State Accepted
Commit 86325d36e604e872e189e8dba6d914007918311f
Headers show
Series t3920: don't ignore errors of more than one command with `|| true` | expand

Commit Message

René Scharfe Dec. 2, 2022, 4:51 p.m. UTC
grep(1) converts CRLF line endings to CR on current MinGW:

   $ uname -sr
   MINGW64_NT-10.0-22621 3.3.6-341.x86_64

   $ printf 'a\r\n' | hexdump.exe -C
   00000000  61 0d 0a                                          |a..|
   00000003

   $ printf 'a\r\n' | grep . | hexdump.exe -C
   00000000  61 0a                                             |a.|
   00000002

Create the intended test file by grepping the original file with LF
line endings and adding CRs explicitly.

The missing CRs went unnoticed because test_cmp on MinGW ignores line
endings since 4d715ac05c (Windows: a test_cmp that is agnostic to random
LF <> CRLF conversions, 2013-10-26).  Fix this test anyway to avoid
depending on that special test_cmp behavior, especially since this is
the only test that needs it.

Piping the output of grep(1) through append_cr has the side-effect of
ignoring its return value.  That means we no longer need the explicit
"|| true" to support commit messages without a body.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/t3920-crlf-messages.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.38.1.windows.1

Comments

Philippe Blain Dec. 2, 2022, 11:14 p.m. UTC | #1
Hi René,

Le 2022-12-02 à 11:51, René Scharfe a écrit :
> grep(1) converts CRLF line endings to CR on current MinGW:
> 
>    $ uname -sr
>    MINGW64_NT-10.0-22621 3.3.6-341.x86_64

I don't really know the conventions in this regards, but for me 
"MinGW" refers to the port of GCC to Windows. Here it would be more 
correct to refer to "the Git for Windows flavor of MSYS2" or something
like this, in my (maybe uninformed) opinion.

> 
>    $ printf 'a\r\n' | hexdump.exe -C
>    00000000  61 0d 0a                                          |a..|
>    00000003
> 
>    $ printf 'a\r\n' | grep . | hexdump.exe -C
>    00000000  61 0a                                             |a.|
>    00000002
> 
> Create the intended test file by grepping the original file with LF
> line endings and adding CRs explicitly.
> 
> The missing CRs went unnoticed because test_cmp on MinGW ignores line
> endings since 4d715ac05c (Windows: a test_cmp that is agnostic to random
> LF <> CRLF conversions, 2013-10-26). 

Reading that commit's messages, if I understand correctly it's more MSYS2's Bash
that is the culprit, rather than its grep, no?

> Fix this test anyway to avoid
> depending on that special test_cmp behavior, especially since this is
> the only test that needs it.

Do you mean the only test in that file, or in the whole Git test suite (which would
mean after this series we could completely remove the Windows-specific 'test_cmp' ) ?

> 
> Piping the output of grep(1) through append_cr has the side-effect of
> ignoring its return value.  That means we no longer need the explicit
> "|| true" to support commit messages without a body.
> 
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  t/t3920-crlf-messages.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
> index a58522c163..67fd2345af 100755
> --- a/t/t3920-crlf-messages.sh
> +++ b/t/t3920-crlf-messages.sh
> @@ -12,7 +12,7 @@ create_crlf_ref () {
>  	cat >.crlf-orig-$branch.txt &&
>  	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>  	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
> -	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
> +	grep 'Body' .crlf-orig-$branch.txt | append_cr >.crlf-body-$branch.txt &&
>  	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
>  	test_tick &&
>  	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
> --
> 2.38.1.windows.1
> 

apart from the minor things in the commit message, the 3 patches look good to me :) 
Here is my Acked-by: for all 3 :

Acked-by: Philippe Blain <levraiphilippeblain@gmail.com>

Thanks for detailed messages as always, it definitely went over my radar at the time
that both 'grep' and  'test_cmp' acted differently on Windows.

Cheers,
Philippe.
p.s. I wonder what happened for format-patch to generate these 2/1, 3/1 and 4/1 patch prefixes :P
Eric Sunshine Dec. 2, 2022, 11:32 p.m. UTC | #2
On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
> grep(1) converts CRLF line endings to CR on current MinGW:

Did you mean s/to CR/to LF/ ?
René Scharfe Dec. 3, 2022, 7:09 a.m. UTC | #3
Am 03.12.22 um 00:14 schrieb Philippe Blain:
> Hi René,
>
> Le 2022-12-02 à 11:51, René Scharfe a écrit :
>> grep(1) converts CRLF line endings to CR on current MinGW:
>>
>>    $ uname -sr
>>    MINGW64_NT-10.0-22621 3.3.6-341.x86_64
>
> I don't really know the conventions in this regards, but for me
> "MinGW" refers to the port of GCC to Windows. Here it would be more
> correct to refer to "the Git for Windows flavor of MSYS2" or something
> like this, in my (maybe uninformed) opinion.

The thing I downloaded and installed is the "Git for Windows SDK", so I
could use that name instead.

>
>>
>>    $ printf 'a\r\n' | hexdump.exe -C
>>    00000000  61 0d 0a                                          |a..|
>>    00000003
>>
>>    $ printf 'a\r\n' | grep . | hexdump.exe -C
>>    00000000  61 0a                                             |a.|
>>    00000002
>>
>> Create the intended test file by grepping the original file with LF
>> line endings and adding CRs explicitly.
>>
>> The missing CRs went unnoticed because test_cmp on MinGW ignores line
>> endings since 4d715ac05c (Windows: a test_cmp that is agnostic to random
>> LF <> CRLF conversions, 2013-10-26).
>
> Reading that commit's messages, if I understand correctly it's more MSYS2's Bash
> that is the culprit, rather than its grep, no?

4d715ac05c blames bash for converting LF to CRLF.  Above I blame grep
for converting CRLF to LF.  The experiment to confirm the latter leaves
the possibility that bash somehow inserts a CR eater between grep and
hexdump.  If it does then it is quite selective; e.g. cat passes CRs on
unscathed:

   $ printf 'a\r\nb\n' | grep . | hexdump.exe -C
   00000000  61 0a 62 0a                                       |a.b.|
   00000004

   $ printf 'a\r\nb\n' | cat | hexdump.exe -C
   00000000  61 0d 0a 62 0a                                    |a..b.|
   00000005

>> Fix this test anyway to avoid
>> depending on that special test_cmp behavior, especially since this is
>> the only test that needs it.
>
> Do you mean the only test in that file, or in the whole Git test suite (which would
> mean after this series we could completely remove the Windows-specific 'test_cmp' ) ?

Both.

>>
>> Piping the output of grep(1) through append_cr has the side-effect of
>> ignoring its return value.  That means we no longer need the explicit
>> "|| true" to support commit messages without a body.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  t/t3920-crlf-messages.sh | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
>> index a58522c163..67fd2345af 100755
>> --- a/t/t3920-crlf-messages.sh
>> +++ b/t/t3920-crlf-messages.sh
>> @@ -12,7 +12,7 @@ create_crlf_ref () {
>>  	cat >.crlf-orig-$branch.txt &&
>>  	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
>>  	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
>> -	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
>> +	grep 'Body' .crlf-orig-$branch.txt | append_cr >.crlf-body-$branch.txt &&
>>  	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
>>  	test_tick &&
>>  	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&
>> --
>> 2.38.1.windows.1
>>
>
> apart from the minor things in the commit message, the 3 patches look good to me :)
> Here is my Acked-by: for all 3 :
>
> Acked-by: Philippe Blain <levraiphilippeblain@gmail.com>
>
> Thanks for detailed messages as always, it definitely went over my radar at the time
> that both 'grep' and  'test_cmp' acted differently on Windows.
>
> Cheers,
> Philippe.
> p.s. I wonder what happened for format-patch to generate these 2/1, 3/1 and 4/1 patch prefixes :P

Hand-edited..

René
René Scharfe Dec. 3, 2022, 7:12 a.m. UTC | #4
Am 03.12.22 um 00:32 schrieb Eric Sunshine:
> On Fri, Dec 2, 2022 at 11:51 AM René Scharfe <l.s.r@web.de> wrote:
>> grep(1) converts CRLF line endings to CR on current MinGW:
>
> Did you mean s/to CR/to LF/ ?

Yes.

René
Junio C Hamano Dec. 5, 2022, 1:08 a.m. UTC | #5
René Scharfe <l.s.r@web.de> writes:

>  	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
> -	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
> +	grep 'Body' .crlf-orig-$branch.txt | append_cr >.crlf-body-$branch.txt &&

Doesn't append_cr unconditionally adds CR at the end?  Do we need to
touch this test again when "grep" gets fixed on the platform?
René Scharfe Dec. 5, 2022, 8:28 a.m. UTC | #6
Am 05.12.22 um 02:08 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>  	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
>> -	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
>> +	grep 'Body' .crlf-orig-$branch.txt | append_cr >.crlf-body-$branch.txt &&
>
> Doesn't append_cr unconditionally adds CR at the end?

It does.

> Do we need to
> touch this test again when "grep" gets fixed on the platform?

Depends on the meaning of "fixed".  If it stops removing CRs then this
line is unaffected -- .crlf-orig-$branch.txt contains no CRs.  If it
starts adding CRs then we'd have a problem with all grep invocations,
which was addressed by 4d715ac05c (Windows: a test_cmp that is agnostic
to random LF <> CRLF conversions, 2013-10-26).

René
Junio C Hamano Dec. 5, 2022, 9:32 a.m. UTC | #7
René Scharfe <l.s.r@web.de> writes:

> Depends on the meaning of "fixed".  If it stops removing CRs then this
> line is unaffected -- .crlf-orig-$branch.txt contains no CRs.

OK, that "fix" was what I was worried about and if there is no
problem with the input we use, that is good ;-)

> If it
> starts adding CRs then we'd have a problem with all grep invocations,
> which was addressed by 4d715ac05c (Windows: a test_cmp that is agnostic
> to random LF <> CRLF conversions, 2013-10-26).

Yeah, but I thought that the unspoken motivation behind recent
changes are so that we do not have to rely on "the differences
between CRLF and LF do not matter" version of test_cmp?

Thanks.
René Scharfe Dec. 5, 2022, 10:43 a.m. UTC | #8
Am 05.12.22 um 10:32 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Depends on the meaning of "fixed".  If it stops removing CRs then this
>> line is unaffected -- .crlf-orig-$branch.txt contains no CRs.
>
> OK, that "fix" was what I was worried about and if there is no
> problem with the input we use, that is good ;-)
>
>> If it
>> starts adding CRs then we'd have a problem with all grep invocations,
>> which was addressed by 4d715ac05c (Windows: a test_cmp that is agnostic
>> to random LF <> CRLF conversions, 2013-10-26).
>
> Yeah, but I thought that the unspoken motivation behind recent
> changes are so that we do not have to rely on "the differences
> between CRLF and LF do not matter" version of test_cmp?

The patch currently enables the removal of mingw_test_cmp; its commit
message mentions it in passing ("[...] especially since this is the only
test that needs it.").  If grep (or bash) would be "fixed" to add CRs
then we'd have to deal with that damage e.g. by keeping mingw_test_cmp.

René
diff mbox series

Patch

diff --git a/t/t3920-crlf-messages.sh b/t/t3920-crlf-messages.sh
index a58522c163..67fd2345af 100755
--- a/t/t3920-crlf-messages.sh
+++ b/t/t3920-crlf-messages.sh
@@ -12,7 +12,7 @@  create_crlf_ref () {
 	cat >.crlf-orig-$branch.txt &&
 	cat .crlf-orig-$branch.txt | append_cr >.crlf-message-$branch.txt &&
 	grep 'Subject' .crlf-orig-$branch.txt | tr '\n' ' ' | sed 's/[ ]*$//' | tr -d '\n' >.crlf-subject-$branch.txt &&
-	{ grep 'Body' .crlf-message-$branch.txt >.crlf-body-$branch.txt || true; } &&
+	grep 'Body' .crlf-orig-$branch.txt | append_cr >.crlf-body-$branch.txt &&
 	LIB_CRLF_BRANCHES="${LIB_CRLF_BRANCHES} ${branch}" &&
 	test_tick &&
 	hash=$(git commit-tree HEAD^{tree} -p HEAD -F .crlf-message-${branch}.txt) &&