[1/3] t0028: eliminate non-standard usage of printf

Message ID	20191031092618.29073-2-congdanhqx@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=xNa/=YY=vger.kernel.org=git-owner@kernel.org> From: Doan Tran Cong Danh <congdanhqx@gmail.com> To: git@vger.kernel.org Cc: Doan Tran Cong Danh <congdanhqx@gmail.com> Subject: [PATCH 1/3] t0028: eliminate non-standard usage of printf Date: Thu, 31 Oct 2019 16:26:16 +0700 Message-Id: <20191031092618.29073-2-congdanhqx@gmail.com> In-Reply-To: <20191031092618.29073-1-congdanhqx@gmail.com> References: <20191031092618.29073-1-congdanhqx@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: git-owner@vger.kernel.org Precedence: bulk
Series	Linux with musl libc improvement \| expand [0/3] Linux with musl libc improvement [1/3] t0028: eliminate non-standard usage of printf [2/3] configure.ac: define ICONV_OMITS_BOM if necessary [3/3] sequencer: reencode to utf-8 before arrange rebase's todo list

Message ID

20191031092618.29073-2-congdanhqx@gmail.com (mailing list archive)

State

New, archived

Headers

From: Doan Tran Cong Danh <congdanhqx@gmail.com>
To: git@vger.kernel.org
Cc: Doan Tran Cong Danh <congdanhqx@gmail.com>
Subject: [PATCH 1/3] t0028: eliminate non-standard usage of printf
Date: Thu, 31 Oct 2019 16:26:16 +0700
Message-Id: <20191031092618.29073-2-congdanhqx@gmail.com>
In-Reply-To: <20191031092618.29073-1-congdanhqx@gmail.com>
References: <20191031092618.29073-1-congdanhqx@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: git-owner@vger.kernel.org
Precedence: bulk

Series

Linux with musl libc improvement | expand

Commit Message

Đoàn Trần Công Danh Oct. 31, 2019, 9:26 a.m. UTC

man 1p printf:
   In addition to the escape sequences shown in the Base Definitions
   volume of POSIX.1‐2008, Chapter 5, File Format Notation ('\\',
   '\a', '\b', '\f', '\n', '\r', '\t', '\v'), "\ddd", where ddd is a
   one, two, or three-digit octal number, shall be written as a byte
   with the numeric value specified by the octal number.

printf '\xfe\xff' in an extension of some libc.

With dash:
$ printf '\xfe\xff' | xxd
00000000: 5c78 6665 5c78 6666                      \xfe\xff

Correct its usage.

Signed-off-by: Doan Tran Cong Danh <congdanhqx@gmail.com>
---

Notes:
    Despite that dash's printf doesn't accept \x escape sequence.
    
    My glibc box (with sh linked to dash) can run the test just fine.
    But my musl box couldn't run the test, (because the header).

 t/t0028-working-tree-encoding.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Jeff King Oct. 31, 2019, 5:41 p.m. UTC | #1

On Thu, Oct 31, 2019 at 04:26:16PM +0700, Doan Tran Cong Danh wrote:

> man 1p printf:
>    In addition to the escape sequences shown in the Base Definitions
>    volume of POSIX.1‐2008, Chapter 5, File Format Notation ('\\',
>    '\a', '\b', '\f', '\n', '\r', '\t', '\v'), "\ddd", where ddd is a
>    one, two, or three-digit octal number, shall be written as a byte
>    with the numeric value specified by the octal number.
> 
> printf '\xfe\xff' in an extension of some libc.

I don't think it's libc here at all, but rather the shell.

And as you note, dash does not handle this:

> With dash:
> $ printf '\xfe\xff' | xxd
> 00000000: 5c78 6665 5c78 6666                      \xfe\xff

Which makes me wonder how these tests worked at all for people on say,
Debian systems.

I think the answer is that they don't trigger this prereq:

>  write_utf16 () {
>  	if test_have_prereq NO_UTF16_BOM
>  	then
> -		printf '\xfe\xff'
> +		printf '\376\377'
>  	fi &&
>  	iconv -f UTF-8 -t UTF-16
>  }

which comes from:

  test_lazy_prereq NO_UTF16_BOM '
          test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6
  '

Presumably on your system iconv is using musl's internal iconv, which
behaves differently than the glibc one.

So I think your patch is the right thing to do (hex escapes in shell
printf are definitely not portable), but we might want to note something
like:

  This wasn't caught by most people running the tests, even though
  common shells like dash don't handle hex escapes, because their
  systems don't trigger the NO_UTF16_BOM prereq. But systems with musl
  libc do; when combined with dash, the test fails.

-Peff

brian m. carlson Oct. 31, 2019, 7:50 p.m. UTC | #2

On 2019-10-31 at 09:26:16, Doan Tran Cong Danh wrote:
> diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
> index 7aa0945d8d..bfc4fb9af5 100755
> --- a/t/t0028-working-tree-encoding.sh
> +++ b/t/t0028-working-tree-encoding.sh
> @@ -17,7 +17,7 @@ test_lazy_prereq NO_UTF32_BOM '
>  write_utf16 () {
>  	if test_have_prereq NO_UTF16_BOM
>  	then
> -		printf '\xfe\xff'
> +		printf '\376\377'
>  	fi &&
>  	iconv -f UTF-8 -t UTF-16
>  }
> @@ -25,7 +25,7 @@ write_utf16 () {
>  write_utf32 () {
>  	if test_have_prereq NO_UTF32_BOM
>  	then
> -		printf '\x00\x00\xfe\xff'
> +		printf '\0\0\376\377'
>  	fi &&
>  	iconv -f UTF-8 -t UTF-32
>  }

Yeah, this patch looks obviously correct.  For some reason, I knew that
printf(1) didn't accept hex sequences and yet I still wrote them.

As Peff pointed out, this is because we only trigger this case on musl;
musl is the only known implementation of iconv that produces big-endian
data without a BOM for UTF-16 and UTF-32.

Đoàn Trần Công Danh Nov. 1, 2019, 1:33 a.m. UTC | #3

On 2019-10-31 13:41:18 -0400, Jeff King wrote:
> So I think your patch is the right thing to do (hex escapes in shell
> printf are definitely not portable), but we might want to note something
> like:
> 
>   This wasn't caught by most people running the tests, even though
>   common shells like dash don't handle hex escapes, because their
>   systems don't trigger the NO_UTF16_BOM prereq. But systems with musl
>   libc do; when combined with dash, the test fails.

Make sense. I will take this note in the reroll.

diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 7aa0945d8d..bfc4fb9af5 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -17,7 +17,7 @@  test_lazy_prereq NO_UTF32_BOM '
 write_utf16 () {
 	if test_have_prereq NO_UTF16_BOM
 	then
-		printf '\xfe\xff'
+		printf '\376\377'
 	fi &&
 	iconv -f UTF-8 -t UTF-16
 }
@@ -25,7 +25,7 @@  write_utf16 () {
 write_utf32 () {
 	if test_have_prereq NO_UTF32_BOM
 	then
-		printf '\x00\x00\xfe\xff'
+		printf '\0\0\376\377'
 	fi &&
 	iconv -f UTF-8 -t UTF-32
 }

[1/3] t0028: eliminate non-standard usage of printf

Commit Message

Comments

Patch