Message ID | 20191031092618.29073-2-congdanhqx@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Linux with musl libc improvement | expand |
On Thu, Oct 31, 2019 at 04:26:16PM +0700, Doan Tran Cong Danh wrote: > man 1p printf: > In addition to the escape sequences shown in the Base Definitions > volume of POSIX.1‐2008, Chapter 5, File Format Notation ('\\', > '\a', '\b', '\f', '\n', '\r', '\t', '\v'), "\ddd", where ddd is a > one, two, or three-digit octal number, shall be written as a byte > with the numeric value specified by the octal number. > > printf '\xfe\xff' in an extension of some libc. I don't think it's libc here at all, but rather the shell. And as you note, dash does not handle this: > With dash: > $ printf '\xfe\xff' | xxd > 00000000: 5c78 6665 5c78 6666 \xfe\xff Which makes me wonder how these tests worked at all for people on say, Debian systems. I think the answer is that they don't trigger this prereq: > write_utf16 () { > if test_have_prereq NO_UTF16_BOM > then > - printf '\xfe\xff' > + printf '\376\377' > fi && > iconv -f UTF-8 -t UTF-16 > } which comes from: test_lazy_prereq NO_UTF16_BOM ' test $(printf abc | iconv -f UTF-8 -t UTF-16 | wc -c) = 6 ' Presumably on your system iconv is using musl's internal iconv, which behaves differently than the glibc one. So I think your patch is the right thing to do (hex escapes in shell printf are definitely not portable), but we might want to note something like: This wasn't caught by most people running the tests, even though common shells like dash don't handle hex escapes, because their systems don't trigger the NO_UTF16_BOM prereq. But systems with musl libc do; when combined with dash, the test fails. -Peff
On 2019-10-31 at 09:26:16, Doan Tran Cong Danh wrote: > diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh > index 7aa0945d8d..bfc4fb9af5 100755 > --- a/t/t0028-working-tree-encoding.sh > +++ b/t/t0028-working-tree-encoding.sh > @@ -17,7 +17,7 @@ test_lazy_prereq NO_UTF32_BOM ' > write_utf16 () { > if test_have_prereq NO_UTF16_BOM > then > - printf '\xfe\xff' > + printf '\376\377' > fi && > iconv -f UTF-8 -t UTF-16 > } > @@ -25,7 +25,7 @@ write_utf16 () { > write_utf32 () { > if test_have_prereq NO_UTF32_BOM > then > - printf '\x00\x00\xfe\xff' > + printf '\0\0\376\377' > fi && > iconv -f UTF-8 -t UTF-32 > } Yeah, this patch looks obviously correct. For some reason, I knew that printf(1) didn't accept hex sequences and yet I still wrote them. As Peff pointed out, this is because we only trigger this case on musl; musl is the only known implementation of iconv that produces big-endian data without a BOM for UTF-16 and UTF-32.
On 2019-10-31 13:41:18 -0400, Jeff King wrote: > So I think your patch is the right thing to do (hex escapes in shell > printf are definitely not portable), but we might want to note something > like: > > This wasn't caught by most people running the tests, even though > common shells like dash don't handle hex escapes, because their > systems don't trigger the NO_UTF16_BOM prereq. But systems with musl > libc do; when combined with dash, the test fails. Make sense. I will take this note in the reroll.
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh index 7aa0945d8d..bfc4fb9af5 100755 --- a/t/t0028-working-tree-encoding.sh +++ b/t/t0028-working-tree-encoding.sh @@ -17,7 +17,7 @@ test_lazy_prereq NO_UTF32_BOM ' write_utf16 () { if test_have_prereq NO_UTF16_BOM then - printf '\xfe\xff' + printf '\376\377' fi && iconv -f UTF-8 -t UTF-16 } @@ -25,7 +25,7 @@ write_utf16 () { write_utf32 () { if test_have_prereq NO_UTF32_BOM then - printf '\x00\x00\xfe\xff' + printf '\0\0\376\377' fi && iconv -f UTF-8 -t UTF-32 }
man 1p printf: In addition to the escape sequences shown in the Base Definitions volume of POSIX.1‐2008, Chapter 5, File Format Notation ('\\', '\a', '\b', '\f', '\n', '\r', '\t', '\v'), "\ddd", where ddd is a one, two, or three-digit octal number, shall be written as a byte with the numeric value specified by the octal number. printf '\xfe\xff' in an extension of some libc. With dash: $ printf '\xfe\xff' | xxd 00000000: 5c78 6665 5c78 6666 \xfe\xff Correct its usage. Signed-off-by: Doan Tran Cong Danh <congdanhqx@gmail.com> --- Notes: Despite that dash's printf doesn't accept \x escape sequence. My glibc box (with sh linked to dash) can run the test just fine. But my musl box couldn't run the test, (because the header). t/t0028-working-tree-encoding.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)