From patchwork Thu Oct 31 09:26:16 2019
From: Doan Tran Cong Danh
To: git@vger.kernel.org
Cc: Doan Tran Cong Danh
Subject: [PATCH 1/3] t0028: eliminate non-standard usage of printf
Date: Thu, 31 Oct 2019 16:26:16 +0700
Message-Id: <20191031092618.29073-2-congdanhqx@gmail.com>
In-Reply-To: <20191031092618.29073-1-congdanhqx@gmail.com>
References: <20191031092618.29073-1-congdanhqx@gmail.com>

man 1p printf:

    In addition to the escape sequences shown in the Base Definitions
    volume of POSIX.1-2008, Chapter 5, File Format Notation ('\\',
    '\a', '\b', '\f', '\n', '\r', '\t', '\v'), "\ddd", where ddd is a
    one, two, or three-digit octal number, shall be written as a byte
    with the numeric value specified by the octal number.

The \x escape used in printf '\xfe\xff' is an extension that only
some printf implementations support.

With dash:

    $ printf '\xfe\xff' | xxd
    00000000: 5c78 6665 5c78 6666                      \xfe\xff

Use the octal escapes that POSIX specifies instead.

Signed-off-by: Doan Tran Cong Danh
---

Notes:
    Although dash's printf doesn't accept the \x escape sequence,
    my glibc box (with sh linked to dash) can run the test just fine.

    But my musl box couldn't run the test (because of the header).

 t/t0028-working-tree-encoding.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 7aa0945d8d..bfc4fb9af5 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -17,7 +17,7 @@ test_lazy_prereq NO_UTF32_BOM '
 write_utf16 () {
 	if test_have_prereq NO_UTF16_BOM
 	then
-		printf '\xfe\xff'
+		printf '\376\377'
 	fi &&
 	iconv -f UTF-8 -t UTF-16
 }
@@ -25,7 +25,7 @@ write_utf16 () {
 write_utf32 () {
 	if test_have_prereq NO_UTF32_BOM
 	then
-		printf '\x00\x00\xfe\xff'
+		printf '\0\0\376\377'
 	fi &&
 	iconv -f UTF-8 -t UTF-32
 }

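The change to t0028 relies on the octal escapes that POSIX specifies for printf. As a rough illustration (not part of the patch), the following sketch checks in a POSIX shell that the octal forms emit the intended BOM bytes. The helper name `bom_hex` is hypothetical, and the sketch assumes `od` and `tr` are available.

```shell
# bom_hex is a hypothetical helper for illustration; it is not part of t0028.
# It prints the bytes read from stdin as hex digits with no separators.
bom_hex () {
	od -An -tx1 | tr -d ' \t\n'
}

# UTF-16 BOM (big-endian), written with POSIX octal escapes.
printf '\376\377' | bom_hex; echo        # prints feff

# UTF-32 BOM (big-endian): two NUL bytes followed by FE FF.
printf '\0\0\376\377' | bom_hex; echo    # prints 0000feff
```

Running the same check with '\xfe\xff' under dash would instead show the literal characters of the format string, as in the xxd output quoted in the commit message.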
From patchwork Thu Oct 31 09:26:17 2019
From: Doan Tran Cong Danh
To: git@vger.kernel.org
Cc: Doan Tran Cong Danh
Subject: [PATCH 2/3] configure.ac: define ICONV_OMITS_BOM if necessary
Date: Thu, 31 Oct 2019 16:26:17 +0700
Message-Id: <20191031092618.29073-3-congdanhqx@gmail.com>
In-Reply-To: <20191031092618.29073-1-congdanhqx@gmail.com>
References: <20191031092618.29073-1-congdanhqx@gmail.com>

Since commit 79444c9294 ("utf8: handle systems that don't write BOM for
UTF-16", 2019-02-12), we have supported systems whose iconv omits the
BOM, with:

    make ICONV_OMITS_BOM=Yes

However, typing the flag every time is cumbersome and error-prone.

Add a check to the configure script to detect this automatically.

Signed-off-by: Doan Tran Cong Danh
---

Notes:
    We deliberately fail for ac_cv_iconv_omits_bom when cross-compiling,
    in order to ask the builder to provide the value for the target.

    We already rely on this technique for ac_cv_fread_reads_directories
    and ac_cv_snprintf_returns_bogus.

    Adding one more configure failure when cross-compiling is not
    going to be a burden for distros.

 configure.ac | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/configure.ac b/configure.ac
index a43b476402..790b53bbdc 100644
--- a/configure.ac
+++ b/configure.ac
@@ -690,6 +690,28 @@ fi

 fi

+#
+# Define ICONV_OMITS_BOM if you are on a system which
+# iconv omits bom for utf-{16,32}
+if test -z "$NO_ICONV"; then
+AC_CACHE_CHECK([whether iconv omits bom for utf-16 and utf-32],
+ [ac_cv_iconv_omits_bom],
+[
+if test "x$cross_compiling" = xyes; then
+	AC_MSG_FAILURE([please provide ac_cv_iconv_omits_bom])
+elif test `printf a | iconv -f utf-8 -t utf-16 | wc -c` = 2; then
+	ac_cv_iconv_omits_bom=yes
+else
+	ac_cv_iconv_omits_bom=no
+fi
+])
+if test "x$ac_cv_iconv_omits_bom" = xyes; then
+	ICONV_OMITS_BOM=Yes
+else
+	ICONV_OMITS_BOM=
+fi
+GIT_CONF_SUBST([ICONV_OMITS_BOM])
+fi
 #
 # Define NO_DEFLATE_BOUND if deflateBound is missing from zlib.

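The configure check above boils down to a byte count. The letter "a" encoded as UTF-16 is one 2-byte code unit, so 2 bytes of output means iconv wrote no BOM, and 4 bytes means a 2-byte BOM was prepended. A stand-alone sketch of that reasoning follows. The helper name `classify_utf16_len` and the "unknown" fallback are assumptions made for illustration and do not appear in configure.ac.

```shell
# classify_utf16_len is a hypothetical helper, not part of git's configure.ac.
# Argument: the number of bytes iconv produced for the single character "a".
classify_utf16_len () {
	case "$1" in
	2) echo yes ;;      # code unit only: iconv omitted the BOM
	4) echo no ;;       # 2-byte BOM + 2-byte code unit
	*) echo unknown ;;  # assumption: anything else is left undecided
	esac
}

# Probe the local iconv, if there is one. wc -c may pad its output with
# blanks on some systems, so arithmetic expansion normalizes the number.
if command -v iconv >/dev/null 2>&1
then
	len=$(printf a | iconv -f utf-8 -t utf-16 2>/dev/null | wc -c)
	echo "iconv omits BOM: $(classify_utf16_len "$(($len))")"
fi
```

Unlike this sketch, the configure check treats every count other than 2 as "no", and it refuses to guess when cross-compiling because the build machine's iconv says nothing about the target's.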
From patchwork Thu Oct 31 09:26:18 2019
From: Doan Tran Cong Danh
To: git@vger.kernel.org
Cc: Doan Tran Cong Danh
Subject: [PATCH 3/3] sequencer: reencode to utf-8 before arrange rebase's todo list
Date: Thu, 31 Oct 2019 16:26:18 +0700
Message-Id: <20191031092618.29073-4-congdanhqx@gmail.com>
In-Reply-To: <20191031092618.29073-1-congdanhqx@gmail.com>
References: <20191031092618.29073-1-congdanhqx@gmail.com>

On musl libc, the ISO-2022-JP encoder is too eager to switch back to
1-byte encoding: musl's iconv always switches back after every
double-byte character.

Compare glibc's and musl's output for this command:

    $ sed q t/t3900/ISO-2022-JP.txt | iconv -f ISO-2022-JP -t utf-8 |
        iconv -f utf-8 -t ISO-2022-JP | xxd

glibc:

    00000000: 1b24 4224 4f24 6c24 5224 5b24 551b 2842  .$B$O$l$R$[$U.(B
    00000010: 0a                                       .

musl:

    00000000: 1b24 4224 4f1b 2842 1b24 4224 6c1b 2842  .$B$O.(B.$B$l.(B
    00000010: 1b24 4224 521b 2842 1b24 4224 5b1b 2842  .$B$R.(B.$B$[.(B
    00000020: 1b24 4224 551b 2842 0a                   .$B$U.(B.

Although musl iconv's output isn't optimal, it's still correct.

Since commit 7d509878b8 ("pretty.c: format string with truncate respects
logOutputEncoding", 2014-05-21), we encode the message to utf-8 first,
then format it and convert the message to the actual output encoding
on git commit --squash.

Thus, t3900 is failing on Linux with musl libc.

Reencode to utf-8 before arranging rebase's todo list.

Signed-off-by: Doan Tran Cong Danh
---

Notes:
    The todo list shown to the user has already been reencoded by
    sequencer_make_script; without this patch it looks like this:

    $ head -3 .git/rebase-merge/git-rebase-todo | xxd
    00000000: 7069 636b 2065 6633 3961 3033 201b 2442  pick ef39a03 .$B
    00000010: 244f 1b28 421b 2442 246c 1b28 421b 2442  $O.(B.$B$l.(B.$B
    00000020: 2452 1b28 421b 2442 245b 1b28 421b 2442  $R.(B.$B$[.(B.$B
    00000030: 2455 1b28 420a 7069 636b 2062 3832 3931  $U.(B.pick b8291
    00000040: 3336 2073 7175 6173 6821 201b 2442 244f  36 squash! .$B$O
    00000050: 1b28 421b 2442 246c 1b28 421b 2442 2452  .(B.$B$l.(B.$B$R
    00000060: 1b28 421b 2442 245b 1b28 421b 2442 2455  .(B.$B$[.(B.$B$U
    00000070: 1b28 420a 7069 636b 2062 3532 3132 6437  .(B.pick b5212d7
    00000080: 2069 6e74 6572 6d65 6469 6174 6520 636f   intermediate co
    00000090: 6d6d 6974 0a                             mmit.

 sequencer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index 9d5964fd81..69430fe23f 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -5169,7 +5169,7 @@ int todo_list_rearrange_squash(struct todo_list *todo_list)
 		*commit_todo_item_at(&commit_todo, item->commit) = item;

 		parse_commit(item->commit);
-		commit_buffer = get_commit_buffer(item->commit, NULL);
+		commit_buffer = logmsg_reencode(item->commit, NULL, "UTF-8");
 		find_commit_subject(commit_buffer, &subject);
 		format_subject(&buf, subject, " ");
 		subject = subjects[i] = strbuf_detach(&buf, &subject_len);
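
The statement in the commit message that musl's ISO-2022-JP output is not optimal but still correct can be checked against the two xxd dumps it quotes. In ISO-2022-JP, ESC ( B (1b 28 42) selects ASCII and ESC $ B (1b 24 42) selects JIS X 0208, so the first immediately followed by the second changes nothing. The sketch below removes those back-to-back pairs from the musl bytes and compares the result with the glibc bytes. The helper name `drop_redundant_switches` is hypothetical, and the textual substitution is only meant for this particular input, not as a general ISO-2022-JP normalizer.

```shell
# Byte sequences copied from the xxd dumps quoted in the commit message.
glibc_hex="1b 24 42 24 4f 24 6c 24 52 24 5b 24 55 1b 28 42 0a"
musl_hex="1b 24 42 24 4f 1b 28 42 1b 24 42 24 6c 1b 28 42 \
1b 24 42 24 52 1b 28 42 1b 24 42 24 5b 1b 28 42 \
1b 24 42 24 55 1b 28 42 0a"

# drop_redundant_switches is a hypothetical helper for this illustration.
# It deletes every "switch to ASCII" that is immediately followed by a
# "switch to JIS X 0208", since the pair has no effect on the decoded text.
drop_redundant_switches () {
	printf '%s\n' "$1" | sed 's/ 1b 28 42 1b 24 42//g'
}

if test "$(drop_redundant_switches "$musl_hex")" = "$glibc_hex"
then
	echo "same text, different escape placement"
fi
```

Because the two encodings differ only in where the escape sequences fall, byte-wise comparisons of ISO-2022-JP output differ between the two libcs even though the decoded text is identical. The patch avoids that by doing the subject handling in UTF-8.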