From patchwork Tue Apr 30 18:25:19 2019
From: Elijah Newren
To: gitster@pobox.com
Cc: git@vger.kernel.org, Eric Sunshine, Elijah Newren
Subject: [PATCH v2 1/5] t9350: fix encoding test to actually test reencoding
Date: Tue, 30 Apr 2019 11:25:19 -0700
Message-Id: <20190430182523.3339-2-newren@gmail.com>
In-Reply-To: <20190430182523.3339-1-newren@gmail.com>
References: <20190430182523.3339-1-newren@gmail.com>

This test used an author with non-ascii characters in the name, but no
special commit message. It then grep'ed for those non-ascii characters,
but those are guaranteed to exist regardless of the reencoding process,
since the reencoding only affects the commit message, not the author or
committer names. As such, the test would pass even if the reencoding
process simply stripped the commit message entirely. Modify the test to
actually check that reencoding into utf-8 worked.

Signed-off-by: Elijah Newren
---
 t/t9350-fast-export.sh | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 5690fe2810..f55759651a 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -94,22 +94,21 @@ test_expect_success 'fast-export --show-original-ids | git fast-import' '
 	test $MUSS = $(git rev-parse --verify refs/tags/muss)
 '
 
-test_expect_success 'iso-8859-1' '
+test_expect_success 'iso-8859-7' '
 
-	git config i18n.commitencoding ISO8859-1 &&
-	# use author and committer name in ISO-8859-1 to match it.
-	. "$TEST_DIRECTORY"/t3901/8859-1.txt &&
+	test_when_finished "git reset --hard HEAD~1" &&
+	test_config i18n.commitencoding iso-8859-7 &&
 	test_tick &&
 	echo rosten >file &&
-	git commit -s -m den file &&
-	git fast-export wer^..wer >iso8859-1.fi &&
-	sed "s/wer/i18n/" iso8859-1.fi |
+	git commit -s -m "$(printf "Pi: \360")" file &&
+	git fast-export wer^..wer >iso-8859-7.fi &&
+	sed "s/wer/i18n/" iso-8859-7.fi |
 		(cd new &&
 		 git fast-import &&
 		 git cat-file commit i18n >actual &&
-		 grep "Áéí óú" actual)
-
+		 grep $(printf "\317\200") actual)
 '
+
 test_expect_success 'import/export-marks' '
 
 	git checkout -b marks master &&
@@ -224,7 +223,6 @@ GIT_COMMITTER_NAME='C O Mitter'; export GIT_COMMITTER_NAME
 
 test_expect_success 'setup copies' '
 
-	git config --unset i18n.commitencoding &&
 	git checkout -b copy rein &&
 	git mv file file3 &&
 	git commit -m move1 &&
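The rewritten 'iso-8859-7' test greps for the bytes \317\200 for a specific reason. Byte 0xF0 (octal 360) in ISO-8859-7 is GREEK SMALL LETTER PI (U+03C0), and U+03C0 encodes in UTF-8 as 0xCF 0x80. The C sketch below illustrates that conversion under stated assumptions:

- The function name iso8859_7_letters_to_utf8() is hypothetical, and it is not git's reencode_string().
- It maps only ASCII and the Greek letter range 0xC1-0xFE, using code point = 0x02D0 + byte.
- It rejects every other byte, including 0xD2 and 0xFF, which ISO-8859-7 leaves undefined.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not git's reencode_string()): convert ISO-8859-7
 * text to UTF-8, handling only ASCII (0x00-0x7F) and the Greek letter
 * range 0xC1-0xFE.  Returns the number of bytes written (excluding the
 * terminating NUL), or -1 if a byte is outside what this sketch maps
 * (including 0xD2 and 0xFF, undefined in ISO-8859-7) or if out is too
 * small.
 */
long iso8859_7_letters_to_utf8(const unsigned char *in, size_t len,
			       char *out, size_t outsz)
{
	size_t o = 0;

	for (size_t i = 0; i < len; i++) {
		unsigned int b = in[i];
		unsigned int cp;

		if (b < 0x80)
			cp = b;                 /* ASCII maps to itself */
		else if (b >= 0xC1 && b <= 0xFE && b != 0xD2)
			cp = 0x02D0 + b;        /* 0xC1 -> U+0391, 0xF0 -> U+03C0 */
		else
			return -1;              /* unmapped in this sketch */

		if (cp < 0x80) {
			if (o + 1 >= outsz)     /* keep room for the NUL */
				return -1;
			out[o++] = (char)cp;
		} else {
			/* every mapped letter is below U+0800: two UTF-8 bytes */
			if (o + 2 >= outsz)
				return -1;
			out[o++] = (char)(0xC0 | (cp >> 6));
			out[o++] = (char)(0x80 | (cp & 0x3F));
		}
	}
	if (o >= outsz)
		return -1;
	out[o] = '\0';
	return (long)o;
}
```

For example, the five input bytes of "Pi: \360" become the six bytes "Pi: \317\200". A message containing \377 makes the function return -1, which is the kind of failure the 'encoding preserved if reencoding fails' test in this series exercises.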

From: Elijah Newren
To: gitster@pobox.com
Cc: git@vger.kernel.org, Eric Sunshine, Elijah Newren
Subject: [PATCH v2 2/5] fast-import: support 'encoding' commit header
Date: Tue, 30 Apr 2019 11:25:20 -0700
Message-Id: <20190430182523.3339-3-newren@gmail.com>
In-Reply-To: <20190430182523.3339-1-newren@gmail.com>
References: <20190430182523.3339-1-newren@gmail.com>

Since git supports commit messages with an encoding other than utf-8,
allow fast-import to import such commits. This may be useful for folks
who do not want to reencode commit messages from an external system,
and may also be useful to achieve reversible history rewrites (e.g.
sha1sum <-> sha256sum transitions or subtree work) with git repositories
that have used specialized encodings in their commit history.

Signed-off-by: Elijah Newren
---
 Documentation/git-fast-import.txt |  7 +++++++
 fast-import.c                     | 11 +++++++++--
 t/t9300-fast-import.sh            | 20 ++++++++++++++++++++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index d65cdb3d08..7baf9e47b5 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -388,6 +388,7 @@ change to the project.
 	original-oid?
 	('author' (SP )? SP LT GT SP LF)?
 	'committer' (SP )? SP LT GT SP LF
+	('encoding' SP )?
 	data
 	('from' SP LF)?
 	('merge' SP LF)?
@@ -455,6 +456,12 @@ that was selected by the --date-format= command-line
 option. See ``Date Formats'' above for the set of supported formats,
 and their syntax.
 
+`encoding`
+^^^^^^^^^^
+The optional `encoding` command indicates the encoding of the commit
+message. Most commits are UTF-8 and the encoding is omitted, but this
+allows importing commit messages into git without first reencoding them.
+
 `from`
 ^^^^^^
 The `from` command is used to specify the commit to initialize
diff --git a/fast-import.c b/fast-import.c
index f38d04fa58..76a7bd3699 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2585,6 +2585,7 @@ static void parse_new_commit(const char *arg)
 	struct branch *b;
 	char *author = NULL;
 	char *committer = NULL;
+	const char *encoding = NULL;
 	struct hash_list *merge_list = NULL;
 	unsigned int merge_count;
 	unsigned char prev_fanout, new_fanout;
@@ -2607,6 +2608,8 @@ static void parse_new_commit(const char *arg)
 	}
 	if (!committer)
 		die("Expected committer but didn't get one");
+	if (skip_prefix(command_buf.buf, "encoding ", &encoding))
+		read_next_command();
 	parse_data(&msg, 0, NULL);
 	read_next_command();
 	parse_from(b);
@@ -2670,9 +2673,13 @@ static void parse_new_commit(const char *arg)
 	}
 	strbuf_addf(&new_data,
 		"author %s\n"
-		"committer %s\n"
-		"\n",
+		"committer %s\n",
 		author ? author : committer, committer);
+	if (encoding)
+		strbuf_addf(&new_data,
+			"encoding %s\n",
+			encoding);
+	strbuf_addch(&new_data, '\n');
 	strbuf_addbuf(&new_data, &msg);
 	free(author);
 	free(committer);
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 3668263c40..141b7fa35e 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3299,4 +3299,24 @@ test_expect_success !MINGW 'W: get-mark & empty orphan commit with erroneous thi
 	sed -e s/LFs/LLL/ W-input | tr L "\n" | test_must_fail git fast-import
 '
 
+###
+### series X (other new features)
+###
+
+test_expect_success 'X: handling encoding' '
+	test_tick &&
+	cat >input <<-INPUT_END &&
+	commit refs/heads/encoding
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	encoding iso-8859-7
+	data <>input &&
+
+	git fast-import
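After this patch, parse_new_commit() builds the commit object in this order:

1. the author and committer lines,
2. an optional encoding line,
3. a blank line,
4. the message bytes, unchanged.

The C sketch below shows that layout along with a skip_prefix()-style check for the optional `encoding` command. It makes these assumptions:

- format_commit_tail() and skip_prefix_sketch() are hypothetical names.
- snprintf() stands in for git's strbuf API, and the message is assumed to be NUL-terminated.
- The tree and parent lines that precede `author` in a real commit object are omitted.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper with the same contract as git's skip_prefix():
 * if str starts with prefix, point *out just past it and return 1;
 * otherwise leave *out untouched and return 0.
 */
int skip_prefix_sketch(const char *str, const char *prefix, const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len) != 0)
		return 0;
	*out = str + len;
	return 1;
}

/*
 * Hypothetical sketch of the header layout after this patch: author,
 * committer, an optional encoding line, a blank line, then the message
 * bytes unchanged (no reencoding happens on import).  The tree/parent
 * lines of a real commit object are omitted.  Returns the length
 * written, or -1 if buf is too small.
 */
int format_commit_tail(char *buf, size_t bufsz, const char *author,
		       const char *committer, const char *encoding,
		       const char *msg)
{
	int n = snprintf(buf, bufsz, "author %s\ncommitter %s\n%s%s%s\n%s",
			 author ? author : committer, committer,
			 encoding ? "encoding " : "",
			 encoding ? encoding : "",
			 encoding ? "\n" : "",
			 msg);

	if (n < 0 || (size_t)n >= bufsz)
		return -1;
	return n;
}
```

As in the patch, a missing author falls back to the committer. The encoding line appears only when the import stream supplied one.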

From: Elijah Newren
To: gitster@pobox.com
Cc: git@vger.kernel.org, Eric Sunshine, Elijah Newren
Subject: [PATCH v2 3/5] fast-export: avoid stripping encoding header if we cannot reencode
Date: Tue, 30 Apr 2019 11:25:21 -0700
Message-Id: <20190430182523.3339-4-newren@gmail.com>
In-Reply-To: <20190430182523.3339-1-newren@gmail.com>
References: <20190430182523.3339-1-newren@gmail.com>

When fast-export encounters a commit with an 'encoding' header, it tries
to reencode the message into utf-8 and then drops the encoding header.
However, if it fails to reencode into utf-8 because, e.g., one of the
characters in the commit message was invalid in the old encoding, then
we need to retain the original encoding; otherwise we lose the
information needed to interpret all the other (valid) characters in the
original commit message.

Signed-off-by: Elijah Newren
---
 builtin/fast-export.c  |  7 +++++--
 t/t9350-fast-export.sh | 14 ++++++++++++++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 9e283482ef..7734a9f5a5 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -642,9 +642,12 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
 	printf("commit %s\nmark :%"PRIu32"\n", refname, last_idnum);
 	if (show_original_ids)
 		printf("original-oid %s\n", oid_to_hex(&commit->object.oid));
-	printf("%.*s\n%.*s\ndata %u\n%s",
+	printf("%.*s\n%.*s\n",
 	       (int)(author_end - author), author,
-	       (int)(committer_end - committer), committer,
+	       (int)(committer_end - committer), committer);
+	if (!reencoded && encoding)
+		printf("encoding %s\n", encoding);
+	printf("data %u\n%s",
 	       (unsigned)(reencoded
 			  ? strlen(reencoded) : message
 			  ? strlen(message) : 0),
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index f55759651a..67dd7ac7f4 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -109,6 +109,20 @@ test_expect_success 'iso-8859-7' '
 		 grep $(printf "\317\200") actual)
 '
 
+test_expect_success 'encoding preserved if reencoding fails' '
+
+	test_when_finished "git reset --hard HEAD~1" &&
+	test_config i18n.commitencoding iso-8859-7 &&
+	echo rosten >file &&
+	git commit -s -m "$(printf "Pi: \360; Invalid: \377")" file &&
+	git fast-export wer^..wer >iso-8859-7.fi &&
+	sed "s/wer/i18n-invalid/" iso-8859-7.fi |
+		(cd new &&
+		 git fast-import &&
+		 git cat-file commit i18n-invalid >actual &&
+		 grep ^encoding actual)
+'
+
 test_expect_success 'import/export-marks' '
 
 	git checkout -b marks master &&
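The split printf() in this patch lets an `encoding` line sit between the committer line and `data`. The rule it implements is small: keep the header exactly when the commit has an encoding and reencoding produced nothing. The C sketch below restates that rule under these assumptions:

- format_export_tail() is a hypothetical name.
- The sketch covers only the part of the output this hunk touches.
- encoding is NULL when the commit has no encoding header, as it is after patch 4/5 of this series.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of the output rule added to handle_commit():
 * if the commit has an encoding but no reencoded text (reencoding was
 * not attempted or failed), keep the "encoding" line so the importer
 * can still interpret the original bytes; otherwise emit the UTF-8
 * text without it.  Writes the "encoding"/"data" tail of a fast-export
 * commit command; returns the length, or -1 if buf is too small.
 */
int format_export_tail(char *buf, size_t bufsz, const char *encoding,
		       const char *message, const char *reencoded)
{
	const char *body = reencoded ? reencoded : message ? message : "";
	int keep_encoding = !reencoded && encoding;
	int n = snprintf(buf, bufsz, "%s%s%sdata %u\n%s",
			 keep_encoding ? "encoding " : "",
			 keep_encoding ? encoding : "",
			 keep_encoding ? "\n" : "",
			 (unsigned)strlen(body), body);

	if (n < 0 || (size_t)n >= bufsz)
		return -1;
	return n;
}
```

The data length is always that of the text actually emitted, so the importer reads exactly the reencoded or original bytes.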

From: Elijah Newren
To: gitster@pobox.com
Cc: git@vger.kernel.org, Eric Sunshine, Elijah Newren
Subject: [PATCH v2 4/5] fast-export: differentiate between explicitly utf-8 and implicitly utf-8
Date: Tue, 30 Apr 2019 11:25:22 -0700
Message-Id: <20190430182523.3339-5-newren@gmail.com>
In-Reply-To: <20190430182523.3339-1-newren@gmail.com>
References: <20190430182523.3339-1-newren@gmail.com>

The find_encoding() function returned the encoding used by a commit
message, defaulting to git_commit_encoding (usually utf-8). Although
the current code does not differentiate between a commit which
explicitly requested utf-8 and one where we just assume utf-8 because no
encoding is set, the distinction will become important when we try to
preserve the encoding header. Since is_encoding_utf8() returns true when
passed NULL, we can just return NULL from find_encoding() instead of
returning git_commit_encoding.

Signed-off-by: Elijah Newren
---
 builtin/fast-export.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 7734a9f5a5..66331fa401 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -453,7 +453,7 @@ static const char *find_encoding(const char *begin, const char *end)
 	bol = memmem(begin, end ? end - begin : strlen(begin),
 		     needle, strlen(needle));
 	if (!bol)
-		return git_commit_encoding;
+		return NULL;
 	bol += strlen(needle);
 	eol = strchrnul(bol, '\n');
 	*eol = '\0';
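Returning NULL is enough because the two cases stay distinguishable. The C sketch below illustrates this:

- find_encoding_sketch() returns a pointer to the header value, or NULL when the header is absent.
- encoding_is_utf8_sketch() treats NULL as UTF-8.

Both names are hypothetical. The UTF-8 check is a simplified stand-in for is_encoding_utf8() that accepts only "utf-8" and "utf8", case-insensitively. The header search stops at the first blank line of the raw commit buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch in the spirit of find_encoding() after this patch:
 * search only the header part of a raw commit buffer (text before the
 * first blank line) for an "encoding" line.  Returns a pointer to the
 * value and stores its length in *len, or returns NULL when there is no
 * such header, so "no header" stays distinguishable from "UTF-8".
 */
const char *find_encoding_sketch(const char *commit, size_t *len)
{
	const char *needle = "\nencoding ";
	const char *hdr_end = strstr(commit, "\n\n");
	const char *bol = strstr(commit, needle);

	if (!bol || (hdr_end && bol >= hdr_end))
		return NULL;            /* absent: implicitly UTF-8 */
	bol += strlen(needle);
	*len = strcspn(bol, "\n");
	return bol;
}

/*
 * Simplified stand-in for is_encoding_utf8(): NULL counts as UTF-8,
 * and only "utf-8" / "utf8" (case-insensitive) are recognized.
 */
int encoding_is_utf8_sketch(const char *name, size_t len)
{
	static const char *const utf8_names[] = { "utf-8", "utf8" };

	if (!name)
		return 1;
	for (size_t i = 0; i < sizeof(utf8_names) / sizeof(utf8_names[0]); i++) {
		const char *want = utf8_names[i];
		size_t j = 0;

		if (strlen(want) != len)
			continue;
		while (j < len && tolower((unsigned char)name[j]) == want[j])
			j++;
		if (j == len)
			return 1;
	}
	return 0;
}
```

A caller that only asks "is this UTF-8?" gets the same answer for a missing header and an explicit "UTF-8". A caller that wants to preserve the header can still tell the two apart by checking for NULL.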

From: Elijah Newren
To: gitster@pobox.com
Cc: git@vger.kernel.org, Eric Sunshine, Elijah Newren
Subject: [PATCH v2 5/5] fast-export: do automatic reencoding of commit messages only if requested
Date: Tue, 30 Apr 2019 11:25:23 -0700
Message-Id: <20190430182523.3339-6-newren@gmail.com>
In-Reply-To: <20190430182523.3339-1-newren@gmail.com>
References: <20190430182523.3339-1-newren@gmail.com>

Automatic re-encoding of commit messages (and dropping of the encoding
header) hurts attempts to do reversible history rewrites (e.g. sha1sum
<-> sha256sum transitions, some subtree rewrites), and seems
inconsistent with the general principle followed elsewhere in
fast-export of requiring explicit user requests to modify the output
(e.g. --signed-tags=strip, --tag-of-filtered-object=rewrite). Add a
--reencode flag (taking 'yes', 'no', or 'abort') that lets the user
specify how commit messages in an alternate encoding are handled and,
like other fast-export flags, default it to 'abort'.

Signed-off-by: Elijah Newren
---
 builtin/fast-export.c  | 35 ++++++++++++++++++++++++++++++++---
 t/t9350-fast-export.sh | 29 ++++++++++++++++++++++++++---
 2 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 66331fa401..43cc52331c 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -33,6 +33,7 @@ static const char *fast_export_usage[] = {
 static int progress;
 static enum { SIGNED_TAG_ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = SIGNED_TAG_ABORT;
 static enum { TAG_FILTERING_ABORT, DROP, REWRITE } tag_of_filtered_mode = TAG_FILTERING_ABORT;
+static enum { REENCODE_ABORT, REENCODE_PLEASE, REENCODE_NEVER } reencode_mode = REENCODE_ABORT;
 static int fake_missing_tagger;
 static int use_done_feature;
 static int no_data;
@@ -77,6 +78,20 @@ static int parse_opt_tag_of_filtered_mode(const struct option *opt,
 	return 0;
 }
 
+static int parse_opt_reencode_mode(const struct option *opt,
+				   const char *arg, int unset)
+{
+	if (unset || !strcmp(arg, "abort"))
+		reencode_mode = REENCODE_ABORT;
+	else if (!strcmp(arg, "yes"))
+		reencode_mode = REENCODE_PLEASE;
+	else if (!strcmp(arg, "no"))
+		reencode_mode = REENCODE_NEVER;
+	else
+		return error("Unknown reencoding mode: %s", arg);
+	return 0;
+}
+
 static struct decoration idnums;
 static uint32_t last_idnum;
 
@@ -633,10 +648,21 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
 	}
 
 	mark_next_object(&commit->object);
-	if (anonymize)
+	if (anonymize) {
 		reencoded = anonymize_commit_message(message);
-	else if (!is_encoding_utf8(encoding))
-		reencoded = reencode_string(message, "UTF-8", encoding);
+	} else if (encoding) {
+		switch(reencode_mode) {
+		case REENCODE_PLEASE:
+			reencoded = reencode_string(message, "UTF-8", encoding);
+			break;
+		case REENCODE_NEVER:
+			break;
+		case REENCODE_ABORT:
+			die("Encountered commit-specific encoding %s in commit "
+			    "%s; use --reencode= to handle it",
+			    encoding, oid_to_hex(&commit->object.oid));
+		}
+	}
 	if (!commit->parents)
 		printf("reset %s\n", refname);
 	printf("commit %s\nmark :%"PRIu32"\n", refname, last_idnum);
@@ -1091,6 +1117,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 		OPT_CALLBACK(0, "tag-of-filtered-object", &tag_of_filtered_mode, N_("mode"),
 			     N_("select handling of tags that tag filtered objects"),
 			     parse_opt_tag_of_filtered_mode),
+		OPT_CALLBACK(0, "reencode", &reencode_mode, N_("mode"),
+			     N_("select handling of commit messages in an alternate encoding"),
+			     parse_opt_reencode_mode),
 		OPT_STRING(0, "export-marks", &export_filename, N_("file"),
 			     N_("Dump marks to this file")),
 		OPT_STRING(0, "import-marks", &import_filename, N_("file"),
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 67dd7ac7f4..92cfeb6cfc 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -94,14 +94,14 @@ test_expect_success 'fast-export --show-original-ids | git fast-import' '
 	test $MUSS = $(git rev-parse --verify refs/tags/muss)
 '
 
-test_expect_success 'iso-8859-7' '
+test_expect_success 'reencoding iso-8859-7' '
 
 	test_when_finished "git reset --hard HEAD~1" &&
 	test_config i18n.commitencoding iso-8859-7 &&
 	test_tick &&
 	echo rosten >file &&
 	git commit -s -m "$(printf "Pi: \360")" file &&
-	git fast-export wer^..wer >iso-8859-7.fi &&
+	git fast-export --reencode=yes wer^..wer >iso-8859-7.fi &&
 	sed "s/wer/i18n/" iso-8859-7.fi |
 		(cd new &&
 		 git fast-import &&
@@ -109,13 +109,36 @@ test_expect_success 'iso-8859-7' '
 		 grep $(printf "\317\200") actual)
 '
 
+test_expect_success 'aborting on iso-8859-7' '
+
+	test_when_finished "git reset --hard HEAD~1" &&
+	test_config i18n.commitencoding iso-8859-7 &&
+	echo rosten >file &&
+	git commit -s -m "$(printf "Pi: \360")" file &&
+	test_must_fail git fast-export --reencode=abort wer^..wer >iso-8859-7.fi
+'
+
+test_expect_success 'preserving iso-8859-7' '
+
+	test_when_finished "git reset --hard HEAD~1" &&
+	test_config i18n.commitencoding iso-8859-7 &&
+	echo rosten >file &&
+	git commit -s -m "$(printf "Pi: \360")" file &&
+	git fast-export --reencode=no wer^..wer >iso-8859-7.fi &&
+	sed "s/wer/i18n-no-recoding/" iso-8859-7.fi |
+		(cd new &&
+		 git fast-import &&
+		 git cat-file commit i18n-no-recoding >actual &&
+		 grep $(printf "\360") actual)
+'
+
 test_expect_success 'encoding preserved if reencoding fails' '
 
 	test_when_finished "git reset --hard HEAD~1" &&
 	test_config i18n.commitencoding iso-8859-7 &&
 	echo rosten >file &&
 	git commit -s -m "$(printf "Pi: \360; Invalid: \377")" file &&
-	git fast-export wer^..wer >iso-8859-7.fi &&
+	git fast-export --reencode=yes wer^..wer >iso-8859-7.fi &&
 	sed "s/wer/i18n-invalid/" iso-8859-7.fi |
 		(cd new &&
 		 git fast-import &&
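The new option callback and the switch in handle_commit() can be summarized as a small standalone C sketch. It rests on these assumptions:

- parse_reencode_mode(), action_for() and the enum action values are hypothetical names.
- A NULL argument stands in for the callback's `unset` case.
- The anonymize branch is left out.
- So is the fallback from patch 3/5, which keeps the encoding header when reencoding fails.

```c
#include <string.h>

/* Mirrors the three modes this patch adds. */
enum reencode_mode { REENCODE_ABORT, REENCODE_PLEASE, REENCODE_NEVER };

/* Hypothetical outcomes for a single commit. */
enum action { EMIT_REENCODED, EMIT_VERBATIM, DIE };

/*
 * Hypothetical standalone version of parse_opt_reencode_mode():
 * arg == NULL plays the role of the callback's "unset" case.
 * Returns 0 on success, or -1 (leaving *mode unchanged) for an
 * unknown mode.
 */
int parse_reencode_mode(const char *arg, enum reencode_mode *mode)
{
	if (!arg || !strcmp(arg, "abort"))
		*mode = REENCODE_ABORT;
	else if (!strcmp(arg, "yes"))
		*mode = REENCODE_PLEASE;
	else if (!strcmp(arg, "no"))
		*mode = REENCODE_NEVER;
	else
		return -1;
	return 0;
}

/* What the patched handle_commit() does with a commit's encoding. */
enum action action_for(enum reencode_mode mode, const char *encoding)
{
	if (!encoding)
		return EMIT_VERBATIM;   /* no header: message is taken as UTF-8 */
	switch (mode) {
	case REENCODE_PLEASE:
		return EMIT_REENCODED;
	case REENCODE_NEVER:
		return EMIT_VERBATIM;   /* encoding header is kept */
	case REENCODE_ABORT:
	default:
		return DIE;
	}
}
```

Because the default is REENCODE_ABORT, a plain `git fast-export` now stops at the first commit with an explicit encoding rather than silently rewriting it. The new 'aborting on iso-8859-7' test checks exactly that.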