From patchwork Thu Apr 25 15:51:14 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 10917399
From: Elijah Newren
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, Elijah Newren
Subject: [PATCH 1/5] t9350: fix encoding test to actually test reencoding
Date: Thu, 25 Apr 2019 08:51:14 -0700
Message-Id: <20190425155118.7918-2-newren@gmail.com>
X-Mailer: git-send-email 2.21.0.782.gb6cebc4909
In-Reply-To: <20190425155118.7918-1-newren@gmail.com>
References: <20190425155118.7918-1-newren@gmail.com>
X-Mailing-List: git@vger.kernel.org

This test used an author with non-ascii characters in the name, but no
special commit message.  It then grep'ed for those non-ascii characters,
but those are guaranteed to exist regardless of the reencoding process
since the reencoding only affects the commit message, not the author or
committer names.  As such, the test would work even if the re-encoding
process simply stripped the commit message entirely.  Modify the test to
actually check that the reencoding in utf-8 worked.

Signed-off-by: Elijah Newren
---
 t/t9350-fast-export.sh | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 5690fe2810..6c07f910eb 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -94,22 +94,22 @@ test_expect_success 'fast-export --show-original-ids | git fast-import' '
 	test $MUSS = $(git rev-parse --verify refs/tags/muss)
 '
 
-test_expect_success 'iso-8859-1' '
+test_expect_success 'iso-8859-7' '
 
-	git config i18n.commitencoding ISO8859-1 &&
-	# use author and committer name in ISO-8859-1 to match it.
-	. "$TEST_DIRECTORY"/t3901/8859-1.txt &&
+	test_when_finished "git reset --hard HEAD~1" &&
+	test_when_finished "git config --unset i18n.commitencoding" &&
+	git config i18n.commitencoding iso-8859-7 &&
 	test_tick &&
 	echo rosten >file &&
-	git commit -s -m den file &&
-	git fast-export wer^..wer >iso8859-1.fi &&
-	sed "s/wer/i18n/" iso8859-1.fi |
+	git commit -s -m "$(printf "Pi: \360")" file &&
+	git fast-export wer^..wer >iso-8859-7.fi &&
+	sed "s/wer/i18n/" iso-8859-7.fi |
 		(cd new &&
 		 git fast-import &&
 		 git cat-file commit i18n >actual &&
-		 grep "Áéí óú" actual)
-
+		 grep $(printf "\317\200") actual)
 '
+
 test_expect_success 'import/export-marks' '
 
 	git checkout -b marks master &&
@@ -224,7 +224,6 @@ GIT_COMMITTER_NAME='C O Mitter'; export GIT_COMMITTER_NAME
 
 test_expect_success 'setup copies' '
 
-	git config --unset i18n.commitencoding &&
 	git checkout -b copy rein &&
 	git mv file file3 &&
 	git commit -m move1 &&

From patchwork Thu Apr 25 15:51:15 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 10917401
From: Elijah Newren
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, Elijah Newren
Subject: [PATCH 2/5] fast-import: support 'encoding' commit header
Date: Thu, 25 Apr 2019 08:51:15 -0700
Message-Id: <20190425155118.7918-3-newren@gmail.com>
X-Mailer: git-send-email 2.21.0.782.gb6cebc4909
In-Reply-To: <20190425155118.7918-1-newren@gmail.com>
References: <20190425155118.7918-1-newren@gmail.com>
X-Mailing-List: git@vger.kernel.org

Since git supports commit messages with an encoding other than utf-8,
allow fast-import to import such commits.  This may be useful for folks
who do not want to reencode commit messages from an external system, and
may also be useful to achieve reversible history rewrites (e.g. sha1sum
<-> sha256sum transitions or subtree work) with git repositories that
have used specialized encodings in their commit history.

Signed-off-by: Elijah Newren
---
 Documentation/git-fast-import.txt |  7 +++++++
 fast-import.c                     | 12 ++++++++++--
 t/t9300-fast-import.sh            | 20 ++++++++++++++++++++
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index d65cdb3d08..7baf9e47b5 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -388,6 +388,7 @@ change to the project.
 	original-oid?
 	('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
 	'committer' (SP <name>)? SP LT <email> GT SP <when> LF
+	('encoding' SP <encoding>)?
 	data
 	('from' SP <commit-ish> LF)?
 	('merge' SP <commit-ish> LF)?
@@ -455,6 +456,12 @@ that was selected by the --date-format=<fmt> command-line option.
 See ``Date Formats'' above for the set of supported formats, and
 their syntax.
 
+`encoding`
+^^^^^^^^^^
+The optional `encoding` command indicates the encoding of the commit
+message.  Most commits are UTF-8 and the encoding is omitted, but this
+allows importing commit messages into git without first reencoding them.
+
 `from`
 ^^^^^^
 The `from` command is used to specify the commit to initialize
diff --git a/fast-import.c b/fast-import.c
index f38d04fa58..25026c068a 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2585,6 +2585,7 @@ static void parse_new_commit(const char *arg)
 	struct branch *b;
 	char *author = NULL;
 	char *committer = NULL;
+	const char *encoding = NULL;
 	struct hash_list *merge_list = NULL;
 	unsigned int merge_count;
 	unsigned char prev_fanout, new_fanout;
@@ -2607,6 +2608,9 @@ static void parse_new_commit(const char *arg)
 	}
 	if (!committer)
 		die("Expected committer but didn't get one");
+	if (skip_prefix(command_buf.buf, "encoding ", &encoding)) {
+		read_next_command();
+	}
 	parse_data(&msg, 0, NULL);
 	read_next_command();
 	parse_from(b);
@@ -2670,9 +2674,13 @@ static void parse_new_commit(const char *arg)
 	}
 	strbuf_addf(&new_data,
 		"author %s\n"
-		"committer %s\n"
-		"\n",
+		"committer %s\n",
 		author ? author : committer, committer);
+	if (encoding)
+		strbuf_addf(&new_data,
+			"encoding %s\n",
+			encoding);
+	strbuf_addf(&new_data, "\n");
 	strbuf_addbuf(&new_data, &msg);
 	free(author);
 	free(committer);
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 3668263c40..141b7fa35e 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3299,4 +3299,24 @@ test_expect_success !MINGW 'W: get-mark & empty orphan commit with erroneous thi
 	sed -e s/LFs/LLL/ W-input | tr L "\n" | test_must_fail git fast-import
 '
 
+###
+### series X (other new features)
+###
+
+test_expect_success 'X: handling encoding' '
+	test_tick &&
+	cat >input <<-INPUT_END &&
+	commit refs/heads/encoding
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	encoding iso-8859-7
+	data <>input &&
+
+	git fast-import

From patchwork Thu Apr 25 15:51:16 2019
X-Patchwork-Id: 10917403
From: Elijah Newren
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, Elijah Newren
Subject: [PATCH 3/5] fast-export: avoid stripping encoding header if we cannot reencode
Date: Thu, 25 Apr 2019 08:51:16 -0700
Message-Id: <20190425155118.7918-4-newren@gmail.com>
X-Mailer: git-send-email 2.21.0.782.gb6cebc4909
In-Reply-To: <20190425155118.7918-1-newren@gmail.com>
References: <20190425155118.7918-1-newren@gmail.com>
X-Mailing-List: git@vger.kernel.org

When fast-export encounters a commit with an 'encoding' header, it tries
to reencode in utf-8 and then drops the encoding header.  However, if it
fails to reencode in utf-8 because e.g. one of the characters in the
commit message was invalid in the old encoding, then we need to retain
the original encoding or otherwise we lose information needed to
understand all the other (valid) characters in the original commit
message.

Signed-off-by: Elijah Newren
---
 builtin/fast-export.c  |  7 +++++--
 t/t9350-fast-export.sh | 15 +++++++++++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 9e283482ef..7734a9f5a5 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -642,9 +642,12 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
 	printf("commit %s\nmark :%"PRIu32"\n", refname, last_idnum);
 	if (show_original_ids)
 		printf("original-oid %s\n", oid_to_hex(&commit->object.oid));
-	printf("%.*s\n%.*s\ndata %u\n%s",
+	printf("%.*s\n%.*s\n",
 	       (int)(author_end - author), author,
-	       (int)(committer_end - committer), committer,
+	       (int)(committer_end - committer), committer);
+	if (!reencoded && encoding)
+		printf("encoding %s\n", encoding);
+	printf("data %u\n%s",
 	       (unsigned)(reencoded
 			  ? strlen(reencoded) : message
 			  ? strlen(message) : 0),
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 6c07f910eb..975c8c4014 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -110,6 +110,21 @@ test_expect_success 'iso-8859-7' '
 		 grep $(printf "\317\200") actual)
 '
 
+test_expect_success 'encoding preserved if reencoding fails' '
+
+	test_when_finished "git reset --hard HEAD~1" &&
+	test_when_finished "git config --unset i18n.commitencoding" &&
+	git config i18n.commitencoding iso-8859-7 &&
+	echo rosten >file &&
+	git commit -s -m "$(printf "Pi: \360; Invalid: \377")" file &&
+	git fast-export wer^..wer >iso-8859-7.fi &&
+	sed "s/wer/i18n-invalid/" iso-8859-7.fi |
+		(cd new &&
+		 git fast-import &&
+		 git cat-file commit i18n-invalid >actual &&
+		 grep ^encoding actual)
+'
+
 test_expect_success 'import/export-marks' '
 
 	git checkout -b marks master &&

From patchwork Thu Apr 25 15:51:17 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 10917407
From: Elijah Newren
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, Elijah Newren
Subject: [PATCH 4/5] fast-export: differentiate between explicitly utf-8 and implicitly utf-8
Date: Thu, 25 Apr 2019 08:51:17 -0700
Message-Id: <20190425155118.7918-5-newren@gmail.com>
X-Mailer: git-send-email 2.21.0.782.gb6cebc4909
In-Reply-To: <20190425155118.7918-1-newren@gmail.com>
References: <20190425155118.7918-1-newren@gmail.com>
X-Mailing-List: git@vger.kernel.org

The find_encoding() function returned the encoding used by a commit
message, returning a default of git_commit_encoding (usually utf-8).
Although the current code does not differentiate between a commit which
explicitly requested utf-8 and one where we just assume utf-8 because no
encoding is set, it will become important when we try to preserve the
encoding header.  Since is_encoding_utf8() returns true when passed
NULL, we can just return NULL from find_encoding() instead of returning
git_commit_encoding.

Signed-off-by: Elijah Newren
---
 builtin/fast-export.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 7734a9f5a5..66331fa401 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -453,7 +453,7 @@ static const char *find_encoding(const char *begin, const char *end)
 	bol = memmem(begin, end ? end - begin : strlen(begin),
 		     needle, strlen(needle));
 	if (!bol)
-		return git_commit_encoding;
+		return NULL;
 	bol += strlen(needle);
 	eol = strchrnul(bol, '\n');
 	*eol = '\0';

From patchwork Thu Apr 25 15:51:18 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 10917405
From: Elijah Newren
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, Elijah Newren
Subject: [PATCH 5/5] fast-export: do automatic reencoding of commit messages only if requested
Date: Thu, 25 Apr 2019 08:51:18 -0700
Message-Id: <20190425155118.7918-6-newren@gmail.com>
X-Mailer: git-send-email 2.21.0.782.gb6cebc4909
In-Reply-To: <20190425155118.7918-1-newren@gmail.com>
References: <20190425155118.7918-1-newren@gmail.com>
X-Mailing-List: git@vger.kernel.org

Automatic re-encoding of commit messages (and dropping of the encoding
header) hurts attempts to do reversible history rewrites (e.g. sha1sum
<-> sha256sum transitions, some subtree rewrites), and seems
inconsistent with the general principle followed elsewhere in
fast-export of requiring explicit user requests to modify the output
(e.g. --signed-tags=strip, --tag-of-filtered-object=rewrite).  Add a
--reencode flag that lets the user specify how commit messages in an
alternate encoding are handled and, like other fast-export flags,
default it to 'abort'.

Signed-off-by: Elijah Newren
---
 builtin/fast-export.c  | 35 ++++++++++++++++++++++++++++++++---
 t/t9350-fast-export.sh | 31 ++++++++++++++++++++++++++++---
 2 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 66331fa401..43cc52331c 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -33,6 +33,7 @@ static const char *fast_export_usage[] = {
 static int progress;
 static enum { SIGNED_TAG_ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = SIGNED_TAG_ABORT;
 static enum { TAG_FILTERING_ABORT, DROP, REWRITE } tag_of_filtered_mode = TAG_FILTERING_ABORT;
+static enum { REENCODE_ABORT, REENCODE_PLEASE, REENCODE_NEVER } reencode_mode = REENCODE_ABORT;
 static int fake_missing_tagger;
 static int use_done_feature;
 static int no_data;
@@ -77,6 +78,20 @@ static int parse_opt_tag_of_filtered_mode(const struct option *opt,
 	return 0;
 }
 
+static int parse_opt_reencode_mode(const struct option *opt,
+				   const char *arg, int unset)
+{
+	if (unset || !strcmp(arg, "abort"))
+		reencode_mode = REENCODE_ABORT;
+	else if (!strcmp(arg, "yes"))
+		reencode_mode = REENCODE_PLEASE;
+	else if (!strcmp(arg, "no"))
+		reencode_mode = REENCODE_NEVER;
+	else
+		return error("Unknown reencoding mode: %s", arg);
+	return 0;
+}
+
 static struct decoration idnums;
 static uint32_t last_idnum;
 
@@ -633,10 +648,21 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
 	}
 
 	mark_next_object(&commit->object);
-	if (anonymize)
+	if (anonymize) {
 		reencoded = anonymize_commit_message(message);
-	else if (!is_encoding_utf8(encoding))
-		reencoded = reencode_string(message, "UTF-8", encoding);
+	} else if (encoding) {
+		switch(reencode_mode) {
+		case REENCODE_PLEASE:
+			reencoded = reencode_string(message, "UTF-8", encoding);
+			break;
+		case REENCODE_NEVER:
+			break;
+		case REENCODE_ABORT:
+			die("Encountered commit-specific encoding %s in commit "
+			    "%s; use --reencode= to handle it",
+			    encoding, oid_to_hex(&commit->object.oid));
+		}
+	}
 	if (!commit->parents)
 		printf("reset %s\n", refname);
 	printf("commit %s\nmark :%"PRIu32"\n", refname, last_idnum);
@@ -1091,6 +1117,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 		OPT_CALLBACK(0, "tag-of-filtered-object", &tag_of_filtered_mode, N_("mode"),
 			     N_("select handling of tags that tag filtered objects"),
 			     parse_opt_tag_of_filtered_mode),
+		OPT_CALLBACK(0, "reencode", &reencode_mode, N_("mode"),
+			     N_("select handling of commit messages in an alternate encoding"),
+			     parse_opt_reencode_mode),
 		OPT_STRING(0, "export-marks", &export_filename, N_("file"),
 			     N_("Dump marks to this file")),
 		OPT_STRING(0, "import-marks", &import_filename, N_("file"),
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 975c8c4014..4774926bb6 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -94,7 +94,7 @@ test_expect_success 'fast-export --show-original-ids | git fast-import' '
 	test $MUSS = $(git rev-parse --verify refs/tags/muss)
 '
 
-test_expect_success 'iso-8859-7' '
+test_expect_success 'reencoding iso-8859-7' '
 
 	test_when_finished "git reset --hard HEAD~1" &&
 	test_when_finished "git config --unset i18n.commitencoding" &&
@@ -102,7 +102,7 @@ test_expect_success 'iso-8859-7' '
 	test_tick &&
 	echo rosten >file &&
 	git commit -s -m "$(printf "Pi: \360")" file &&
-	git fast-export wer^..wer >iso-8859-7.fi &&
+	git fast-export --reencode=yes wer^..wer >iso-8859-7.fi &&
 	sed "s/wer/i18n/" iso-8859-7.fi |
 		(cd new &&
 		 git fast-import &&
@@ -110,6 +110,31 @@ test_expect_success 'iso-8859-7' '
 		 grep $(printf "\317\200") actual)
 '
 
+test_expect_success 'aborting on iso-8859-7' '
+
+	test_when_finished "git reset --hard HEAD~1" &&
+	test_when_finished "git config --unset i18n.commitencoding" &&
+	git config i18n.commitencoding iso-8859-7 &&
+	echo rosten >file &&
+	git commit -s -m "$(printf "Pi: \360")" file &&
+	test_must_fail git fast-export --reencode=abort wer^..wer >iso-8859-7.fi
+'
+
+test_expect_success 'preserving iso-8859-7' '
+
+	test_when_finished "git reset --hard HEAD~1" &&
+	test_when_finished "git config --unset i18n.commitencoding" &&
+	git config i18n.commitencoding iso-8859-7 &&
+	echo rosten >file &&
+	git commit -s -m "$(printf "Pi: \360")" file &&
+	git fast-export --reencode=no wer^..wer >iso-8859-7.fi &&
+	sed "s/wer/i18n-no-recoding/" iso-8859-7.fi |
+		(cd new &&
+		 git fast-import &&
+		 git cat-file commit i18n-no-recoding >actual &&
+		 grep $(printf "\360") actual)
+'
+
 test_expect_success 'encoding preserved if reencoding fails' '
 
 	test_when_finished "git reset --hard HEAD~1" &&
@@ -117,7 +142,7 @@ test_expect_success 'encoding preserved if reencoding fails' '
 	git config i18n.commitencoding iso-8859-7 &&
 	echo rosten >file &&
 	git commit -s -m "$(printf "Pi: \360; Invalid: \377")" file &&
-	git fast-export wer^..wer >iso-8859-7.fi &&
+	git fast-export --reencode=yes wer^..wer >iso-8859-7.fi &&
 	sed "s/wer/i18n-invalid/" iso-8859-7.fi |
 		(cd new &&
 		 git fast-import &&