From patchwork Mon May 13 23:17:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Elijah Newren X-Patchwork-Id: 10941821 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 73ADB912 for ; Mon, 13 May 2019 23:17:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 62B5D28355 for ; Mon, 13 May 2019 23:17:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 56B9C283A8; Mon, 13 May 2019 23:17:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DC83228355 for ; Mon, 13 May 2019 23:17:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726594AbfEMXRi (ORCPT ); Mon, 13 May 2019 19:17:38 -0400 Received: from mail-pl1-f194.google.com ([209.85.214.194]:37096 "EHLO mail-pl1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726233AbfEMXRh (ORCPT ); Mon, 13 May 2019 19:17:37 -0400 Received: by mail-pl1-f194.google.com with SMTP id p15so7220306pll.4 for ; Mon, 13 May 2019 16:17:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=HXgk7z/58HyeVsQxLdOaabgOZbmBoCFGAHLXgiHTEic=; b=KUguyJwvqGKQnLzLZ5fmCeKQqJJtzsb44F3attXj9tGFy+BtmMNRKpJrZ2bk23VARU kUllwGRjQXrDQabIOe7oQuLOVv4N1N6WwLRWJgVQlmDNK1bQd9AbdW0YSRbzFzp/4u9e 
2GwnFPOTvH8aQjiRMJRdcf7uuyjRqcsDSfYmwrlxNlHiUFtuZpzUjux9/nxSQrD9+sna fXLRqSyx899hARWqqopASaoXIkz7b3DN/5JKtgGZCUWSuszP/j9+48rGt5fnIQEeaTLN sdb5dbt+X1wP53xEwJk7B3e/AuUjx7lay8GFvbhnODX/WnQmVKFC1NiVckN+/zhh8ZFt ZPcQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=HXgk7z/58HyeVsQxLdOaabgOZbmBoCFGAHLXgiHTEic=; b=JYTTjUUC+cvlpZhn0yjlKjA/xgHcNUmlimsNc19RP1adev8Chz6imxJthDtXytGexQ G1e3o3MGpF+XwOjcERQ8kbPM0ITsmpCLdrdhRTdrUyg5pAyFKQWEo+27hkRULhPBVi0E YDZZ9qii2RnOE9AM+3UlzcXUVA+35cyLhpRb4HzQ57y4z1tn0rT+UW6SDxrBj2sPeFxz rNaY7rkDBKJTbOVswXnXZZexxMjtoFqa5mi24IBW4jyB2MkDBC3WepqKP5jrzEvQ9D/j IJT47yA9MepjYLoJjHp/f48IQsYh+LWxBtz/Gp9+L+eZaguERsw4ifbPdNm3engtjfsM GaeQ== X-Gm-Message-State: APjAAAUKB91hYIwu+rYuDplRR+FYuOL8/f6kYm+shMqGKB/mHrENgqcE 3hRja0DVqs3XnWb1ee7F1DE= X-Google-Smtp-Source: APXvYqyYbZAbiYO743qWwQMon0GUx0TStM2DTmiPwx/ld+SOXZ73KWhrD8ntP3Ex1l5esAVb/LVwbg== X-Received: by 2002:a17:902:be0e:: with SMTP id r14mr16706408pls.152.1557789456743; Mon, 13 May 2019 16:17:36 -0700 (PDT) Received: from newren2-linux.yojoe.local ([8.4.231.67]) by smtp.gmail.com with ESMTPSA id g10sm30664307pfg.153.2019.05.13.16.17.35 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Mon, 13 May 2019 16:17:35 -0700 (PDT) From: Elijah Newren To: Junio C Hamano Cc: git@vger.kernel.org, Eric Sunshine , Johannes Schindelin , Johannes Sixt , =?utf-8?q?Torsten_B=C3=B6gershausen?= , Elijah Newren Subject: [PATCH v5 2/5] fast-import: support 'encoding' commit header Date: Mon, 13 May 2019 16:17:23 -0700 Message-Id: <20190513231726.16218-3-newren@gmail.com> X-Mailer: git-send-email 2.21.0.782.gd8be4ee826 In-Reply-To: <20190513231726.16218-1-newren@gmail.com> References: <20190513164722.31534-1-newren@gmail.com> <20190513231726.16218-1-newren@gmail.com> MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: 
X-Mailing-List: git@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Since git supports commit messages with an encoding other than utf-8,
allow fast-import to import such commits.  This may be useful for folks
who do not want to reencode commit messages from an external system, and
may also be useful to achieve reversible history rewrites (e.g. sha1sum
<-> sha256sum transitions or subtree work) with git repositories that
have used specialized encodings in their commit history.

Signed-off-by: Elijah Newren
---
 Documentation/git-fast-import.txt |  7 +++++++
 fast-import.c                     | 11 +++++++++--
 t/t9300-fast-import.sh            | 20 ++++++++++++++++++++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index d65cdb3d08..7baf9e47b5 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -388,6 +388,7 @@ change to the project.
 	original-oid?
 	('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
 	'committer' (SP <name>)? SP LT <email> GT SP <when> LF
+	('encoding' SP <encoding>)?
 	data
 	('from' SP <commit-ish> LF)?
 	('merge' SP <commit-ish> LF)?
@@ -455,6 +456,12 @@ that was selected by the --date-format=<fmt> command-line option.
 See ``Date Formats'' above for the set of supported formats, and
 their syntax.
 
+`encoding`
+^^^^^^^^^^
+The optional `encoding` command indicates the encoding of the commit
+message.  Most commits are UTF-8 and the encoding is omitted, but this
+allows importing commit messages into git without first reencoding them.
+
 `from`
 ^^^^^^
 The `from` command is used to specify the commit to initialize
diff --git a/fast-import.c b/fast-import.c
index f38d04fa58..76a7bd3699 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2585,6 +2585,7 @@ static void parse_new_commit(const char *arg)
 	struct branch *b;
 	char *author = NULL;
 	char *committer = NULL;
+	const char *encoding = NULL;
 	struct hash_list *merge_list = NULL;
 	unsigned int merge_count;
 	unsigned char prev_fanout, new_fanout;
@@ -2607,6 +2608,8 @@ static void parse_new_commit(const char *arg)
 	}
 	if (!committer)
 		die("Expected committer but didn't get one");
+	if (skip_prefix(command_buf.buf, "encoding ", &encoding))
+		read_next_command();
 	parse_data(&msg, 0, NULL);
 	read_next_command();
 	parse_from(b);
@@ -2670,9 +2673,13 @@ static void parse_new_commit(const char *arg)
 	}
 	strbuf_addf(&new_data,
 		"author %s\n"
-		"committer %s\n"
-		"\n",
+		"committer %s\n",
 		author ? author : committer, committer);
+	if (encoding)
+		strbuf_addf(&new_data,
+			"encoding %s\n",
+			encoding);
+	strbuf_addch(&new_data, '\n');
 	strbuf_addbuf(&new_data, &msg);
 	free(author);
 	free(committer);
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 3668263c40..141b7fa35e 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3299,4 +3299,24 @@ test_expect_success !MINGW 'W: get-mark & empty orphan commit with erroneous thi
 	sed -e s/LFs/LLL/ W-input | tr L "\n" | test_must_fail git fast-import
 '
 
+###
+### series X (other new features)
+###
+
+test_expect_success 'X: handling encoding' '
+	test_tick &&
+	cat >input <<-INPUT_END &&
+	commit refs/heads/encoding
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	encoding iso-8859-7
+	data <>input &&
+
+	git fast-import
X-Patchwork-Id: 10941823
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 424AF76
	for ; Mon, 13 May 2019 23:17:42 +0000 (UTC)
Received:
from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3286628355 for ; Mon, 13 May 2019 23:17:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 270FC283A8; Mon, 13 May 2019 23:17:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CF0FF28355 for ; Mon, 13 May 2019 23:17:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726607AbfEMXRk (ORCPT ); Mon, 13 May 2019 19:17:40 -0400 Received: from mail-pg1-f195.google.com ([209.85.215.195]:38711 "EHLO mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726598AbfEMXRk (ORCPT ); Mon, 13 May 2019 19:17:40 -0400 Received: by mail-pg1-f195.google.com with SMTP id j26so7545740pgl.5 for ; Mon, 13 May 2019 16:17:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=K5KGWlgLIaFfGbNmAYY2dGdpfN8dIpTvYhklN2HMICA=; b=RMNU14uBAkFuHHNH7SNM4TzBnAvzILqjR/1+IcO3Hu92Lzjn+R3d/e8XetVyf8e259 1drMpfhA+GBl66qnhOBAoNzWOX5/OvCLoT0jdTG1bMpgQk5iqQoHACsAHK60V7lIEUc3 E3B5EHM3zTbMlYibxNoFeolATJQNMrSj1Zlijsa9GZxYmm/xHNGRaZUDjKnWOUXPFcg8 bWLSx+D9IaDeYRVOHHE85a13K7gbKg5uXACN9fGzIycQpaYEaIMiQaPNE7wQFU37GZ5W y7v2r0M5wJrksvB1wXmjwe4M4HQG0VRaazJDr08pdu8i6FYDp0zILJ9jrTJZEBLy/Lwa chsg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; 
bh=K5KGWlgLIaFfGbNmAYY2dGdpfN8dIpTvYhklN2HMICA=; b=DmI95x+wT2TlhaNQXV+E/gRzo8kmwextLh74Kfh08ZNO2bkgngSwambgiDxkXX9kjZ dVCuz+AxBZHEDv8APi6k5IOxdRpkXKn7IIX38PjBuiD/f57kBVfEj8UCXviv/YVae9qY ghg1mUYooaxUW8fDqhnYozJfA4DcQTt6RLnm9obgKxDJ8vY0+r2/muEqaZ9cQo9Rtg0E /kDKZtwZRpq9M45+S38JAa6b9dMV7bdImafCI+60bcoz6vn94Pf3Lehh/AjT5IQ4nDxs k0YYjF3g27oHVMrXhrPQXVbXXMWwk7RxdDr2sckrR+2uwpGtyrW7xzt3bLl5gKMLSdqU 4how== X-Gm-Message-State: APjAAAU6uu9MR/EUlIT5mrb7xvm3HBr+uqD08xBNvJrW9W1buwK85m75 GqvPfZb0lETrUny7w6oKoN4= X-Google-Smtp-Source: APXvYqwd+sPwDnZZxF8W4fUPkIcU2cJG9950XqLzDt4RoMUXSa0z7RYmHKKjEPzgd+iPQ0Z7DOAgZQ== X-Received: by 2002:aa7:808d:: with SMTP id v13mr5819870pff.198.1557789459251; Mon, 13 May 2019 16:17:39 -0700 (PDT) Received: from newren2-linux.yojoe.local ([8.4.231.67]) by smtp.gmail.com with ESMTPSA id g10sm30664307pfg.153.2019.05.13.16.17.37 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Mon, 13 May 2019 16:17:38 -0700 (PDT) From: Elijah Newren To: Junio C Hamano Cc: git@vger.kernel.org, Eric Sunshine , Johannes Schindelin , Johannes Sixt , =?utf-8?q?Torsten_B=C3=B6gershausen?= , Elijah Newren Subject: [PATCH v5 4/5] fast-export: differentiate between explicitly utf-8 and implicitly utf-8 Date: Mon, 13 May 2019 16:17:25 -0700 Message-Id: <20190513231726.16218-5-newren@gmail.com> X-Mailer: git-send-email 2.21.0.782.gd8be4ee826 In-Reply-To: <20190513231726.16218-1-newren@gmail.com> References: <20190513164722.31534-1-newren@gmail.com> <20190513231726.16218-1-newren@gmail.com> MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The find_encoding() function returned the encoding used by a commit message, returning a default of git_commit_encoding (usually utf-8). 
Although the current code does not differentiate between a commit which
explicitly requested utf-8 and one where we just assume utf-8 because no
encoding is set, it will become important when we try to preserve the
encoding header.  Since is_encoding_utf8() returns true when passed
NULL, we can just return NULL from find_encoding() instead of returning
git_commit_encoding.

Signed-off-by: Elijah Newren
---
 builtin/fast-export.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 7734a9f5a5..66331fa401 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -453,7 +453,7 @@ static const char *find_encoding(const char *begin, const char *end)
 	bol = memmem(begin, end ? end - begin : strlen(begin),
 		     needle, strlen(needle));
 	if (!bol)
-		return git_commit_encoding;
+		return NULL;
 	bol += strlen(needle);
 	eol = strchrnul(bol, '\n');
 	*eol = '\0';

From patchwork Mon May 13 23:17:26 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Elijah Newren
X-Patchwork-Id: 10941825
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DD43676
	for ; Mon, 13 May 2019 23:17:44 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CCDCF28355
	for ; Mon, 13 May 2019 23:17:44 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id C166A283A8; Mon, 13 May 2019 23:17:44 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI
	autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by
mail.wl.linuxfoundation.org (Postfix) with ESMTP id 200E328355 for ; Mon, 13 May 2019 23:17:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726612AbfEMXRn (ORCPT ); Mon, 13 May 2019 19:17:43 -0400 Received: from mail-pg1-f196.google.com ([209.85.215.196]:40999 "EHLO mail-pg1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726598AbfEMXRl (ORCPT ); Mon, 13 May 2019 19:17:41 -0400 Received: by mail-pg1-f196.google.com with SMTP id z3so7547550pgp.8 for ; Mon, 13 May 2019 16:17:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=4j8PUnCn6FXZZXhl10n62GvlgqpHNtp4FRd0UeGI0L0=; b=IV2mctCr/gbIMth7R0/2A1cZ34gLwp9VXchR+6le3tl1jbO2Rg10LisT/vsnb0+E8q F9SLGx5ZRyHwvqTK00n5DTf8kcDvGmVILjvQ/Nkq55JlnuMwoc7e8ROT1zblbRC5hefl UiTQ10I+O58YeDnR2zbzIDeLYykfqbF65kBZWi/ydC0CMhITr03iiFkeCvfEzH2fdsVk Y2Ea68VfTaqWBO9jxr/EuuM7WcySL6W9C8qQpAecnOvRIMTwOM109Y2M8Y4kBwg+entI MZxGIoSZBu12YsCQsJEgm7A/BmLMj5nsRqAnX3C4EY3dp+xKgF9v0LS1hifO51xUUafd P2Og== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=4j8PUnCn6FXZZXhl10n62GvlgqpHNtp4FRd0UeGI0L0=; b=uiQLYrWO6gtAeKFb1FsUBiSrGdMIWiPCS3wtOTKnP/4LkRn0ZnZOGXQGVfXa/oUf/6 p2zfR/VdBiL1ce1xIM9mvjxL0Ylh/UlWuxDn+p2d2MnNcNJY17P1LXHwm9Am+x8tIOfu C1mXTeF95BRCo2r+brgqAFzUFeU02yZRSy9mfpFZSChBqG3mJCmc9iToA7cjyLIkvilF FJT1Ufz4zLjWYL0FDxXIYyWzeyIlCYcAPv43r1dBPi8zTm4ObedyHOIM+/Vdy07BkNhs KE+6SJLl/GiV8fAMBbcGskXO3E5W6T5OqHh2RdbX8r1E0xkk3gs9whyQWwa7+GD1aOgi fuag== X-Gm-Message-State: APjAAAXSKCKEYL4b2DB2kh25SZne4542IZdSwitHLmh4fpFzBJHfBmBF iGmPJW56B4xtO2sF5gEGDsM= X-Google-Smtp-Source: APXvYqyY7+zObGsFPpLbSRIb2TDak2Cnpf40hf6Fr9o9M9LGpCxV9MN8VtMD5nIdb20+yicW7shDQQ== X-Received: by 2002:a62:2b4e:: with 
	SMTP id r75mr36766539pfr.131.1557789460644;
	Mon, 13 May 2019 16:17:40 -0700 (PDT)
Received: from newren2-linux.yojoe.local ([8.4.231.67])
	by smtp.gmail.com with ESMTPSA id g10sm30664307pfg.153.2019.05.13.16.17.39
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
	Mon, 13 May 2019 16:17:39 -0700 (PDT)
From: Elijah Newren
To: Junio C Hamano
Cc: git@vger.kernel.org, Eric Sunshine, Johannes Schindelin,
	Johannes Sixt, =?utf-8?q?Torsten_B=C3=B6gershausen?=, Elijah Newren
Subject: [PATCH v5 5/5] fast-export: do automatic reencoding of commit
	messages only if requested
Date: Mon, 13 May 2019 16:17:26 -0700
Message-Id: <20190513231726.16218-6-newren@gmail.com>
X-Mailer: git-send-email 2.21.0.782.gd8be4ee826
In-Reply-To: <20190513231726.16218-1-newren@gmail.com>
References: <20190513164722.31534-1-newren@gmail.com>
	<20190513231726.16218-1-newren@gmail.com>
MIME-Version: 1.0
Sender: git-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Automatic re-encoding of commit messages (and dropping of the encoding
header) hurts attempts to do reversible history rewrites (e.g. sha1sum
<-> sha256sum transitions, some subtree rewrites), and seems
inconsistent with the general principle followed elsewhere in
fast-export of requiring explicit user requests to modify the output
(e.g. --signed-tags=strip, --tag-of-filtered-object=rewrite).  Add a
--reencode flag that the user can use to specify how commit messages
with an encoding header are handled and, like other fast-export flags,
default it to 'abort'.

Signed-off-by: Elijah Newren
---
 Documentation/git-fast-export.txt |  7 +++++
 builtin/fast-export.c             | 46 +++++++++++++++++++++++++++++--
 t/t9350-fast-export.sh            | 38 +++++++++++++++++++++++--
 3 files changed, 85 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index 64c01ba918..11427acdde 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -129,6 +129,13 @@ marks the same across runs.
 	for intermediary filters (e.g. for rewriting commit messages
 	which refer to older commits, or for stripping blobs by id).
 
+--reencode=(yes|no|abort)::
+	Specify how to handle the `encoding` header in commit objects.  When
+	asking to 'abort' (which is the default), this program will die
+	when encountering such a commit object.  With 'yes', the commit
+	message will be reencoded into UTF-8.  With 'no', the original
+	encoding will be preserved.
+
 --refspec::
 	Apply the specified refspec to each ref exported. Multiple of them can
 	be specified.
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 66331fa401..0bb65b3886 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -33,6 +33,7 @@ static const char *fast_export_usage[] = {
 static int progress;
 static enum { SIGNED_TAG_ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = SIGNED_TAG_ABORT;
 static enum { TAG_FILTERING_ABORT, DROP, REWRITE } tag_of_filtered_mode = TAG_FILTERING_ABORT;
+static enum { REENCODE_ABORT, REENCODE_YES, REENCODE_NO } reencode_mode = REENCODE_ABORT;
 static int fake_missing_tagger;
 static int use_done_feature;
 static int no_data;
@@ -77,6 +78,31 @@ static int parse_opt_tag_of_filtered_mode(const struct option *opt,
 	return 0;
 }
 
+static int parse_opt_reencode_mode(const struct option *opt,
+				   const char *arg, int unset)
+{
+	if (unset) {
+		reencode_mode = REENCODE_ABORT;
+		return 0;
+	}
+
+	switch (git_parse_maybe_bool(arg)) {
+	case 0:
+		reencode_mode = REENCODE_NO;
+		break;
+	case 1:
+		reencode_mode = REENCODE_YES;
+		break;
+	default:
+		if (arg && !strcasecmp(arg, "abort"))
+			reencode_mode = REENCODE_ABORT;
+		else
+			return error("Unknown reencoding mode: %s", arg);
+	}
+
+	return 0;
+}
+
 static struct decoration idnums;
 static uint32_t last_idnum;
 
@@ -633,10 +659,21 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
 	}
 
 	mark_next_object(&commit->object);
-	if (anonymize)
+	if (anonymize) {
 		reencoded = anonymize_commit_message(message);
-	else if (!is_encoding_utf8(encoding))
-		reencoded = reencode_string(message, "UTF-8", encoding);
+	} else if (encoding) {
+		switch(reencode_mode) {
+		case REENCODE_YES:
+			reencoded = reencode_string(message, "UTF-8", encoding);
+			break;
+		case REENCODE_NO:
+			break;
+		case REENCODE_ABORT:
+			die("Encountered commit-specific encoding %s in commit "
+			    "%s; use --reencode=[yes|no] to handle it",
+			    encoding, oid_to_hex(&commit->object.oid));
+		}
+	}
 	if (!commit->parents)
 		printf("reset %s\n", refname);
 	printf("commit %s\nmark :%"PRIu32"\n", refname,
 	       last_idnum);
@@ -1091,6 +1128,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 		OPT_CALLBACK(0, "tag-of-filtered-object", &tag_of_filtered_mode, N_("mode"),
 			     N_("select handling of tags that tag filtered objects"),
 			     parse_opt_tag_of_filtered_mode),
+		OPT_CALLBACK(0, "reencode", &reencode_mode, N_("mode"),
+			     N_("select handling of commit messages in an alternate encoding"),
+			     parse_opt_reencode_mode),
 		OPT_STRING(0, "export-marks", &export_filename, N_("file"),
 			     N_("Dump marks to this file")),
 		OPT_STRING(0, "import-marks", &import_filename, N_("file"),
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 4fd637312a..d21d7bf9a9 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -94,14 +94,14 @@ test_expect_success 'fast-export --show-original-ids | git fast-import' '
 	test $MUSS = $(git rev-parse --verify refs/tags/muss)
 '
 
-test_expect_success 'iso-8859-7' '
+test_expect_success 'reencoding iso-8859-7' '
 
 	test_when_finished "git reset --hard HEAD~1" &&
 	test_config i18n.commitencoding iso-8859-7 &&
 	test_tick &&
 	echo rosten >file &&
 	git commit -s -F "$TEST_DIRECTORY/t9350/simple-iso-8859-7-commit-message.txt" file &&
-	git fast-export wer^..wer >iso-8859-7.fi &&
+	git fast-export --reencode=yes wer^..wer >iso-8859-7.fi &&
 	sed "s/wer/i18n/" iso-8859-7.fi |
 		(cd new &&
 		 git fast-import &&
@@ -118,13 +118,45 @@ test_expect_success 'iso-8859-7' '
 		 ! grep ^encoding actual)
 '
 
+test_expect_success 'aborting on iso-8859-7' '
+
+	test_when_finished "git reset --hard HEAD~1" &&
+	test_config i18n.commitencoding iso-8859-7 &&
+	echo rosten >file &&
+	git commit -s -F "$TEST_DIRECTORY/t9350/simple-iso-8859-7-commit-message.txt" file &&
+	test_must_fail git fast-export --reencode=abort wer^..wer >iso-8859-7.fi
+'
+
+test_expect_success 'preserving iso-8859-7' '
+
+	test_when_finished "git reset --hard HEAD~1" &&
+	test_config i18n.commitencoding iso-8859-7 &&
+	echo rosten >file &&
+	git commit -s -F "$TEST_DIRECTORY/t9350/simple-iso-8859-7-commit-message.txt" file &&
+	git fast-export --reencode=no wer^..wer >iso-8859-7.fi &&
+	sed "s/wer/i18n-no-recoding/" iso-8859-7.fi |
+		(cd new &&
+		 git fast-import &&
+		 # The commit object, if not re-encoded, is 240 bytes.
+		 # Removing the "encoding iso-8859-7\n" header would drop 20
+		 # bytes.  Re-encoding the Pi character from \xF0 (\360) in
+		 # iso-8859-7 to \xCF\x80 (\317\200) in utf-8 adds a byte.
+		 # Check for the expected size...
+		 test 240 -eq "$(git cat-file -s i18n-no-recoding)" &&
+		 # ...as well as the expected byte.
+		 git cat-file commit i18n-no-recoding >actual &&
+		 grep $(printf "\360") actual &&
+		 # Also make sure the commit has the "encoding" header
+		 grep ^encoding actual)
+'
+
 test_expect_success 'encoding preserved if reencoding fails' '
 
 	test_when_finished "git reset --hard HEAD~1" &&
 	test_config i18n.commitencoding iso-8859-7 &&
 	echo rosten >file &&
 	git commit -s -F "$TEST_DIRECTORY/t9350/broken-iso-8859-7-commit-message.txt" file &&
-	git fast-export wer^..wer >iso-8859-7.fi &&
+	git fast-export --reencode=yes wer^..wer >iso-8859-7.fi &&
 	sed "s/wer/i18n-invalid/" iso-8859-7.fi |
 		(cd new &&
 		 git fast-import &&