From: Elijah Newren
To: Junio C Hamano
Cc: git@vger.kernel.org, Eric Sunshine, Johannes Schindelin, Johannes Sixt,
    Torsten Bögershausen, Elijah Newren
Subject: [PATCH v5 2/5] fast-import: support 'encoding' commit header
Date: Mon, 13 May 2019 16:17:23 -0700
Message-Id: <20190513231726.16218-3-newren@gmail.com>
In-Reply-To: <20190513231726.16218-1-newren@gmail.com>
References: <20190513164722.31534-1-newren@gmail.com>
    <20190513231726.16218-1-newren@gmail.com>
X-Patchwork-Id: 10941821

Since git supports commit messages with an encoding other than utf-8,
allow fast-import to import such commits.  This may be useful for folks
who do not want to reencode commit messages from an external system, and
may also be useful to achieve reversible history rewrites (e.g. sha1sum
<-> sha256sum transitions or subtree work) with git repositories that
have used specialized encodings in their commit history.

Signed-off-by: Elijah Newren
---
 Documentation/git-fast-import.txt |  7 +++++++
 fast-import.c                     | 11 +++++++++--
 t/t9300-fast-import.sh            | 20 ++++++++++++++++++++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index d65cdb3d08..7baf9e47b5 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -388,6 +388,7 @@ change to the project.
 	original-oid?
 	('author' (SP )? SP LT GT SP LF)?
 	'committer' (SP )? SP LT GT SP LF
+	('encoding' SP )?
 	data
 	('from' SP LF)?
 	('merge' SP LF)?
@@ -455,6 +456,12 @@ that was selected by the --date-format= command-line option.
 See ``Date Formats'' above for the set of supported formats, and
 their syntax.
 
+`encoding`
+^^^^^^^^^^
+The optional `encoding` command indicates the encoding of the commit
+message.  Most commits are UTF-8 and the encoding is omitted, but this
+allows importing commit messages into git without first reencoding them.
+
 `from`
 ^^^^^^
 The `from` command is used to specify the commit to initialize
diff --git a/fast-import.c b/fast-import.c
index f38d04fa58..76a7bd3699 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2585,6 +2585,7 @@ static void parse_new_commit(const char *arg)
 	struct branch *b;
 	char *author = NULL;
 	char *committer = NULL;
+	const char *encoding = NULL;
 	struct hash_list *merge_list = NULL;
 	unsigned int merge_count;
 	unsigned char prev_fanout, new_fanout;
@@ -2607,6 +2608,8 @@ static void parse_new_commit(const char *arg)
 	}
 	if (!committer)
 		die("Expected committer but didn't get one");
+	if (skip_prefix(command_buf.buf, "encoding ", &encoding))
+		read_next_command();
 	parse_data(&msg, 0, NULL);
 	read_next_command();
 	parse_from(b);
@@ -2670,9 +2673,13 @@ static void parse_new_commit(const char *arg)
 	}
 	strbuf_addf(&new_data,
 		"author %s\n"
-		"committer %s\n"
-		"\n",
+		"committer %s\n",
 		author ? author : committer, committer);
+	if (encoding)
+		strbuf_addf(&new_data,
+			"encoding %s\n",
+			encoding);
+	strbuf_addch(&new_data, '\n');
 	strbuf_addbuf(&new_data, &msg);
 	free(author);
 	free(committer);
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 3668263c40..141b7fa35e 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3299,4 +3299,24 @@ test_expect_success !MINGW 'W: get-mark & empty orphan commit with erroneous thi
 	sed -e s/LFs/LLL/ W-input | tr L "\n" | test_must_fail git fast-import
 '
 
+###
+### series X (other new features)
+###
+
+test_expect_success 'X: handling encoding' '
+	test_tick &&
+	cat >input <<-INPUT_END &&
+	commit refs/heads/encoding
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	encoding iso-8859-7
+	data <>input &&
+
+	git fast-import