From: Elijah Newren
To: Junio C Hamano
Cc: git@vger.kernel.org, Eric Sunshine, Johannes Schindelin, Johannes Sixt, Torsten Bögershausen, Elijah Newren
Subject: [PATCH v4 2/5] fast-import: support 'encoding' commit header
Date: Mon, 13 May 2019 09:47:19 -0700
Message-Id: <20190513164722.31534-3-newren@gmail.com>
In-Reply-To: <20190513164722.31534-1-newren@gmail.com>
References: <20190510205335.19968-1-newren@gmail.com> <20190513164722.31534-1-newren@gmail.com>

Since git supports commit messages with an encoding other than utf-8,
allow fast-import to import such commits.  This may be useful for folks
who do not want to reencode commit messages from an external system, and
may also be useful to achieve reversible history rewrites (e.g. sha1sum
<-> sha256sum transitions or subtree work) with git repositories that
have used specialized encodings in their commit history.

Signed-off-by: Elijah Newren
---
 Documentation/git-fast-import.txt |  7 +++++++
 fast-import.c                     | 11 +++++++++--
 t/t9300-fast-import.sh            | 20 ++++++++++++++++++++
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index d65cdb3d08..7baf9e47b5 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -388,6 +388,7 @@ change to the project.
 	original-oid?
 	('author' (SP )? SP LT GT SP LF)?
 	'committer' (SP )? SP LT GT SP LF
+	('encoding' SP )?
 	data
 	('from' SP LF)?
 	('merge' SP LF)?
@@ -455,6 +456,12 @@ that was selected by the --date-format= command-line option.
 See ``Date Formats'' above for the set of supported formats, and
 their syntax.
 
+`encoding`
+^^^^^^^^^^
+The optional `encoding` command indicates the encoding of the commit
+message.  Most commits are UTF-8 and the encoding is omitted, but this
+allows importing commit messages into git without first reencoding them.
+
 `from`
 ^^^^^^
 The `from` command is used to specify the commit to initialize
diff --git a/fast-import.c b/fast-import.c
index f38d04fa58..76a7bd3699 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2585,6 +2585,7 @@ static void parse_new_commit(const char *arg)
 	struct branch *b;
 	char *author = NULL;
 	char *committer = NULL;
+	const char *encoding = NULL;
 	struct hash_list *merge_list = NULL;
 	unsigned int merge_count;
 	unsigned char prev_fanout, new_fanout;
@@ -2607,6 +2608,8 @@ static void parse_new_commit(const char *arg)
 	}
 	if (!committer)
 		die("Expected committer but didn't get one");
+	if (skip_prefix(command_buf.buf, "encoding ", &encoding))
+		read_next_command();
 	parse_data(&msg, 0, NULL);
 	read_next_command();
 	parse_from(b);
@@ -2670,9 +2673,13 @@ static void parse_new_commit(const char *arg)
 	}
 	strbuf_addf(&new_data,
 		"author %s\n"
-		"committer %s\n"
-		"\n",
+		"committer %s\n",
 		author ? author : committer, committer);
+	if (encoding)
+		strbuf_addf(&new_data,
+			"encoding %s\n",
+			encoding);
+	strbuf_addch(&new_data, '\n');
 	strbuf_addbuf(&new_data, &msg);
 	free(author);
 	free(committer);
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 3668263c40..141b7fa35e 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3299,4 +3299,24 @@ test_expect_success !MINGW 'W: get-mark & empty orphan commit with erroneous thi
 	sed -e s/LFs/LLL/ W-input | tr L "\n" | test_must_fail git fast-import
 '
 
+###
+### series X (other new features)
+###
+
+test_expect_success 'X: handling encoding' '
+	test_tick &&
+	cat >input <<-INPUT_END &&
+	commit refs/heads/encoding
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	encoding iso-8859-7
+	data <>input &&
+
+	git fast-import
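A minimal C sketch of what this patch adds, using only libc. skip_prefix_sketch() and format_commit_headers() are hypothetical stand-ins for git's skip_prefix() and the strbuf_addf() calls in parse_new_commit(); the "tree" and "parent" lines that precede "author" in a real commit object are left out. The sketch shows the optional "encoding " stream line being recognized and the resulting header landing between "committer" and the blank line that starts the message. One detail it sidesteps: in the patch, `encoding` points into command_buf's storage instead of owning a copy, and more input is read before the header is written, so whether that storage is still intact depends on how read_next_command() manages its buffers. Copying the value first (for example with xstrdup()) would be the more defensive choice.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for git's skip_prefix(): on a match, point *out
 * just past the prefix and return 1; otherwise return 0. */
static int skip_prefix_sketch(const char *str, const char *prefix,
			      const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/*
 * Hypothetical: write the author/committer/encoding part of a commit
 * object the way the patched parse_new_commit() lays it out.
 * encoding_line is the optional stream line (e.g. "encoding iso-8859-7")
 * or NULL. Returns the byte count, or -1 if buf is too small.
 */
static int format_commit_headers(char *buf, size_t size,
				 const char *author, const char *committer,
				 const char *encoding_line, const char *msg)
{
	const char *encoding = NULL;
	int n;

	/* Like: if (skip_prefix(command_buf.buf, "encoding ", &encoding)) */
	if (encoding_line)
		skip_prefix_sketch(encoding_line, "encoding ", &encoding);

	if (encoding)
		n = snprintf(buf, size,
			     "author %s\ncommitter %s\nencoding %s\n\n%s",
			     author ? author : committer, committer,
			     encoding, msg);
	else
		n = snprintf(buf, size, "author %s\ncommitter %s\n\n%s",
			     author ? author : committer, committer, msg);
	return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Passing "encoding iso-8859-7" as encoding_line yields a header block ending in "encoding iso-8859-7\n\n", while passing NULL produces no encoding line at all, which matches the documentation's note that most commits simply omit it.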
From: Elijah Newren
To: Junio C Hamano
Cc: git@vger.kernel.org, Eric Sunshine, Johannes Schindelin, Johannes Sixt, Torsten Bögershausen, Elijah Newren
Subject: [PATCH v4 4/5] fast-export: differentiate between explicitly utf-8 and implicitly utf-8
Date: Mon, 13 May 2019 09:47:21 -0700
Message-Id: <20190513164722.31534-5-newren@gmail.com>
In-Reply-To: <20190513164722.31534-1-newren@gmail.com>
References: <20190510205335.19968-1-newren@gmail.com> <20190513164722.31534-1-newren@gmail.com>

The find_encoding() function returned the encoding used by a commit
message, returning a default of git_commit_encoding (usually utf-8).
Although the current code does not differentiate between a commit which
explicitly requested utf-8 and one where we just assume utf-8 because no
encoding is set, it will become important when we try to preserve the
encoding header.  Since is_encoding_utf8() returns true when passed
NULL, we can just return NULL from find_encoding() instead of returning
git_commit_encoding.

Signed-off-by: Elijah Newren
---
 builtin/fast-export.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 7734a9f5a5..66331fa401 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -453,7 +453,7 @@ static const char *find_encoding(const char *begin, const char *end)
 	bol = memmem(begin, end ? end - begin : strlen(begin),
 		     needle, strlen(needle));
 	if (!bol)
-		return git_commit_encoding;
+		return NULL;
 	bol += strlen(needle);
 	eol = strchrnul(bol, '\n');
 	*eol = '\0';
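To make the explicit/implicit distinction concrete, here is a small self-contained C sketch. find_encoding_sketch(), is_utf8_sketch() and classify() are hypothetical stand-ins, not git's functions: the first copies the value of an "encoding " header found before the blank line that ends the header block and returns NULL when there is none (instead of a default name); the second treats NULL as UTF-8, which is the property of is_encoding_utf8() that the commit message relies on. classify() then separates the three cases this change lets callers tell apart.

```c
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for is_encoding_utf8(): NULL (no header) counts
 * as UTF-8, as do "utf-8"/"utf8" in any letter case. */
static int is_utf8_sketch(const char *name)
{
	if (!name)
		return 1;
	return !strcasecmp(name, "utf-8") || !strcasecmp(name, "utf8");
}

/*
 * Hypothetical find_encoding() analogue: copy the value of an
 * "encoding " header that appears before the blank line ending the
 * header block into out (size must be > 0). Returns out, or NULL if
 * there is no such header.
 */
static const char *find_encoding_sketch(const char *commit_buf,
					char *out, size_t size)
{
	const char *needle = "\nencoding ";
	const char *hdr_end = strstr(commit_buf, "\n\n");
	const char *bol = strstr(commit_buf, needle);
	size_t len;

	if (!bol || (hdr_end && bol >= hdr_end))
		return NULL;	/* absent, or only present in the message */
	bol += strlen(needle);
	len = strcspn(bol, "\n");
	if (len >= size)
		len = size - 1;	/* truncate overly long values */
	memcpy(out, bol, len);
	out[len] = '\0';
	return out;
}

enum enc_kind { ENC_IMPLICIT_UTF8, ENC_EXPLICIT_UTF8, ENC_OTHER };

/* Tell "no header" apart from "header says UTF-8" and "other encoding". */
static enum enc_kind classify(const char *commit_buf)
{
	char name[64];
	const char *enc = find_encoding_sketch(commit_buf, name, sizeof(name));

	if (!enc)
		return ENC_IMPLICIT_UTF8;
	return is_utf8_sketch(enc) ? ENC_EXPLICIT_UTF8 : ENC_OTHER;
}
```

Both ENC_IMPLICIT_UTF8 and ENC_EXPLICIT_UTF8 still satisfy is_utf8_sketch(), so code that only asks "is this UTF-8?" behaves as before; only code that wants to preserve the header needs the distinction.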
From: Elijah Newren
To: Junio C Hamano
Cc: git@vger.kernel.org, Eric Sunshine, Johannes Schindelin, Johannes Sixt, Torsten Bögershausen, Elijah Newren
Subject: [PATCH v4 5/5] fast-export: do automatic reencoding of commit messages only if requested
Date: Mon, 13 May 2019 09:47:22 -0700
Message-Id: <20190513164722.31534-6-newren@gmail.com>
In-Reply-To: <20190513164722.31534-1-newren@gmail.com>
References: <20190510205335.19968-1-newren@gmail.com> <20190513164722.31534-1-newren@gmail.com>

Automatic re-encoding of commit messages (and dropping of the encoding
header) hurts attempts to do reversible history rewrites (e.g. sha1sum
<-> sha256sum transitions, some subtree rewrites), and seems
inconsistent with the general principle followed elsewhere in
fast-export of requiring explicit user requests to modify the output
(e.g. --signed-tags=strip, --tag-of-filtered-object=rewrite).  Add a
--reencode flag that the user can use to specify how such commits are
handled, and, like other fast-export flags, default it to 'abort'.
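The option handling described above can be sketched in plain C. parse_reencode_sketch() mirrors the spellings accepted by parse_opt_reencode_mode() in the diff below, with a NULL argument standing in for the `unset` (--no-reencode) case; action_for() is a hypothetical helper that only returns a label for what handle_commit() does in each mode, going by the tests: re-encode to UTF-8, keep the original bytes and encoding header, or die. Note that any commit carrying an encoding header reaches the switch, including one whose header explicitly says UTF-8.

```c
#include <stdio.h>
#include <string.h>

enum reencode_mode { REENCODE_ABORT, REENCODE_YES, REENCODE_NO };

/*
 * Sketch of parse_opt_reencode_mode() without git's option machinery:
 * NULL (unset) or "abort" -> abort; yes/true/on -> yes;
 * no/false/off -> no. Returns 0 on success, -1 for an unknown value
 * (leaving *mode unchanged).
 */
static int parse_reencode_sketch(const char *arg, enum reencode_mode *mode)
{
	if (!arg || !strcmp(arg, "abort"))
		*mode = REENCODE_ABORT;
	else if (!strcmp(arg, "yes") || !strcmp(arg, "true") || !strcmp(arg, "on"))
		*mode = REENCODE_YES;
	else if (!strcmp(arg, "no") || !strcmp(arg, "false") || !strcmp(arg, "off"))
		*mode = REENCODE_NO;
	else {
		fprintf(stderr, "Unknown reencoding mode: %s\n", arg);
		return -1;
	}
	return 0;
}

/*
 * Hypothetical: label the handling of one commit. encoding is the value
 * of its encoding header, or NULL when it has none (implicitly UTF-8).
 */
static const char *action_for(enum reencode_mode mode, const char *encoding)
{
	if (!encoding)
		return "pass through";
	switch (mode) {
	case REENCODE_YES:
		return "reencode to UTF-8";
	case REENCODE_NO:
		return "keep bytes and encoding header";
	case REENCODE_ABORT:
	default:
		return "abort";
	}
}
```

For instance, --reencode=off and --reencode=no both select REENCODE_NO, while an unrecognized value such as "maybe" is rejected, mirroring the error() path in the patch.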
Signed-off-by: Elijah Newren
---
 builtin/fast-export.c  | 35 ++++++++++++++++++++++++++++++++---
 t/t9350-fast-export.sh | 38 +++++++++++++++++++++++++++++++++++---
 2 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 66331fa401..4906b23248 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -33,6 +33,7 @@ static const char *fast_export_usage[] = {
 static int progress;
 static enum { SIGNED_TAG_ABORT, VERBATIM, WARN, WARN_STRIP, STRIP } signed_tag_mode = SIGNED_TAG_ABORT;
 static enum { TAG_FILTERING_ABORT, DROP, REWRITE } tag_of_filtered_mode = TAG_FILTERING_ABORT;
+static enum { REENCODE_ABORT, REENCODE_YES, REENCODE_NO } reencode_mode = REENCODE_ABORT;
 static int fake_missing_tagger;
 static int use_done_feature;
 static int no_data;
@@ -77,6 +78,20 @@ static int parse_opt_tag_of_filtered_mode(const struct option *opt,
 	return 0;
 }
 
+static int parse_opt_reencode_mode(const struct option *opt,
+				   const char *arg, int unset)
+{
+	if (unset || !strcmp(arg, "abort"))
+		reencode_mode = REENCODE_ABORT;
+	else if (!strcmp(arg, "yes") || !strcmp(arg, "true") || !strcmp(arg, "on"))
+		reencode_mode = REENCODE_YES;
+	else if (!strcmp(arg, "no") || !strcmp(arg, "false") || !strcmp(arg, "off"))
+		reencode_mode = REENCODE_NO;
+	else
+		return error("Unknown reencoding mode: %s", arg);
+	return 0;
+}
+
 static struct decoration idnums;
 static uint32_t last_idnum;
 
@@ -633,10 +648,21 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
 	}
 
 	mark_next_object(&commit->object);
-	if (anonymize)
+	if (anonymize) {
 		reencoded = anonymize_commit_message(message);
-	else if (!is_encoding_utf8(encoding))
-		reencoded = reencode_string(message, "UTF-8", encoding);
+	} else if (encoding) {
+		switch(reencode_mode) {
+		case REENCODE_YES:
+			reencoded = reencode_string(message, "UTF-8", encoding);
+			break;
+		case REENCODE_NO:
+			break;
+		case REENCODE_ABORT:
+			die("Encountered commit-specific encoding %s in commit "
+			    "%s; use --reencode=[yes|no] to handle it",
+			    encoding, oid_to_hex(&commit->object.oid));
+		}
+	}
 	if (!commit->parents)
 		printf("reset %s\n", refname);
 	printf("commit %s\nmark :%"PRIu32"\n", refname, last_idnum);
@@ -1091,6 +1117,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 		OPT_CALLBACK(0, "tag-of-filtered-object", &tag_of_filtered_mode, N_("mode"),
 			     N_("select handling of tags that tag filtered objects"),
 			     parse_opt_tag_of_filtered_mode),
+		OPT_CALLBACK(0, "reencode", &reencode_mode, N_("mode"),
+			     N_("select handling of commit messages in an alternate encoding"),
+			     parse_opt_reencode_mode),
 		OPT_STRING(0, "export-marks", &export_filename, N_("file"),
 			     N_("Dump marks to this file")),
 		OPT_STRING(0, "import-marks", &import_filename, N_("file"),
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 4fd637312a..d21d7bf9a9 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -94,14 +94,14 @@ test_expect_success 'fast-export --show-original-ids | git fast-import' '
 	test $MUSS = $(git rev-parse --verify refs/tags/muss)
 '
 
-test_expect_success 'iso-8859-7' '
+test_expect_success 'reencoding iso-8859-7' '
 
 	test_when_finished "git reset --hard HEAD~1" &&
 	test_config i18n.commitencoding iso-8859-7 &&
 	test_tick &&
 	echo rosten >file &&
 	git commit -s -F "$TEST_DIRECTORY/t9350/simple-iso-8859-7-commit-message.txt" file &&
-	git fast-export wer^..wer >iso-8859-7.fi &&
+	git fast-export --reencode=yes wer^..wer >iso-8859-7.fi &&
 	sed "s/wer/i18n/" iso-8859-7.fi |
 		(cd new &&
 		 git fast-import &&
@@ -118,13 +118,45 @@ test_expect_success 'iso-8859-7' '
 		 ! grep ^encoding actual)
 '
 
+test_expect_success 'aborting on iso-8859-7' '
+
+	test_when_finished "git reset --hard HEAD~1" &&
+	test_config i18n.commitencoding iso-8859-7 &&
+	echo rosten >file &&
+	git commit -s -F "$TEST_DIRECTORY/t9350/simple-iso-8859-7-commit-message.txt" file &&
+	test_must_fail git fast-export --reencode=abort wer^..wer >iso-8859-7.fi
+'
+
+test_expect_success 'preserving iso-8859-7' '
+
+	test_when_finished "git reset --hard HEAD~1" &&
+	test_config i18n.commitencoding iso-8859-7 &&
+	echo rosten >file &&
+	git commit -s -F "$TEST_DIRECTORY/t9350/simple-iso-8859-7-commit-message.txt" file &&
+	git fast-export --reencode=no wer^..wer >iso-8859-7.fi &&
+	sed "s/wer/i18n-no-recoding/" iso-8859-7.fi |
+		(cd new &&
+		 git fast-import &&
+		 # The commit object, if not re-encoded, is 240 bytes.
+		 # Removing the "encoding iso-8859-7\n" header would drop 20
+		 # bytes.  Re-encoding the Pi character from \xF0 (\360) in
+		 # iso-8859-7 to \xCF\x80 (\317\200) in utf-8 adds a byte.
+		 # Check for the expected size...
+		 test 240 -eq "$(git cat-file -s i18n-no-recoding)" &&
+		 # ...as well as the expected byte.
+		 git cat-file commit i18n-no-recoding >actual &&
+		 grep $(printf "\360") actual &&
+		 # Also make sure the commit has the "encoding" header
+		 grep ^encoding actual)
+'
+
 test_expect_success 'encoding preserved if reencoding fails' '
 
 	test_when_finished "git reset --hard HEAD~1" &&
 	test_config i18n.commitencoding iso-8859-7 &&
 	echo rosten >file &&
 	git commit -s -F "$TEST_DIRECTORY/t9350/broken-iso-8859-7-commit-message.txt" file &&
-	git fast-export wer^..wer >iso-8859-7.fi &&
+	git fast-export --reencode=yes wer^..wer >iso-8859-7.fi &&
 	sed "s/wer/i18n-invalid/" iso-8859-7.fi |
 		(cd new &&
 		 git fast-import &&