[v6,9/9] sequencer: fallback to sane label in making rebase todo list
diff mbox series

Message ID 78daf050de8c0cdc81fed4adc8fef92d1768c63a.1573452046.git.congdanhqx@gmail.com
State New
Headers show
Series
  • sequencer: handle other encoding better
Related show

Commit Message

Danh Doan Nov. 11, 2019, 6:03 a.m. UTC
In later stage of rebasing, we will make a ref in
refs/rewritten/<label>, this label is extracted from the subject of
current merge commit.

If that subject has forbidden character for refname, git couldn't create
the rewritten ref. One such case is the merge commit subject has
a multibyte character encoded in ISO-2022-JP.

Provide a sane fallback in order to help git-rebase works in such case

Signed-off-by: Doan Tran Cong Danh <congdanhqx@gmail.com>
---
 sequencer.c            | 11 +++++++++--
 t/t3434-rebase-i18n.sh |  2 +-
 2 files changed, 10 insertions(+), 3 deletions(-)

Comments

Jeff King Nov. 11, 2019, 8:39 a.m. UTC | #1
On Mon, Nov 11, 2019 at 01:03:42PM +0700, Doan Tran Cong Danh wrote:

> In later stage of rebasing, we will make a ref in
> refs/rewritten/<label>, this label is extracted from the subject of
> current merge commit.
> 
> If that subject has forbidden character for refname, git couldn't create
> the rewritten ref. One such case is the merge commit subject has
> a multibyte character encoded in ISO-2022-JP.
> 
> Provide a sane fallback in order to help git-rebase works in such case

Good find. Not having worked much with this part of the sequencer code,
I don't have a strong opinion on the fallback label. But it seems better
than the current behavior would be.

I suspect there may be other subtle problems, too, for filesystems that
can't represent some part of the subject (e.g., I wonder if some of your
ISO-2022-JP tests might already have trouble on HFS+, which excepts all
paths to be UTF-8).

> @@ -4607,9 +4608,15 @@ static int make_script_with_merges(struct pretty_print_context *pp,
>  			if (isspace(*p1))
>  				*(char *)p1 = '-';
>  
> +		hex_oid = oid_to_hex(&commit->object.oid);
> +
> +		if (check_refname_format(label.buf, REFNAME_ALLOW_ONELEVEL) < 0) {
> +			strbuf_reset(&label);
> +			strbuf_addf(&label, "label-%s", hex_oid);
> +		}
> +
>  		strbuf_reset(&buf);
> -		strbuf_addf(&buf, "%s -C %s",
> -			    cmd_merge, oid_to_hex(&commit->object.oid));
> +		strbuf_addf(&buf, "%s -C %s", cmd_merge, hex_oid);

A minor nit, but I think it's better here to just call oid_to_hex()
twice, rather than assigning to the hex_oid variable. It's returning a
pointer to a static buffer, so holding onto the pointers for too long is
dangerous. What you have here is definitely OK, because nothing
interesting happens in between, but seeing any assignment of the result
of "oid_to_hex" makes auditing harder.

And it's not like it's that expensive, or that this is
performance-critical code (and if it were, we'd do better to use
oid_to_hex_r() directly into the strbuf anyway).

-Peff
Phillip Wood Nov. 11, 2019, 4:22 p.m. UTC | #2
Hi Doan & Peff

On 11/11/2019 08:39, Jeff King wrote:
> On Mon, Nov 11, 2019 at 01:03:42PM +0700, Doan Tran Cong Danh wrote:
> 
>> In later stage of rebasing, we will make a ref in
>> refs/rewritten/<label>, this label is extracted from the subject of
>> current merge commit.
>>
>> If that subject has forbidden character for refname, git couldn't create
>> the rewritten ref. One such case is the merge commit subject has
>> a multibyte character encoded in ISO-2022-JP.
>>
>> Provide a sane fallback in order to help git-rebase works in such case
> 
> Good find. Not having worked much with this part of the sequencer code,
> I don't have a strong opinion on the fallback label. But it seems better
> than the current behavior would be.

There's been some discussion about sanitizing the subject recently 
[1,2]. It would be good to co-ordinate this effort with that one

Best Wishes

Phillip

[1] 
https://public-inbox.org/git/4a02c38442dd8a4c0381adc8db0dce81c253da09.1567432900.git.gitgitgadget@gmail.com
[2] 
https://public-inbox.org/git/nycvar.QRO.7.76.6.1910251508100.46@tvgsbejvaqbjf.bet

> I suspect there may be other subtle problems, too, for filesystems that
> can't represent some part of the subject (e.g., I wonder if some of your
> ISO-2022-JP tests might already have trouble on HFS+, which excepts all
> paths to be UTF-8).
> 
>> @@ -4607,9 +4608,15 @@ static int make_script_with_merges(struct pretty_print_context *pp,
>>   			if (isspace(*p1))
>>   				*(char *)p1 = '-';
>>   
>> +		hex_oid = oid_to_hex(&commit->object.oid);
>> +
>> +		if (check_refname_format(label.buf, REFNAME_ALLOW_ONELEVEL) < 0) {
>> +			strbuf_reset(&label);
>> +			strbuf_addf(&label, "label-%s", hex_oid);
>> +		}
>> +
>>   		strbuf_reset(&buf);
>> -		strbuf_addf(&buf, "%s -C %s",
>> -			    cmd_merge, oid_to_hex(&commit->object.oid));
>> +		strbuf_addf(&buf, "%s -C %s", cmd_merge, hex_oid);
> 
> A minor nit, but I think it's better here to just call oid_to_hex()
> twice, rather than assigning to the hex_oid variable. It's returning a
> pointer to a static buffer, so holding onto the pointers for too long is
> dangerous. What you have here is definitely OK, because nothing
> interesting happens in between, but seeing any assignment of the result
> of "oid_to_hex" makes auditing harder.
> 
> And it's not like it's that expensive, or that this is
> performance-critical code (and if it were, we'd do better to use
> oid_to_hex_r() directly into the strbuf anyway).
> 
> -Peff
>
Johannes Schindelin Nov. 11, 2019, 6:26 p.m. UTC | #3
Hi,

On Mon, 11 Nov 2019, Doan Tran Cong Danh wrote:

> In later stage of rebasing, we will make a ref in
> refs/rewritten/<label>, this label is extracted from the subject of
> current merge commit.
>
> If that subject has forbidden character for refname, git couldn't create
> the rewritten ref. One such case is the merge commit subject has
> a multibyte character encoded in ISO-2022-JP.
>
> Provide a sane fallback in order to help git-rebase works in such case
>
> Signed-off-by: Doan Tran Cong Danh <congdanhqx@gmail.com>
> ---

There was an alternative patch under discussion, one that also addressed
problem like e.g. characters that are illegal in Windows file names:
https://github.com/gitgitgadget/git/pull/327

Unfortunately, the reviews asked for an extensive revision for which I have
not been able to find the time to implement them.

Having said that, I think that the solution proposed over there is more
complete, and maybe even more robust (nothing in your patch prevents
`label-<hex-oid>` to _already_ having been taken by another label).

Ciao,
Dscho

>  sequencer.c            | 11 +++++++++--
>  t/t3434-rebase-i18n.sh |  2 +-
>  2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index 2f32b44f52..f664ba727e 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -4560,6 +4560,7 @@ static int make_script_with_merges(struct pretty_print_context *pp,
>  	while ((commit = get_revision(revs))) {
>  		struct commit_list *to_merge;
>  		const char *p1, *p2;
> +		const char *hex_oid;
>  		struct object_id *oid;
>  		int is_empty;
>
> @@ -4607,9 +4608,15 @@ static int make_script_with_merges(struct pretty_print_context *pp,
>  			if (isspace(*p1))
>  				*(char *)p1 = '-';
>
> +		hex_oid = oid_to_hex(&commit->object.oid);
> +
> +		if (check_refname_format(label.buf, REFNAME_ALLOW_ONELEVEL) < 0) {
> +			strbuf_reset(&label);
> +			strbuf_addf(&label, "label-%s", hex_oid);
> +		}
> +
>  		strbuf_reset(&buf);
> -		strbuf_addf(&buf, "%s -C %s",
> -			    cmd_merge, oid_to_hex(&commit->object.oid));
> +		strbuf_addf(&buf, "%s -C %s", cmd_merge, hex_oid);
>
>  		/* label the tips of merged branches */
>  		for (; to_merge; to_merge = to_merge->next) {
> diff --git a/t/t3434-rebase-i18n.sh b/t/t3434-rebase-i18n.sh
> index 4b5b128cd6..c7c835cde9 100755
> --- a/t/t3434-rebase-i18n.sh
> +++ b/t/t3434-rebase-i18n.sh
> @@ -45,7 +45,7 @@ test_expect_success 'rebase --rebase-merges update encoding eucJP to UTF-8' '
>  	compare_msg eucJP.txt eucJP UTF-8
>  '
>
> -test_expect_failure 'rebase --rebase-merges update encoding eucJP to ISO-2022-JP' '
> +test_expect_success 'rebase --rebase-merges update encoding eucJP to ISO-2022-JP' '
>  	git switch -c merge-eucJP-ISO-2022-JP first &&
>  	git config i18n.commitencoding eucJP &&
>  	git merge -F "$TEST_DIRECTORY/t3434/eucJP.txt" second &&
> --
> 2.24.0.164.g78daf050de.dirty
>
>
Junio C Hamano Nov. 12, 2019, 4:17 a.m. UTC | #4
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Having said that, I think that the solution proposed over there is more
> complete, and maybe even more robust (nothing in your patch prevents
> `label-<hex-oid>` to _already_ having been taken by another label).

Good thinking.  Let me drop this last one from the series in the
meantime.

Thanks.

Patch
diff mbox series

diff --git a/sequencer.c b/sequencer.c
index 2f32b44f52..f664ba727e 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4560,6 +4560,7 @@  static int make_script_with_merges(struct pretty_print_context *pp,
 	while ((commit = get_revision(revs))) {
 		struct commit_list *to_merge;
 		const char *p1, *p2;
+		const char *hex_oid;
 		struct object_id *oid;
 		int is_empty;
 
@@ -4607,9 +4608,15 @@  static int make_script_with_merges(struct pretty_print_context *pp,
 			if (isspace(*p1))
 				*(char *)p1 = '-';
 
+		hex_oid = oid_to_hex(&commit->object.oid);
+
+		if (check_refname_format(label.buf, REFNAME_ALLOW_ONELEVEL) < 0) {
+			strbuf_reset(&label);
+			strbuf_addf(&label, "label-%s", hex_oid);
+		}
+
 		strbuf_reset(&buf);
-		strbuf_addf(&buf, "%s -C %s",
-			    cmd_merge, oid_to_hex(&commit->object.oid));
+		strbuf_addf(&buf, "%s -C %s", cmd_merge, hex_oid);
 
 		/* label the tips of merged branches */
 		for (; to_merge; to_merge = to_merge->next) {
diff --git a/t/t3434-rebase-i18n.sh b/t/t3434-rebase-i18n.sh
index 4b5b128cd6..c7c835cde9 100755
--- a/t/t3434-rebase-i18n.sh
+++ b/t/t3434-rebase-i18n.sh
@@ -45,7 +45,7 @@  test_expect_success 'rebase --rebase-merges update encoding eucJP to UTF-8' '
 	compare_msg eucJP.txt eucJP UTF-8
 '
 
-test_expect_failure 'rebase --rebase-merges update encoding eucJP to ISO-2022-JP' '
+test_expect_success 'rebase --rebase-merges update encoding eucJP to ISO-2022-JP' '
 	git switch -c merge-eucJP-ISO-2022-JP first &&
 	git config i18n.commitencoding eucJP &&
 	git merge -F "$TEST_DIRECTORY/t3434/eucJP.txt" second &&