Message ID | 78daf050de8c0cdc81fed4adc8fef92d1768c63a.1573452046.git.congdanhqx@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | sequencer: handle other encoding better | expand |
On Mon, Nov 11, 2019 at 01:03:42PM +0700, Doan Tran Cong Danh wrote: > In later stage of rebasing, we will make a ref in > refs/rewritten/<label>, this label is extracted from the subject of > current merge commit. > > If that subject has forbidden character for refname, git couldn't create > the rewritten ref. One such case is the merge commit subject has > a multibyte character encoded in ISO-2022-JP. > > Provide a sane fallback in order to help git-rebase works in such case Good find. Not having worked much with this part of the sequencer code, I don't have a strong opinion on the fallback label. But it seems better than the current behavior would be. I suspect there may be other subtle problems, too, for filesystems that can't represent some part of the subject (e.g., I wonder if some of your ISO-2022-JP tests might already have trouble on HFS+, which excepts all paths to be UTF-8). > @@ -4607,9 +4608,15 @@ static int make_script_with_merges(struct pretty_print_context *pp, > if (isspace(*p1)) > *(char *)p1 = '-'; > > + hex_oid = oid_to_hex(&commit->object.oid); > + > + if (check_refname_format(label.buf, REFNAME_ALLOW_ONELEVEL) < 0) { > + strbuf_reset(&label); > + strbuf_addf(&label, "label-%s", hex_oid); > + } > + > strbuf_reset(&buf); > - strbuf_addf(&buf, "%s -C %s", > - cmd_merge, oid_to_hex(&commit->object.oid)); > + strbuf_addf(&buf, "%s -C %s", cmd_merge, hex_oid); A minor nit, but I think it's better here to just call oid_to_hex() twice, rather than assigning to the hex_oid variable. It's returning a pointer to a static buffer, so holding onto the pointers for too long is dangerous. What you have here is definitely OK, because nothing interesting happens in between, but seeing any assignment of the result of "oid_to_hex" makes auditing harder. And it's not like it's that expensive, or that this is performance-critical code (and if it were, we'd do better to use oid_to_hex_r() directly into the strbuf anyway). -Peff
Hi Doan & Peff On 11/11/2019 08:39, Jeff King wrote: > On Mon, Nov 11, 2019 at 01:03:42PM +0700, Doan Tran Cong Danh wrote: > >> In later stage of rebasing, we will make a ref in >> refs/rewritten/<label>, this label is extracted from the subject of >> current merge commit. >> >> If that subject has forbidden character for refname, git couldn't create >> the rewritten ref. One such case is the merge commit subject has >> a multibyte character encoded in ISO-2022-JP. >> >> Provide a sane fallback in order to help git-rebase works in such case > > Good find. Not having worked much with this part of the sequencer code, > I don't have a strong opinion on the fallback label. But it seems better > than the current behavior would be. There's been some discussion about sanitizing the subject recently [1,2]. It would be good to co-ordinate this effort with that one Best Wishes Phillip [1] https://public-inbox.org/git/4a02c38442dd8a4c0381adc8db0dce81c253da09.1567432900.git.gitgitgadget@gmail.com [2] https://public-inbox.org/git/nycvar.QRO.7.76.6.1910251508100.46@tvgsbejvaqbjf.bet > I suspect there may be other subtle problems, too, for filesystems that > can't represent some part of the subject (e.g., I wonder if some of your > ISO-2022-JP tests might already have trouble on HFS+, which excepts all > paths to be UTF-8). > >> @@ -4607,9 +4608,15 @@ static int make_script_with_merges(struct pretty_print_context *pp, >> if (isspace(*p1)) >> *(char *)p1 = '-'; >> >> + hex_oid = oid_to_hex(&commit->object.oid); >> + >> + if (check_refname_format(label.buf, REFNAME_ALLOW_ONELEVEL) < 0) { >> + strbuf_reset(&label); >> + strbuf_addf(&label, "label-%s", hex_oid); >> + } >> + >> strbuf_reset(&buf); >> - strbuf_addf(&buf, "%s -C %s", >> - cmd_merge, oid_to_hex(&commit->object.oid)); >> + strbuf_addf(&buf, "%s -C %s", cmd_merge, hex_oid); > > A minor nit, but I think it's better here to just call oid_to_hex() > twice, rather than assigning to the hex_oid variable. It's returning a > pointer to a static buffer, so holding onto the pointers for too long is > dangerous. What you have here is definitely OK, because nothing > interesting happens in between, but seeing any assignment of the result > of "oid_to_hex" makes auditing harder. > > And it's not like it's that expensive, or that this is > performance-critical code (and if it were, we'd do better to use > oid_to_hex_r() directly into the strbuf anyway). > > -Peff >
Hi, On Mon, 11 Nov 2019, Doan Tran Cong Danh wrote: > In later stage of rebasing, we will make a ref in > refs/rewritten/<label>, this label is extracted from the subject of > current merge commit. > > If that subject has forbidden character for refname, git couldn't create > the rewritten ref. One such case is the merge commit subject has > a multibyte character encoded in ISO-2022-JP. > > Provide a sane fallback in order to help git-rebase works in such case > > Signed-off-by: Doan Tran Cong Danh <congdanhqx@gmail.com> > --- There was an alternative patch under discussion, one that also addressed problem like e.g. characters that are illegal in Windows file names: https://github.com/gitgitgadget/git/pull/327 Unfortunately, the reviews asked for an extensive revision for which I have not been able to find the time to implement them. Having said that, I think that the solution proposed over there is more complete, and maybe even more robust (nothing in your patch prevents `label-<hex-oid>` to _already_ having been taken by another label). Ciao, Dscho > sequencer.c | 11 +++++++++-- > t/t3434-rebase-i18n.sh | 2 +- > 2 files changed, 10 insertions(+), 3 deletions(-) > > diff --git a/sequencer.c b/sequencer.c > index 2f32b44f52..f664ba727e 100644 > --- a/sequencer.c > +++ b/sequencer.c > @@ -4560,6 +4560,7 @@ static int make_script_with_merges(struct pretty_print_context *pp, > while ((commit = get_revision(revs))) { > struct commit_list *to_merge; > const char *p1, *p2; > + const char *hex_oid; > struct object_id *oid; > int is_empty; > > @@ -4607,9 +4608,15 @@ static int make_script_with_merges(struct pretty_print_context *pp, > if (isspace(*p1)) > *(char *)p1 = '-'; > > + hex_oid = oid_to_hex(&commit->object.oid); > + > + if (check_refname_format(label.buf, REFNAME_ALLOW_ONELEVEL) < 0) { > + strbuf_reset(&label); > + strbuf_addf(&label, "label-%s", hex_oid); > + } > + > strbuf_reset(&buf); > - strbuf_addf(&buf, "%s -C %s", > - cmd_merge, oid_to_hex(&commit->object.oid)); > + strbuf_addf(&buf, "%s -C %s", cmd_merge, hex_oid); > > /* label the tips of merged branches */ > for (; to_merge; to_merge = to_merge->next) { > diff --git a/t/t3434-rebase-i18n.sh b/t/t3434-rebase-i18n.sh > index 4b5b128cd6..c7c835cde9 100755 > --- a/t/t3434-rebase-i18n.sh > +++ b/t/t3434-rebase-i18n.sh > @@ -45,7 +45,7 @@ test_expect_success 'rebase --rebase-merges update encoding eucJP to UTF-8' ' > compare_msg eucJP.txt eucJP UTF-8 > ' > > -test_expect_failure 'rebase --rebase-merges update encoding eucJP to ISO-2022-JP' ' > +test_expect_success 'rebase --rebase-merges update encoding eucJP to ISO-2022-JP' ' > git switch -c merge-eucJP-ISO-2022-JP first && > git config i18n.commitencoding eucJP && > git merge -F "$TEST_DIRECTORY/t3434/eucJP.txt" second && > -- > 2.24.0.164.g78daf050de.dirty > >
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > Having said that, I think that the solution proposed over there is more > complete, and maybe even more robust (nothing in your patch prevents > `label-<hex-oid>` to _already_ having been taken by another label). Good thinking. Let me drop this last one from the series in the meantime. Thanks.
diff --git a/sequencer.c b/sequencer.c index 2f32b44f52..f664ba727e 100644 --- a/sequencer.c +++ b/sequencer.c @@ -4560,6 +4560,7 @@ static int make_script_with_merges(struct pretty_print_context *pp, while ((commit = get_revision(revs))) { struct commit_list *to_merge; const char *p1, *p2; + const char *hex_oid; struct object_id *oid; int is_empty; @@ -4607,9 +4608,15 @@ static int make_script_with_merges(struct pretty_print_context *pp, if (isspace(*p1)) *(char *)p1 = '-'; + hex_oid = oid_to_hex(&commit->object.oid); + + if (check_refname_format(label.buf, REFNAME_ALLOW_ONELEVEL) < 0) { + strbuf_reset(&label); + strbuf_addf(&label, "label-%s", hex_oid); + } + strbuf_reset(&buf); - strbuf_addf(&buf, "%s -C %s", - cmd_merge, oid_to_hex(&commit->object.oid)); + strbuf_addf(&buf, "%s -C %s", cmd_merge, hex_oid); /* label the tips of merged branches */ for (; to_merge; to_merge = to_merge->next) { diff --git a/t/t3434-rebase-i18n.sh b/t/t3434-rebase-i18n.sh index 4b5b128cd6..c7c835cde9 100755 --- a/t/t3434-rebase-i18n.sh +++ b/t/t3434-rebase-i18n.sh @@ -45,7 +45,7 @@ test_expect_success 'rebase --rebase-merges update encoding eucJP to UTF-8' ' compare_msg eucJP.txt eucJP UTF-8 ' -test_expect_failure 'rebase --rebase-merges update encoding eucJP to ISO-2022-JP' ' +test_expect_success 'rebase --rebase-merges update encoding eucJP to ISO-2022-JP' ' git switch -c merge-eucJP-ISO-2022-JP first && git config i18n.commitencoding eucJP && git merge -F "$TEST_DIRECTORY/t3434/eucJP.txt" second &&
In later stage of rebasing, we will make a ref in refs/rewritten/<label>, this label is extracted from the subject of current merge commit. If that subject has forbidden character for refname, git couldn't create the rewritten ref. One such case is the merge commit subject has a multibyte character encoded in ISO-2022-JP. Provide a sane fallback in order to help git-rebase works in such case Signed-off-by: Doan Tran Cong Danh <congdanhqx@gmail.com> --- sequencer.c | 11 +++++++++-- t/t3434-rebase-i18n.sh | 2 +- 2 files changed, 10 insertions(+), 3 deletions(-)