Message ID | patch-2.2-b6136380c28-20211004T013611Z-avarab@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | i18n: improve translatability of ambiguous object output | expand |
On Mon, Oct 04, 2021 at 03:42:49AM +0200, Ævar Arnfjörð Bjarmason wrote: > Change the output of show_ambiguous_object() added in [1] and last > tweaked in [2] to be more friendly to translators. By being able to > customize the sprintf formats we're even ready for RTL languages. > > 1. ef9b0370da6 (sha1-name.c: store and use repo in struct > disambiguate_state, 2019-04-16) > 2. 5cc044e0257 (get_short_oid: sort ambiguous objects by type, > then SHA-1, 2018-05-10) I suspect you meant 1ffa26c461 (get_short_sha1: list ambiguous objects on error, 2016-09-26) for the first one. I had to stare at the patch for a while to understand the goal here. I think this would have been a bit easier to review if "change" in your first sentence was described a bit more. Perhaps: The list of candidates output by show_ambiguous_output() is not marked for translation. At the very least we want to allow the text "the candidates are" to be translated. But we also format individual candidate lines like: deadbeef commit 2021-01-01 - Some Commit Message by formatting the individual components, then using a printf-format to arrange them in the correct order. Even though there's no text here to be translated, the order and spacing is determined by the format string. Allowing that to be translated helps RTL languages. I have a few comments on the patch itself. The biggest thing is that it changes the format to add an extra newline (between "The candidates are:" and the actual list). I don't have a strong opinion on including that or not, but it seemed unintentional given the comment on the first commit (and its lack of mention here). The rest are mostly observations, not criticisms. You can take them with the appropriate grain of salt given that I don't do translation work myself, nor know any RTL languages. > @@ -366,18 +373,34 @@ static int show_ambiguous_object(const struct object_id *oid, void *data) > if (commit) { > struct pretty_print_context pp = {0}; > pp.date_mode.type = DATE_SHORT; > - format_commit_message(commit, " %ad - %s", &desc, &pp); > + format_commit_message(commit, _(" %ad - %s"), &desc, &pp); > } Is it OK to use non-printf expansions with the gettext code? Presumably the translated string would have the same set of placeholders in it, but my understanding is that gettext may sometimes munge the %-placeholders (e.g., allowing numbered ones for re-ordering). I admit I don't know how any of that works, but I just wonder if this "%ad" may cause confusion (or even if not, if it is even possible to re-order it for an RTL language). > } else if (type == OBJ_TAG) { > struct tag *tag = lookup_tag(ds->repo, oid); > if (!parse_tag(tag) && tag->tag) > - strbuf_addf(&desc, " %s", tag->tag); > + strbuf_addf(&desc, _(" %s"), tag->tag); > } I wonder whether " %s" is worthwhile as a translatable string. It does seem to be unique among strings marked for translation, but there are a ton of non-translated instances. Would context ever matter here? My impression is that this kind of translation-lego is frowned upon, and we might be better off repeating ourselves a bit more. I.e., something like: if (commit) { struct strbuf date = STRBUF_INIT; struct strbuf subject = STRBUF_INIT; format_commit_message(commit, "%ad", &date, &pp); format_commit_message(commit, "%s", &subject, &pp); strbuf_addf(advice, _(" %s commit %s - %s\n"), repo_find_unique_abbrev(...), date.buf, subject.buf); strbuf_release(&date); strbuf_release(&subject); } else if (type == OBJ_TAG) { ... strbuf_addf(advice, _(" %s tag %s\n"), repo_find_unique_abbrev(...), tag->tag); } else { /* TRANSLATORS: the fields are abbreviated oid and type */ strbuf_addf(advice, _(" %s %s\n"), repo_find_unique_abbrev(...), type_name(type)); } Though that last one similarly has a real lack of context. > - advise(" %s %s%s", > - repo_find_unique_abbrev(ds->repo, oid, DEFAULT_ABBREV), > - type_name(type) ? type_name(type) : "unknown type", > - desc.buf); > + strbuf_addf(advice, > + /* > + * TRANSLATORS: This is a line of ambiguous object > + * output. E.g.: > + * > + * "deadbeef commit 2021-01-01 - Some Commit Message\n" > + * "deadbeef tag Some Tag Message\n" > + * "deadbeef tree\n" > + * > + * I.e. the first argument is a short OID, the > + * second is the type name of the object, and the > + * third a description of the object, if it's a > + * commit or tag. In that case the " %ad - %s" and > + * " %s" formats above will be used for the third > + * argument. > + */ > + _(" %s %s%s\n"), > + repo_find_unique_abbrev(ds->repo, oid, DEFAULT_ABBREV), > + type_name(type) ? type_name(type) : "unknown type", > + desc.buf); Would you want to translate "unknown type" here, as well? It's probably not that important in practice, but it seems like a funny omission. > @@ -488,12 +516,19 @@ static enum get_oid_result get_short_oid(struct repository *r, > if (!ds.ambiguous) > ds.fn = NULL; > > - advise(_("The candidates are:")); > repo_for_each_abbrev(r, ds.hex_pfx, collect_ambiguous, &collect); > sort_ambiguous_oid_array(r, &collect); > > - if (oid_array_for_each(&collect, show_ambiguous_object, &ds)) > + if (oid_array_for_each(&collect, show_ambiguous_object, &as)) > BUG("show_ambiguous_object shouldn't return non-zero"); > + > + /* > + * TRANSLATORS: The argument is the list of ambiguous > + * objects composed in show_ambiguous_object(). See > + * its "TRANSLATORS" comment for details. > + */ > + advise(_("The candidates are:\n\n%s"), sb.buf); Here's where the extra newline. I understand why the earlier ones were changed for RTL languages. But this one is always line-oriented. Is the point to help bottom-to-top languages? I can buy that, though it feels like that would be something that the terminal would deal with (because even with this, you're still getting the "error:" line printed separately, for example). I don't think what this is doing is wrong (at first I wondered about the "hint:" lines, but because advise() looks for embedded newlines, we're OK). But if the translation doesn't need to reorder things across lines, this extra format-into-a-strbuf step doesn't seem necessary. We can just call advise() directly in show_ambiguous_object(), as before. If it is necessary, then note that you leak "sb" here. -Peff
On Mon, Oct 04 2021, Jeff King wrote: > On Mon, Oct 04, 2021 at 03:42:49AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> Change the output of show_ambiguous_object() added in [1] and last >> tweaked in [2] to be more friendly to translators. By being able to >> customize the sprintf formats we're even ready for RTL languages. >> >> 1. ef9b0370da6 (sha1-name.c: store and use repo in struct >> disambiguate_state, 2019-04-16) >> 2. 5cc044e0257 (get_short_oid: sort ambiguous objects by type, >> then SHA-1, 2018-05-10) > > I suspect you meant 1ffa26c461 (get_short_sha1: list ambiguous objects > on error, 2016-09-26) for the first one. > > I had to stare at the patch for a while to understand the goal here. I > think this would have been a bit easier to review if "change" in your > first sentence was described a bit more. Perhaps: > > The list of candidates output by show_ambiguous_output() is not marked > for translation. At the very least we want to allow the text "the > candidates are" to be translated. But we also format individual > candidate lines like: > > deadbeef commit 2021-01-01 - Some Commit Message > > by formatting the individual components, then using a printf-format to > arrange them in the correct order. Even though there's no text here to > be translated, the order and spacing is determined by the format > string. Allowing that to be translated helps RTL languages. > > I have a few comments on the patch itself. The biggest thing is that it > changes the format to add an extra newline (between "The candidates > are:" and the actual list). I don't have a strong opinion on including > that or not, but it seemed unintentional given the comment on the first > commit (and its lack of mention here). That was unintentional, sorry. Will fix. > The rest are mostly observations, not criticisms. You can take them with > the appropriate grain of salt given that I don't do translation work > myself, nor know any RTL languages. > >> @@ -366,18 +373,34 @@ static int show_ambiguous_object(const struct object_id *oid, void *data) >> if (commit) { >> struct pretty_print_context pp = {0}; >> pp.date_mode.type = DATE_SHORT; >> - format_commit_message(commit, " %ad - %s", &desc, &pp); >> + format_commit_message(commit, _(" %ad - %s"), &desc, &pp); >> } > > Is it OK to use non-printf expansions with the gettext code? Presumably > the translated string would have the same set of placeholders in it, but > my understanding is that gettext may sometimes munge the %-placeholders > (e.g., allowing numbered ones for re-ordering). I admit I don't know how > any of that works, but I just wonder if this "%ad" may cause confusion > (or even if not, if it is even possible to re-order it for an RTL > language). It's not, oops. I missed that, blinders on for the "%ad". Will construct it in advance and use %s interpolation separately. >> } else if (type == OBJ_TAG) { >> struct tag *tag = lookup_tag(ds->repo, oid); >> if (!parse_tag(tag) && tag->tag) >> - strbuf_addf(&desc, " %s", tag->tag); >> + strbuf_addf(&desc, _(" %s"), tag->tag); >> } > > I wonder whether " %s" is worthwhile as a translatable string. It does > seem to be unique among strings marked for translation, but there are a > ton of non-translated instances. Would context ever matter here? > > My impression is that this kind of translation-lego is frowned upon, and > we might be better off repeating ourselves a bit more. I.e., something > like: > > if (commit) { > struct strbuf date = STRBUF_INIT; > struct strbuf subject = STRBUF_INIT; > format_commit_message(commit, "%ad", &date, &pp); > format_commit_message(commit, "%s", &subject, &pp); > strbuf_addf(advice, _(" %s commit %s - %s\n"), > repo_find_unique_abbrev(...), > date.buf, subject.buf); > strbuf_release(&date); > strbuf_release(&subject); > } else if (type == OBJ_TAG) { > ... > strbuf_addf(advice, _(" %s tag %s\n"), > repo_find_unique_abbrev(...), tag->tag); > } else { > /* TRANSLATORS: the fields are abbreviated oid and type */ > strbuf_addf(advice, _(" %s %s\n"), > repo_find_unique_abbrev(...), type_name(type)); > } > > Though that last one similarly has a real lack of context. Yeah that's better. Will change it to something like that. >> - advise(" %s %s%s", >> - repo_find_unique_abbrev(ds->repo, oid, DEFAULT_ABBREV), >> - type_name(type) ? type_name(type) : "unknown type", >> - desc.buf); >> + strbuf_addf(advice, >> + /* >> + * TRANSLATORS: This is a line of ambiguous object >> + * output. E.g.: >> + * >> + * "deadbeef commit 2021-01-01 - Some Commit Message\n" >> + * "deadbeef tag Some Tag Message\n" >> + * "deadbeef tree\n" >> + * >> + * I.e. the first argument is a short OID, the >> + * second is the type name of the object, and the >> + * third a description of the object, if it's a >> + * commit or tag. In that case the " %ad - %s" and >> + * " %s" formats above will be used for the third >> + * argument. >> + */ >> + _(" %s %s%s\n"), >> + repo_find_unique_abbrev(ds->repo, oid, DEFAULT_ABBREV), >> + type_name(type) ? type_name(type) : "unknown type", >> + desc.buf); > > Would you want to translate "unknown type" here, as well? It's probably > not that important in practice, but it seems like a funny omission. Willdo. >> @@ -488,12 +516,19 @@ static enum get_oid_result get_short_oid(struct repository *r, >> if (!ds.ambiguous) >> ds.fn = NULL; >> >> - advise(_("The candidates are:")); >> repo_for_each_abbrev(r, ds.hex_pfx, collect_ambiguous, &collect); >> sort_ambiguous_oid_array(r, &collect); >> >> - if (oid_array_for_each(&collect, show_ambiguous_object, &ds)) >> + if (oid_array_for_each(&collect, show_ambiguous_object, &as)) >> BUG("show_ambiguous_object shouldn't return non-zero"); >> + >> + /* >> + * TRANSLATORS: The argument is the list of ambiguous >> + * objects composed in show_ambiguous_object(). See >> + * its "TRANSLATORS" comment for details. >> + */ >> + advise(_("The candidates are:\n\n%s"), sb.buf); > > Here's where the extra newline. > > I understand why the earlier ones were changed for RTL languages. But > this one is always line-oriented. Is the point to help bottom-to-top > languages? I can buy that, though it feels like that would be something > that the terminal would deal with (because even with this, you're still > getting the "error:" line printed separately, for example). > > I don't think what this is doing is wrong (at first I wondered about the > "hint:" lines, but because advise() looks for embedded newlines, we're > OK). But if the translation doesn't need to reorder things across lines, > this extra format-into-a-strbuf step doesn't seem necessary. We can just > call advise() directly in show_ambiguous_object(), as before. > > If it is necessary, then note that you leak "sb" here. I'll keep that bit as-is, it's not strictly necessary, but it gives translators a bit more context.
On Mon, Oct 04, 2021 at 10:26:10AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> + /* > >> + * TRANSLATORS: The argument is the list of ambiguous > >> + * objects composed in show_ambiguous_object(). See > >> + * its "TRANSLATORS" comment for details. > >> + */ > >> + advise(_("The candidates are:\n\n%s"), sb.buf); > > > > Here's where the extra newline. > > > > I understand why the earlier ones were changed for RTL languages. But > > this one is always line-oriented. Is the point to help bottom-to-top > > languages? I can buy that, though it feels like that would be something > > that the terminal would deal with (because even with this, you're still > > getting the "error:" line printed separately, for example). > > > > I don't think what this is doing is wrong (at first I wondered about the > > "hint:" lines, but because advise() looks for embedded newlines, we're > > OK). But if the translation doesn't need to reorder things across lines, > > this extra format-into-a-strbuf step doesn't seem necessary. We can just > > call advise() directly in show_ambiguous_object(), as before. > > > > If it is necessary, then note that you leak "sb" here. > > I'll keep that bit as-is, it's not strictly necessary, but it gives > translators a bit more context. If it's just for the context, wouldn't this do the same thing: /* * TRANSLATORS: This is followed by the list of ambiguous * objects composed in show_ambiguous_object(). See its * "TRANSLATORS" comments for details. */ advise(_("The candidates are:")); ... if (oid_array_for_each(&collect, show_ambiguous_object, &ds)) ... I.e., leave the code as-is, and just add the extra comment. There is no need for the extra struct or any change of ordering between this advise() and the others. I would think it is worthwhile if we are de-lego-ing a message that is made in chunks, but in this case the we have to construct an opaque "%s" to represent the individual lines for each object, because we don't know how many of them there will be. -Peff PS In my "something like this" commit message, I indicated that the "candidates" message was getting translated, but it actually is already translated in the pre-image. So I think we would not need to touch that line at all.
On Mon, Oct 04 2021, Jeff King wrote: > On Mon, Oct 04, 2021 at 10:26:10AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> >> + /* >> >> + * TRANSLATORS: The argument is the list of ambiguous >> >> + * objects composed in show_ambiguous_object(). See >> >> + * its "TRANSLATORS" comment for details. >> >> + */ >> >> + advise(_("The candidates are:\n\n%s"), sb.buf); >> > >> > Here's where the extra newline. >> > >> > I understand why the earlier ones were changed for RTL languages. But >> > this one is always line-oriented. Is the point to help bottom-to-top >> > languages? I can buy that, though it feels like that would be something >> > that the terminal would deal with (because even with this, you're still >> > getting the "error:" line printed separately, for example). >> > >> > I don't think what this is doing is wrong (at first I wondered about the >> > "hint:" lines, but because advise() looks for embedded newlines, we're >> > OK). But if the translation doesn't need to reorder things across lines, >> > this extra format-into-a-strbuf step doesn't seem necessary. We can just >> > call advise() directly in show_ambiguous_object(), as before. >> > >> > If it is necessary, then note that you leak "sb" here. >> >> I'll keep that bit as-is, it's not strictly necessary, but it gives >> translators a bit more context. > > If it's just for the context, wouldn't this do the same thing: > > /* > * TRANSLATORS: This is followed by the list of ambiguous > * objects composed in show_ambiguous_object(). See its > * "TRANSLATORS" comments for details. > */ > advise(_("The candidates are:")); > ... > if (oid_array_for_each(&collect, show_ambiguous_object, &ds)) > ... > > I.e., leave the code as-is, and just add the extra comment. There is no > need for the extra struct or any change of ordering between this > advise() and the others. > > I would think it is worthwhile if we are de-lego-ing a message that is > made in chunks, but in this case the we have to construct an opaque "%s" > to represent the individual lines for each object, because we don't know > how many of them there will be. > > -Peff > > PS In my "something like this" commit message, I indicated that the > "candidates" message was getting translated, but it actually is > already translated in the pre-image. So I think we would not need to > touch that line at all. Yes you're right. You've got me, I guess :) An unstated motivation of mine here is that I've got a series that changes the advise() function itself so that it automatically adds the "and run xyz to disable this message". Now some don't emit it, some don't even have associated configuration or documentation. It's a mess. I originally hacked this up because this is the one in-tree user of advise() that constructs output incrementally. So for that improvement to advise() it either needs to be changed to not do so (this patch), or I'd need an advise_no_template() or advise_hint_line() or whatever as a workaround. I didn't mean to be too subterfuge-y about it. It's just hard to find a balance between a single long series & a few shorter ones, and when to distract reviewers with "this design choice is also because of XYZ tangentally related end-goal". Anyway, now that we're here I'm not sure what the best way forward is. One is to just address the pointed-out bugs and keep that accumulate/print pattern I instituded, which would help that subsequent series. But I agree that while I think it is a bit better to translate the "foo:\n\n%s" message (it gives a bit more context about what sort of message it is), it's not really worth it just in the context of this patch. What do you think? That we could let this pass for now, or we should drop this and I can try to re-visit it as part of some larger topic? That meaningful improvement to advise() depends on this + another series of advise fixes I submitted in parallel at [1]. 1. https://lore.kernel.org/git/cover-0.5-00000000000-20211004T015432Z-avarab@gmail.com/T/#t
On Mon, Oct 04, 2021 at 01:16:24PM +0200, Ævar Arnfjörð Bjarmason wrote: > An unstated motivation of mine here is that I've got a series that > changes the advise() function itself so that it automatically adds the > "and run xyz to disable this message". > > Now some don't emit it, some don't even have associated configuration or > documentation. It's a mess. > > I originally hacked this up because this is the one in-tree user of > advise() that constructs output incrementally. So for that improvement > to advise() it either needs to be changed to not do so (this patch), or > I'd need an advise_no_template() or advise_hint_line() or whatever as a > workaround. OK, that makes sense. In general, I think it's much easier if those motivations _aren't_ unstated. Both for the benefit of reviewers, but also folks reading commit messages later who wonder "hey, it looks like we didn't need this hunk, and it is causing a hassle, so why can't I just revert it". But... > I didn't mean to be too subterfuge-y about it. It's just hard to find a > balance between a single long series & a few shorter ones, and when to > distract reviewers with "this design choice is also because of XYZ > tangentally related end-goal". ...yeah, if you have patches that say "do X, because later maybe we'll do Y", then it is often hard to evaluate them if Y is not in the same series. And _especially_ so if there is some other Z happening in the current series with X, because even talking about X is muddling things. So in an ideal world, you'd not do X at all (in this case, touch the advise() lines), and leave Y (rolling up a buf to hand to a single advise() line) as a preparatory patch in a series that does Z (your change to advise() to print the extra stuff). Things don't always break down that way, but I think they do here. Nothing you want to do here is semantically related to the later change to advise() you want to make. There are textual dependencies, which means you'll want to wait for one series to graduate before the other, but that's already the case if you stuff the preparatory patch in this series. -Peff
diff --git a/object-name.c b/object-name.c index fdff4601b2c..7e7f671e337 100644 --- a/object-name.c +++ b/object-name.c @@ -351,9 +351,16 @@ static int init_object_disambiguation(struct repository *r, return 0; } +struct show_ambiguous_state { + const struct disambiguate_state *ds; + struct strbuf *advice; +}; + static int show_ambiguous_object(const struct object_id *oid, void *data) { - const struct disambiguate_state *ds = data; + struct show_ambiguous_state *state = data; + const struct disambiguate_state *ds = state->ds; + struct strbuf *advice = state->advice; struct strbuf desc = STRBUF_INIT; int type; @@ -366,18 +373,34 @@ static int show_ambiguous_object(const struct object_id *oid, void *data) if (commit) { struct pretty_print_context pp = {0}; pp.date_mode.type = DATE_SHORT; - format_commit_message(commit, " %ad - %s", &desc, &pp); + format_commit_message(commit, _(" %ad - %s"), &desc, &pp); } } else if (type == OBJ_TAG) { struct tag *tag = lookup_tag(ds->repo, oid); if (!parse_tag(tag) && tag->tag) - strbuf_addf(&desc, " %s", tag->tag); + strbuf_addf(&desc, _(" %s"), tag->tag); } - advise(" %s %s%s", - repo_find_unique_abbrev(ds->repo, oid, DEFAULT_ABBREV), - type_name(type) ? type_name(type) : "unknown type", - desc.buf); + strbuf_addf(advice, + /* + * TRANSLATORS: This is a line of ambiguous object + * output. E.g.: + * + * "deadbeef commit 2021-01-01 - Some Commit Message\n" + * "deadbeef tag Some Tag Message\n" + * "deadbeef tree\n" + * + * I.e. the first argument is a short OID, the + * second is the type name of the object, and the + * third a description of the object, if it's a + * commit or tag. In that case the " %ad - %s" and + * " %s" formats above will be used for the third + * argument. + */ + _(" %s %s%s\n"), + repo_find_unique_abbrev(ds->repo, oid, DEFAULT_ABBREV), + type_name(type) ? type_name(type) : "unknown type", + desc.buf); strbuf_release(&desc); return 0; @@ -475,7 +498,12 @@ static enum get_oid_result get_short_oid(struct repository *r, } if (!quietly && (status == SHORT_NAME_AMBIGUOUS)) { + struct strbuf sb = STRBUF_INIT; struct oid_array collect = OID_ARRAY_INIT; + struct show_ambiguous_state as = { + .ds = &ds, + .advice = &sb, + }; error(_("short object ID %s is ambiguous"), ds.hex_pfx); @@ -488,12 +516,19 @@ static enum get_oid_result get_short_oid(struct repository *r, if (!ds.ambiguous) ds.fn = NULL; - advise(_("The candidates are:")); repo_for_each_abbrev(r, ds.hex_pfx, collect_ambiguous, &collect); sort_ambiguous_oid_array(r, &collect); - if (oid_array_for_each(&collect, show_ambiguous_object, &ds)) + if (oid_array_for_each(&collect, show_ambiguous_object, &as)) BUG("show_ambiguous_object shouldn't return non-zero"); + + /* + * TRANSLATORS: The argument is the list of ambiguous + * objects composed in show_ambiguous_object(). See + * its "TRANSLATORS" comment for details. + */ + advise(_("The candidates are:\n\n%s"), sb.buf); + oid_array_clear(&collect); }
Change the output of show_ambiguous_object() added in [1] and last tweaked in [2] to be more friendly to translators. By being able to customize the sprintf formats we're even ready for RTL languages. 1. ef9b0370da6 (sha1-name.c: store and use repo in struct disambiguate_state, 2019-04-16) 2. 5cc044e0257 (get_short_oid: sort ambiguous objects by type, then SHA-1, 2018-05-10) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- object-name.c | 53 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 44 insertions(+), 9 deletions(-)