Message ID | patch-v2-1.2-55bde16aa23-20211004T142523Z-avarab@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | i18n: improve translatability of ambiguous object output | expand |
On Mon, Oct 4, 2021 at 10:27 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > Mark the "commit", "tree", "blob" and "tag" types for translation, and > add an extern "unknown type" string for the OBJ_NONE case. > > It is usually bad practice to translate individual words like this, > but for e.g. the list list output emitted by the "short object ID dead "list list"? > is ambiguous" advice it makes sense. > > A subsequent commit will make that output translatable, and use these > translation markings to do so. Well, we won't use "commit", but let's > mark it up anyway for consistency. It'll probably come in handy sooner > than later to have it already be translated, and it's to much of a > burden to place on translators if they're translating the other three > object types anyway. At first I thought you meant s/to much/too much/, but that doesn't seem to make sense (unless I'm misunderstanding), so perhaps you mean s/to/not/. > Aside: I think it would probably make sense to change the "NULL" entry > for type_name() to be the "unknown type". I've ran into cases where > type_name() was unconditionally interpolated in e.g. an sprintf() > format, but let's leave that for #leftoverbits as that would be > changing the behavior of the type_name() function. > > All of these will be new in the git.pot file, except "blob" which will > be shared with a "cat-file" command-line option, see > 7bcf3414535 (cat-file --textconv/--filters: allow specifying the path > separately, 2016-09-09) for its introduction. > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
On 04/10/21 21.27, Ævar Arnfjörð Bjarmason wrote: > static const char *object_type_strings[] = { > NULL, /* OBJ_NONE = 0 */ > - "commit", /* OBJ_COMMIT = 1 */ > - "tree", /* OBJ_TREE = 2 */ > - "blob", /* OBJ_BLOB = 3 */ > - "tag", /* OBJ_TAG = 4 */ > + /* > + * TRANSLATORS: "commit", "tree", "blob" and "tag" are the > + * name of Git's object types. These names are interpolated > + * stand-alone when doing so is unambiguous for translation > + * and doesn't require extra context. E.g. as part of an > + * already-translated string that needs to have a type name > + * quoted verbatim, or the short description of a command-line > + * option expecting a given type. > + */ > + N_("commit"), /* OBJ_COMMIT = 1 */ > + N_("tree"), /* OBJ_TREE = 2 */ > + N_("blob"), /* OBJ_BLOB = 3 */ > + N_("tag"), /* OBJ_TAG = 4 */ > }; > Are these object type names safe for translating? (e.g. can they be translatable without affecting private API string, which aren't translatable)? > +/* > + * TRANSLATORS: This is the short type name of an object that's not > + * one of Git's known object types, as opposed to "commit", "tree", > + * "blob" and "tag" above. > + * > + * A user is unlikely to ever encounter these, but they can be > + * manually created with "git hash-object --literally". > + */ > +const char *unknown_type = N_("unknown type"); > + > const char *type_name(unsigned int type) Did you mean that "unknown type" is generic shorthand?
On Tue, Oct 05 2021, Bagas Sanjaya wrote: > On 04/10/21 21.27, Ævar Arnfjörð Bjarmason wrote: >> static const char *object_type_strings[] = { >> NULL, /* OBJ_NONE = 0 */ >> - "commit", /* OBJ_COMMIT = 1 */ >> - "tree", /* OBJ_TREE = 2 */ >> - "blob", /* OBJ_BLOB = 3 */ >> - "tag", /* OBJ_TAG = 4 */ >> + /* >> + * TRANSLATORS: "commit", "tree", "blob" and "tag" are the >> + * name of Git's object types. These names are interpolated >> + * stand-alone when doing so is unambiguous for translation >> + * and doesn't require extra context. E.g. as part of an >> + * already-translated string that needs to have a type name >> + * quoted verbatim, or the short description of a command-line >> + * option expecting a given type. >> + */ >> + N_("commit"), /* OBJ_COMMIT = 1 */ >> + N_("tree"), /* OBJ_TREE = 2 */ >> + N_("blob"), /* OBJ_BLOB = 3 */ >> + N_("tag"), /* OBJ_TAG = 4 */ >> }; >> > > Are these object type names safe for translating? (e.g. can they be > translatable without affecting private API string, which aren't > translatable)? Yes, the N_() macro is always a noop. It's just there so the i18n tooling knows to pick up these strings and drop them into po/git.pot. See po/README.md for details. It does change the behavior of any code that later does _(type_name(type)), as the string will then (potentially) be found in the *.mo files, but as shown in 2/2 that needs to be added to each callsite manually. So we're not going to translate "ls-tree" output or whatever just because it has "tree" etc. in it. >> +/* >> + * TRANSLATORS: This is the short type name of an object that's not >> + * one of Git's known object types, as opposed to "commit", "tree", >> + * "blob" and "tag" above. >> + * >> + * A user is unlikely to ever encounter these, but they can be >> + * manually created with "git hash-object --literally". >> + */ >> +const char *unknown_type = N_("unknown type"); >> + >> const char *type_name(unsigned int type) > > Did you mean that "unknown type" is generic shorthand? Yes, we could get the actual type name here, but it's a bit of a pain, and as noted in 2/2 this code doesn't work anyway (which pre-dates this series). But I'll see if I'll remember to loop around to fixing it after my fsck/object library fixes related to this land, but for now just marking this for translation makes senes I think.
On Mon, Oct 04, 2021 at 04:27:01PM +0200, Ævar Arnfjörð Bjarmason wrote: > Mark the "commit", "tree", "blob" and "tag" types for translation, and > add an extern "unknown type" string for the OBJ_NONE case. > > It is usually bad practice to translate individual words like this, > but for e.g. the list list output emitted by the "short object ID dead > is ambiguous" advice it makes sense. We already seem to have a translatable string for "commit", but if I look at say es.po, the translation is "confirmar", which is considering it a verb. Now my Spanish is pretty rusty, so it's possible this works as a noun, too. But if I look at other messages, like: #: builtin/commit.c:1623 msgid "override date for commit" msgstr "sobrescribe la fecha del commit" #: builtin/commit.c:1626 msgid "reuse message from specified commit" msgstr "reusar el mensaje de un commit específico" then it's clear that "commit" as a noun is translated as "commit". I'm not sure what facilities (if any) there are in gettext for having the same string in different contexts. I do note that this is already a problem. Of the five spots listed: #: builtin/commit.c:1625 builtin/commit.c:1626 builtin/commit.c:1632 #: parse-options.h:329 ref-filter.h:90 msgid "commit" msgstr "confirmar" They all appear to want is as a noun. So maybe this is just mis-translated for Spanish. It does feel like an accident in the making, though. > A subsequent commit will make that output translatable, and use these > translation markings to do so. Well, we won't use "commit", but let's > mark it up anyway for consistency. It'll probably come in handy sooner > than later to have it already be translated, and it's to much of a > burden to place on translators if they're translating the other three > object types anyway. I do wonder how useful it is to translate these type names in general. Especially as used in this series, they're really technical terms, and you are not going to escape the name "git commit" as a command. But I don't ever use translated Git, so I'm not sure my opinion is all that meaningful there. > Aside: I think it would probably make sense to change the "NULL" entry > for type_name() to be the "unknown type". I've ran into cases where > type_name() was unconditionally interpolated in e.g. an sprintf() > format, but let's leave that for #leftoverbits as that would be > changing the behavior of the type_name() function. IMHO this would be a bad idea. Even if there is a spot that uses the result without checking for NULL, I'd much rather have Git segfault than say, write out an object with a bogus name (as it would in index_mem(), for example). So you really have to look over every caller, at which point you may as well adjust the ones that aren't checking for NULL. Now if you introduced type_name_human(), which auto-translated and converted NULL to "unknown", then that would be easy to plug in appropriately as you audited the callers. > static const char *object_type_strings[] = { > NULL, /* OBJ_NONE = 0 */ > - "commit", /* OBJ_COMMIT = 1 */ > - "tree", /* OBJ_TREE = 2 */ > - "blob", /* OBJ_BLOB = 3 */ > - "tag", /* OBJ_TAG = 4 */ > + /* > + * TRANSLATORS: "commit", "tree", "blob" and "tag" are the > + * name of Git's object types. These names are interpolated > + * stand-alone when doing so is unambiguous for translation > + * and doesn't require extra context. E.g. as part of an > + * already-translated string that needs to have a type name > + * quoted verbatim, or the short description of a command-line > + * option expecting a given type. > + */ > + N_("commit"), /* OBJ_COMMIT = 1 */ > + N_("tree"), /* OBJ_TREE = 2 */ > + N_("blob"), /* OBJ_BLOB = 3 */ > + N_("tag"), /* OBJ_TAG = 4 */ > }; This does make me feel slightly uneasy, just because so many parts of Git rely on these _not_ being translated. But I see in your other response that N_() really does nothing. So aside from possibly misleading readers of the code, I think this is probably OK. -Peff
Jeff King <peff@peff.net> writes: > They all appear to want is as a noun. So maybe this is just > mis-translated for Spanish. It does feel like an accident in the making, > though. Probably we need pgettext(). https://www.gnu.org/software/gettext/manual/html_node/Contexts.html > I do wonder how useful it is to translate these type names in general. > Especially as used in this series, they're really technical terms, and > you are not going to escape the name "git commit" as a command. I share the same feeling (I do not use translated git, either). > Now if you introduced type_name_human(), which auto-translated and > converted NULL to "unknown", then that would be easy to plug in > appropriately as you audited the callers. Yes. > >> static const char *object_type_strings[] = { >> ... >> + N_("commit"), /* OBJ_COMMIT = 1 */ >> + N_("tree"), /* OBJ_TREE = 2 */ >> + N_("blob"), /* OBJ_BLOB = 3 */ >> + N_("tag"), /* OBJ_TAG = 4 */ >> }; > > This does make me feel slightly uneasy, just because so many parts of > Git rely on these _not_ being translated. But I see in your other > response that N_() really does nothing. So aside from possibly > misleading readers of the code, I think this is probably OK. Yes, this may be scary looking but the least risky part of this patch, as N_() is no-op at runtime ;-).
On Wed, Oct 06, 2021 at 12:46:12PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > They all appear to want is as a noun. So maybe this is just > > mis-translated for Spanish. It does feel like an accident in the making, > > though. > > Probably we need pgettext(). > > https://www.gnu.org/software/gettext/manual/html_node/Contexts.html Yeah, that make sense. I'm not sure how it interacts with N_(), though. I.e., I'd expect the "context" to ride along with the original string, but I guess it is really in the caller who's translating it. So the real spot becomes: printf(_("my type is %s"), pgettext("object-type", type_name(type))); It's a little unfortunate that every caller has to do it rather than putting it near the source string. But I guess a type_name_human() would solve that, too. ;) -Peff
Jeff King <peff@peff.net> writes: > On Wed, Oct 06, 2021 at 12:46:12PM -0700, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: >> >> > They all appear to want is as a noun. So maybe this is just >> > mis-translated for Spanish. It does feel like an accident in the making, >> > though. >> >> Probably we need pgettext(). >> >> https://www.gnu.org/software/gettext/manual/html_node/Contexts.html > > Yeah, that make sense. I'm not sure how it interacts with N_(), though. > I.e., I'd expect the "context" to ride along with the original string, > but I guess it is really in the caller who's translating it. So the real > spot becomes: > > printf(_("my type is %s"), pgettext("object-type", type_name(type))); > > It's a little unfortunate that every caller has to do it rather than > putting it near the source string. But I guess a type_name_human() would > solve that, too. ;) Yes, I agree the need for pgettext() is annoying but I do not see an easy alternative. Introducing a wrapper like type_name_human() to limit the damage sounds like the best we could do. Thanks.
diff --git a/object.c b/object.c index 4e85955a941..47dbe0d8a2a 100644 --- a/object.c +++ b/object.c @@ -22,12 +22,31 @@ struct object *get_indexed_object(unsigned int idx) static const char *object_type_strings[] = { NULL, /* OBJ_NONE = 0 */ - "commit", /* OBJ_COMMIT = 1 */ - "tree", /* OBJ_TREE = 2 */ - "blob", /* OBJ_BLOB = 3 */ - "tag", /* OBJ_TAG = 4 */ + /* + * TRANSLATORS: "commit", "tree", "blob" and "tag" are the + * name of Git's object types. These names are interpolated + * stand-alone when doing so is unambiguous for translation + * and doesn't require extra context. E.g. as part of an + * already-translated string that needs to have a type name + * quoted verbatim, or the short description of a command-line + * option expecting a given type. + */ + N_("commit"), /* OBJ_COMMIT = 1 */ + N_("tree"), /* OBJ_TREE = 2 */ + N_("blob"), /* OBJ_BLOB = 3 */ + N_("tag"), /* OBJ_TAG = 4 */ }; +/* + * TRANSLATORS: This is the short type name of an object that's not + * one of Git's known object types, as opposed to "commit", "tree", + * "blob" and "tag" above. + * + * A user is unlikely to ever encounter these, but they can be + * manually created with "git hash-object --literally". + */ +const char *unknown_type = N_("unknown type"); + const char *type_name(unsigned int type) { if (type >= ARRAY_SIZE(object_type_strings)) diff --git a/object.h b/object.h index 549f2d256bc..0510dc4b3ea 100644 --- a/object.h +++ b/object.h @@ -91,6 +91,7 @@ struct object { struct object_id oid; }; +extern const char *unknown_type; const char *type_name(unsigned int type); int type_from_string_gently(const char *str, ssize_t, int gentle); #define type_from_string(str) type_from_string_gently(str, -1, 0)
Mark the "commit", "tree", "blob" and "tag" types for translation, and add an extern "unknown type" string for the OBJ_NONE case. It is usually bad practice to translate individual words like this, but for e.g. the list list output emitted by the "short object ID dead is ambiguous" advice it makes sense. A subsequent commit will make that output translatable, and use these translation markings to do so. Well, we won't use "commit", but let's mark it up anyway for consistency. It'll probably come in handy sooner than later to have it already be translated, and it's to much of a burden to place on translators if they're translating the other three object types anyway. Aside: I think it would probably make sense to change the "NULL" entry for type_name() to be the "unknown type". I've ran into cases where type_name() was unconditionally interpolated in e.g. an sprintf() format, but let's leave that for #leftoverbits as that would be changing the behavior of the type_name() function. All of these will be new in the git.pot file, except "blob" which will be shared with a "cat-file" command-line option, see 7bcf3414535 (cat-file --textconv/--filters: allow specifying the path separately, 2016-09-09) for its introduction. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- object.c | 27 +++++++++++++++++++++++---- object.h | 1 + 2 files changed, 24 insertions(+), 4 deletions(-)