diff mbox series

[v2,1/2] object.[ch]: mark object type names for translation

Message ID patch-v2-1.2-55bde16aa23-20211004T142523Z-avarab@gmail.com (mailing list archive)
State Superseded
Headers show
Series i18n: improve translatability of ambiguous object output | expand

Commit Message

Ævar Arnfjörð Bjarmason Oct. 4, 2021, 2:27 p.m. UTC
Mark the "commit", "tree", "blob" and "tag" types for translation, and
add an extern "unknown type" string for the OBJ_NONE case.

It is usually bad practice to translate individual words like this,
but for e.g. the list list output emitted by the "short object ID dead
is ambiguous" advice it makes sense.

A subsequent commit will make that output translatable, and use these
translation markings to do so. Well, we won't use "commit", but let's
mark it up anyway for consistency. It'll probably come in handy sooner
than later to have it already be translated, and it's to much of a
burden to place on translators if they're translating the other three
object types anyway.

Aside: I think it would probably make sense to change the "NULL" entry
for type_name() to be the "unknown type". I've ran into cases where
type_name() was unconditionally interpolated in e.g. an sprintf()
format, but let's leave that for #leftoverbits as that would be
changing the behavior of the type_name() function.

All of these will be new in the git.pot file, except "blob" which will
be shared with a "cat-file" command-line option, see
7bcf3414535 (cat-file --textconv/--filters: allow specifying the path
separately, 2016-09-09) for its introduction.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object.c | 27 +++++++++++++++++++++++----
 object.h |  1 +
 2 files changed, 24 insertions(+), 4 deletions(-)

Comments

Eric Sunshine Oct. 4, 2021, 6:54 p.m. UTC | #1
On Mon, Oct 4, 2021 at 10:27 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> Mark the "commit", "tree", "blob" and "tag" types for translation, and
> add an extern "unknown type" string for the OBJ_NONE case.
>
> It is usually bad practice to translate individual words like this,
> but for e.g. the list list output emitted by the "short object ID dead

"list list"?

> is ambiguous" advice it makes sense.
>
> A subsequent commit will make that output translatable, and use these
> translation markings to do so. Well, we won't use "commit", but let's
> mark it up anyway for consistency. It'll probably come in handy sooner
> than later to have it already be translated, and it's to much of a
> burden to place on translators if they're translating the other three
> object types anyway.

At first I thought you meant s/to much/too much/, but that doesn't
seem to make sense (unless I'm misunderstanding), so perhaps you mean
s/to/not/.

> Aside: I think it would probably make sense to change the "NULL" entry
> for type_name() to be the "unknown type". I've ran into cases where
> type_name() was unconditionally interpolated in e.g. an sprintf()
> format, but let's leave that for #leftoverbits as that would be
> changing the behavior of the type_name() function.
>
> All of these will be new in the git.pot file, except "blob" which will
> be shared with a "cat-file" command-line option, see
> 7bcf3414535 (cat-file --textconv/--filters: allow specifying the path
> separately, 2016-09-09) for its introduction.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Bagas Sanjaya Oct. 5, 2021, 9:37 a.m. UTC | #2
On 04/10/21 21.27, Ævar Arnfjörð Bjarmason wrote:
>   static const char *object_type_strings[] = {
>   	NULL,		/* OBJ_NONE = 0 */
> -	"commit",	/* OBJ_COMMIT = 1 */
> -	"tree",		/* OBJ_TREE = 2 */
> -	"blob",		/* OBJ_BLOB = 3 */
> -	"tag",		/* OBJ_TAG = 4 */
> +	/*
> +	 * TRANSLATORS: "commit", "tree", "blob" and "tag" are the
> +	 * name of Git's object types. These names are interpolated
> +	 * stand-alone when doing so is unambiguous for translation
> +	 * and doesn't require extra context. E.g. as part of an
> +	 * already-translated string that needs to have a type name
> +	 * quoted verbatim, or the short description of a command-line
> +	 * option expecting a given type.
> +	 */
> +	N_("commit"),	/* OBJ_COMMIT = 1 */
> +	N_("tree"),	/* OBJ_TREE = 2 */
> +	N_("blob"),	/* OBJ_BLOB = 3 */
> +	N_("tag"),	/* OBJ_TAG = 4 */
>   };
>   

Are these object type names safe for translating? (e.g. can they be 
translatable without affecting private API string, which aren't 
translatable)?

> +/*
> + * TRANSLATORS: This is the short type name of an object that's not
> + * one of Git's known object types, as opposed to "commit", "tree",
> + * "blob" and "tag" above.
> + *
> + * A user is unlikely to ever encounter these, but they can be
> + * manually created with "git hash-object --literally".
> + */
> +const char *unknown_type = N_("unknown type");
> +
>   const char *type_name(unsigned int type)

Did you mean that "unknown type" is generic shorthand?
Ævar Arnfjörð Bjarmason Oct. 5, 2021, 3:52 p.m. UTC | #3
On Tue, Oct 05 2021, Bagas Sanjaya wrote:

> On 04/10/21 21.27, Ævar Arnfjörð Bjarmason wrote:
>>   static const char *object_type_strings[] = {
>>   	NULL,		/* OBJ_NONE = 0 */
>> -	"commit",	/* OBJ_COMMIT = 1 */
>> -	"tree",		/* OBJ_TREE = 2 */
>> -	"blob",		/* OBJ_BLOB = 3 */
>> -	"tag",		/* OBJ_TAG = 4 */
>> +	/*
>> +	 * TRANSLATORS: "commit", "tree", "blob" and "tag" are the
>> +	 * name of Git's object types. These names are interpolated
>> +	 * stand-alone when doing so is unambiguous for translation
>> +	 * and doesn't require extra context. E.g. as part of an
>> +	 * already-translated string that needs to have a type name
>> +	 * quoted verbatim, or the short description of a command-line
>> +	 * option expecting a given type.
>> +	 */
>> +	N_("commit"),	/* OBJ_COMMIT = 1 */
>> +	N_("tree"),	/* OBJ_TREE = 2 */
>> +	N_("blob"),	/* OBJ_BLOB = 3 */
>> +	N_("tag"),	/* OBJ_TAG = 4 */
>>   };
>>   
>
> Are these object type names safe for translating? (e.g. can they be
> translatable without affecting private API string, which aren't 
> translatable)?

Yes, the N_() macro is always a noop. It's just there so the i18n
tooling knows to pick up these strings and drop them into
po/git.pot. See po/README.md for details.

It does change the behavior of any code that later does
_(type_name(type)), as the string will then (potentially) be found in
the *.mo files, but as shown in 2/2 that needs to be added to each
callsite manually. So we're not going to translate "ls-tree" output or
whatever just because it has "tree" etc. in it.

>> +/*
>> + * TRANSLATORS: This is the short type name of an object that's not
>> + * one of Git's known object types, as opposed to "commit", "tree",
>> + * "blob" and "tag" above.
>> + *
>> + * A user is unlikely to ever encounter these, but they can be
>> + * manually created with "git hash-object --literally".
>> + */
>> +const char *unknown_type = N_("unknown type");
>> +
>>   const char *type_name(unsigned int type)
>
> Did you mean that "unknown type" is generic shorthand?

Yes, we could get the actual type name here, but it's a bit of a pain,
and as noted in 2/2 this code doesn't work anyway (which pre-dates this
series).

But I'll see if I'll remember to loop around to fixing it after my
fsck/object library fixes related to this land, but for now just marking
this for translation makes senes I think.
Jeff King Oct. 6, 2021, 7:05 p.m. UTC | #4
On Mon, Oct 04, 2021 at 04:27:01PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Mark the "commit", "tree", "blob" and "tag" types for translation, and
> add an extern "unknown type" string for the OBJ_NONE case.
> 
> It is usually bad practice to translate individual words like this,
> but for e.g. the list list output emitted by the "short object ID dead
> is ambiguous" advice it makes sense.

We already seem to have a translatable string for "commit", but if I
look at say es.po, the translation is "confirmar", which is considering
it a verb. Now my Spanish is pretty rusty, so it's possible this works
as a noun, too. But if I look at other messages, like:

  #: builtin/commit.c:1623
  msgid "override date for commit"
  msgstr "sobrescribe la fecha del commit"

  #: builtin/commit.c:1626
  msgid "reuse message from specified commit"
  msgstr "reusar el mensaje de un commit específico"

then it's clear that "commit" as a noun is translated as "commit". I'm
not sure what facilities (if any) there are in gettext for having the
same string in different contexts.

I do note that this is already a problem. Of the five spots listed:

  #: builtin/commit.c:1625 builtin/commit.c:1626 builtin/commit.c:1632
  #: parse-options.h:329 ref-filter.h:90
  msgid "commit"
  msgstr "confirmar"

They all appear to want is as a noun. So maybe this is just
mis-translated for Spanish. It does feel like an accident in the making,
though.

> A subsequent commit will make that output translatable, and use these
> translation markings to do so. Well, we won't use "commit", but let's
> mark it up anyway for consistency. It'll probably come in handy sooner
> than later to have it already be translated, and it's to much of a
> burden to place on translators if they're translating the other three
> object types anyway.

I do wonder how useful it is to translate these type names in general.
Especially as used in this series, they're really technical terms, and
you are not going to escape the name "git commit" as a command. But I
don't ever use translated Git, so I'm not sure my opinion is all that
meaningful there.

> Aside: I think it would probably make sense to change the "NULL" entry
> for type_name() to be the "unknown type". I've ran into cases where
> type_name() was unconditionally interpolated in e.g. an sprintf()
> format, but let's leave that for #leftoverbits as that would be
> changing the behavior of the type_name() function.

IMHO this would be a bad idea. Even if there is a spot that uses the
result without checking for NULL, I'd much rather have Git segfault than
say, write out an object with a bogus name (as it would in index_mem(),
for example). So you really have to look over every caller, at which
point you may as well adjust the ones that aren't checking for NULL.

Now if you introduced type_name_human(), which auto-translated and
converted NULL to "unknown", then that would be easy to plug in
appropriately as you audited the callers.

>  static const char *object_type_strings[] = {
>  	NULL,		/* OBJ_NONE = 0 */
> -	"commit",	/* OBJ_COMMIT = 1 */
> -	"tree",		/* OBJ_TREE = 2 */
> -	"blob",		/* OBJ_BLOB = 3 */
> -	"tag",		/* OBJ_TAG = 4 */
> +	/*
> +	 * TRANSLATORS: "commit", "tree", "blob" and "tag" are the
> +	 * name of Git's object types. These names are interpolated
> +	 * stand-alone when doing so is unambiguous for translation
> +	 * and doesn't require extra context. E.g. as part of an
> +	 * already-translated string that needs to have a type name
> +	 * quoted verbatim, or the short description of a command-line
> +	 * option expecting a given type.
> +	 */
> +	N_("commit"),	/* OBJ_COMMIT = 1 */
> +	N_("tree"),	/* OBJ_TREE = 2 */
> +	N_("blob"),	/* OBJ_BLOB = 3 */
> +	N_("tag"),	/* OBJ_TAG = 4 */
>  };

This does make me feel slightly uneasy, just because so many parts of
Git rely on these _not_ being translated. But I see in your other
response that N_() really does nothing. So aside from possibly
misleading readers of the code, I think this is probably OK.

-Peff
Junio C Hamano Oct. 6, 2021, 7:46 p.m. UTC | #5
Jeff King <peff@peff.net> writes:

> They all appear to want is as a noun. So maybe this is just
> mis-translated for Spanish. It does feel like an accident in the making,
> though.

Probably we need pgettext().

https://www.gnu.org/software/gettext/manual/html_node/Contexts.html

> I do wonder how useful it is to translate these type names in general.
> Especially as used in this series, they're really technical terms, and
> you are not going to escape the name "git commit" as a command.

I share the same feeling (I do not use translated git, either).

> Now if you introduced type_name_human(), which auto-translated and
> converted NULL to "unknown", then that would be easy to plug in
> appropriately as you audited the callers.

Yes.

>
>>  static const char *object_type_strings[] = {
>> ...
>> +	N_("commit"),	/* OBJ_COMMIT = 1 */
>> +	N_("tree"),	/* OBJ_TREE = 2 */
>> +	N_("blob"),	/* OBJ_BLOB = 3 */
>> +	N_("tag"),	/* OBJ_TAG = 4 */
>>  };
>
> This does make me feel slightly uneasy, just because so many parts of
> Git rely on these _not_ being translated. But I see in your other
> response that N_() really does nothing. So aside from possibly
> misleading readers of the code, I think this is probably OK.

Yes, this may be scary looking but the least risky part of this
patch, as N_() is no-op at runtime ;-).
Jeff King Oct. 6, 2021, 8:38 p.m. UTC | #6
On Wed, Oct 06, 2021 at 12:46:12PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > They all appear to want is as a noun. So maybe this is just
> > mis-translated for Spanish. It does feel like an accident in the making,
> > though.
> 
> Probably we need pgettext().
> 
> https://www.gnu.org/software/gettext/manual/html_node/Contexts.html

Yeah, that make sense. I'm not sure how it interacts with N_(), though.
I.e., I'd expect the "context" to ride along with the original string,
but I guess it is really in the caller who's translating it. So the real
spot becomes:

  printf(_("my type is %s"), pgettext("object-type", type_name(type)));

It's a little unfortunate that every caller has to do it rather than
putting it near the source string. But I guess a type_name_human() would
solve that, too. ;)

-Peff
Junio C Hamano Oct. 7, 2021, 6:06 p.m. UTC | #7
Jeff King <peff@peff.net> writes:

> On Wed, Oct 06, 2021 at 12:46:12PM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> > They all appear to want is as a noun. So maybe this is just
>> > mis-translated for Spanish. It does feel like an accident in the making,
>> > though.
>> 
>> Probably we need pgettext().
>> 
>> https://www.gnu.org/software/gettext/manual/html_node/Contexts.html
>
> Yeah, that make sense. I'm not sure how it interacts with N_(), though.
> I.e., I'd expect the "context" to ride along with the original string,
> but I guess it is really in the caller who's translating it. So the real
> spot becomes:
>
>   printf(_("my type is %s"), pgettext("object-type", type_name(type)));
>
> It's a little unfortunate that every caller has to do it rather than
> putting it near the source string. But I guess a type_name_human() would
> solve that, too. ;)

Yes, I agree the need for pgettext() is annoying but I do not see an
easy alternative.  Introducing a wrapper like type_name_human() to
limit the damage sounds like the best we could do.

Thanks.
diff mbox series

Patch

diff --git a/object.c b/object.c
index 4e85955a941..47dbe0d8a2a 100644
--- a/object.c
+++ b/object.c
@@ -22,12 +22,31 @@  struct object *get_indexed_object(unsigned int idx)
 
 static const char *object_type_strings[] = {
 	NULL,		/* OBJ_NONE = 0 */
-	"commit",	/* OBJ_COMMIT = 1 */
-	"tree",		/* OBJ_TREE = 2 */
-	"blob",		/* OBJ_BLOB = 3 */
-	"tag",		/* OBJ_TAG = 4 */
+	/*
+	 * TRANSLATORS: "commit", "tree", "blob" and "tag" are the
+	 * name of Git's object types. These names are interpolated
+	 * stand-alone when doing so is unambiguous for translation
+	 * and doesn't require extra context. E.g. as part of an
+	 * already-translated string that needs to have a type name
+	 * quoted verbatim, or the short description of a command-line
+	 * option expecting a given type.
+	 */
+	N_("commit"),	/* OBJ_COMMIT = 1 */
+	N_("tree"),	/* OBJ_TREE = 2 */
+	N_("blob"),	/* OBJ_BLOB = 3 */
+	N_("tag"),	/* OBJ_TAG = 4 */
 };
 
+/*
+ * TRANSLATORS: This is the short type name of an object that's not
+ * one of Git's known object types, as opposed to "commit", "tree",
+ * "blob" and "tag" above.
+ *
+ * A user is unlikely to ever encounter these, but they can be
+ * manually created with "git hash-object --literally".
+ */
+const char *unknown_type = N_("unknown type");
+
 const char *type_name(unsigned int type)
 {
 	if (type >= ARRAY_SIZE(object_type_strings))
diff --git a/object.h b/object.h
index 549f2d256bc..0510dc4b3ea 100644
--- a/object.h
+++ b/object.h
@@ -91,6 +91,7 @@  struct object {
 	struct object_id oid;
 };
 
+extern const char *unknown_type;
 const char *type_name(unsigned int type);
 int type_from_string_gently(const char *str, ssize_t, int gentle);
 #define type_from_string(str) type_from_string_gently(str, -1, 0)