diff mbox series

[7/8] gettext: introduce GIT_GETTEXT_POISON=scrambled

Message ID 20181022202241.18629-8-szeder.dev@gmail.com
State New, archived
Headers show
Series [1/8] test-lib.sh: preserve GIT_GETTEXT_POISON from the environment | expand

Commit Message

SZEDER Gábor Oct. 22, 2018, 8:22 p.m. UTC
Sometimes tests run with GETTEXT POISON fail because of a reason other
than a translated string that should not have been translated.  In
such a case an error message from a git command in the test's verbose
output is usually, well, less than helpful:

  error: # GETTEXT POISON #

or

  fatal: # GETTEXT POISON #: No such file or directory

It's especially annoying on those rare occasions when a heisenbug
decides it's a good time to suddenly reveal its presence during a
GETTEXT POISON test run, and all we get is an error message like these
(yes, I did actually see both of the above error messages only once).

Make builtin commands' GETTEXT POISON-ed error messages more useful
for debugging failures by introducing a new mode of poisoning: if
$GIT_GETTEXT_POISON is set to 'scrambled', then include the original
untranslated message after that "# GETTEXT_POISON #" string in a
scrambled form, interspersing a '.' after each character.  This way
the messages will remain gibberish enough for machine consumption as
they were before, but at the same time they will be relatively easily
legible for humans.  Take extra care to preserve printf() format
conversion specifiers unaltered when inserting those dots.

Leave 'git-sh-i18n.sh' unchanged, because translatable messages in
scripts often include shell variables, and they could (though
currently they don't) include printf format specifiers, parameter
expansions, command substitutions and whatnot, too.  Dealing with
those in a shell script would be too much hassle without its worth.

There is an additional benefit: as this change considerably increases
the size of translated messages, it could detect cases when we try to
format a translated string into a too small buffer.  E.g. this change
applied on old versions causes test failures because of the bug that
was fixed in 2cfa83574c (bisect_next_all: convert xsnprintf to
xstrfmt, 2017-02-16).

[TODO: Fallout?
       A 'printf(_("foo: %s"), var);' call includes the contents of
       'var' unscrambled in the output.  Could that hide the
       translation of a string that should not have been translated?
       I'm afraid yes: to check the output of that printf() a sloppy
       test could do:

         git plumbing-cmd >out && grep "var's content" out

       which would fail in a regular GETTEXT_POISON test run, but
       would succeed in a scrambled test run.  Does this matter in
       practice, do we care at all?

       Does gettext_scramble() need a FORMAT_PRESERVING annotation?
       Seems to work fine without it so far...]

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 gettext.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 gettext.h | 11 +++++++++--
 2 files changed, 60 insertions(+), 5 deletions(-)

Comments

Duy Nguyen Oct. 23, 2018, 2:44 p.m. UTC | #1
On Mon, Oct 22, 2018 at 10:23 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
> [TODO: Fallout?
>        A 'printf(_("foo: %s"), var);' call includes the contents of
>        'var' unscrambled in the output.  Could that hide the
>        translation of a string that should not have been translated?
>        I'm afraid yes: to check the output of that printf() a sloppy
>        test could do:
>
>          git plumbing-cmd >out && grep "var's content" out
>
>        which would fail in a regular GETTEXT_POISON test run, but
>        would succeed in a scrambled test run.  Does this matter in
>        practice, do we care at all?

If var is supposed to be translated, _() must have been called before
the final string is stored in var and the content is already
scrambled. Whatever left unscrambled is machine-generated and should
be ok to grep, or we have found new strings that should be _() but
not.

PS. Another thing I'd like to have is to mark the beginning and end of
a scrambled text. For example, _("foo") produces "[f.o.o]" or
something like that. If we have it, it's easier to see "text legos"
(instead of full sentences) that makes translator's life harder. But
it could interfere with stuff (e.g. some strings must start with '#')
so let's forget it for now.

>
>        Does gettext_scramble() need a FORMAT_PRESERVING annotation?
>        Seems to work fine without it so far...]

I don't think you can. _() can be called on plain strings that just
happen to have '%' in them.
diff mbox series

Patch

diff --git a/gettext.c b/gettext.c
index c50d1e0377..8ba7fd0bea 100644
--- a/gettext.c
+++ b/gettext.c
@@ -52,13 +52,61 @@  enum poison_mode use_gettext_poison(void)
 	static enum poison_mode poison_mode = poison_mode_uninitialized;
 	if (poison_mode == poison_mode_uninitialized) {
 		const char *v = getenv("GIT_GETTEXT_POISON");
-		if (v && *v)
-			poison_mode = poison_mode_default;
-		else
+		if (v && *v) {
+			if (!strcmp(v, "scrambled"))
+				poison_mode = poison_mode_scrambled;
+			else
+				poison_mode = poison_mode_default;
+		} else
 			poison_mode = poison_mode_none;
 	}
 	return poison_mode;
 }
+
+static int conversion_specifier_len(const char *s)
+{
+	const char printf_conversion_specifiers[] = "diouxXeEfFgGaAcsCSpnm%";
+	const char *format_end;
+
+	if (*s != '%')
+		return 0;
+
+	format_end = strpbrk(s + 1, printf_conversion_specifiers);
+	if (format_end)
+		return format_end - s;
+	else
+		return 0;
+}
+
+const char *gettext_scramble(const char *msg)
+{
+	struct strbuf sb;
+
+	strbuf_init(&sb,
+		    /* "# GETTEXT_POISON #" + ' ' + "m.e.s.s.a.g.e." + '\0' */
+		    strlen(GETTEXT_POISON_MAGIC) + 1 + 2 * strlen(msg) + 1);
+
+	strbuf_addch(&sb, ' ');
+	while (*msg) {
+		if (*msg == '\n') {
+			strbuf_addch(&sb, *(msg++));
+			continue;
+		} else if (*msg == '%') {
+			int spec_len = conversion_specifier_len(msg);
+			if (spec_len) {
+				strbuf_add(&sb, msg, spec_len);
+				msg += spec_len;
+				continue;
+			}
+		}
+
+		strbuf_addch(&sb, *(msg++));
+		strbuf_addch(&sb, '.');
+	}
+
+	/* This will be leaked... */
+	return strbuf_detach(&sb, NULL);
+}
 #endif
 
 #ifndef NO_GETTEXT
diff --git a/gettext.h b/gettext.h
index fcb6bfaa2c..d21346d9fa 100644
--- a/gettext.h
+++ b/gettext.h
@@ -45,10 +45,12 @@  static inline int gettext_width(const char *s)
 enum poison_mode {
 	poison_mode_uninitialized = -1,
 	poison_mode_none = 0,
-	poison_mode_default
+	poison_mode_default,
+	poison_mode_scrambled
 };
 
 extern enum poison_mode use_gettext_poison(void);
+extern const char *gettext_scramble(const char *msg);
 
 #define GETTEXT_POISON_MAGIC "# GETTEXT POISON #"
 #endif
@@ -60,6 +62,8 @@  static inline FORMAT_PRESERVING(1) const char *_(const char *msgid)
 #ifdef GETTEXT_POISON
 	if (use_gettext_poison() == poison_mode_default)
 		return GETTEXT_POISON_MAGIC;
+	else if (use_gettext_poison() == poison_mode_scrambled)
+		return gettext_scramble(gettext(msgid));
 #endif
 	return gettext(msgid);
 }
@@ -67,11 +71,14 @@  static inline FORMAT_PRESERVING(1) const char *_(const char *msgid)
 static inline FORMAT_PRESERVING(1) FORMAT_PRESERVING(2)
 const char *Q_(const char *msgid, const char *plu, unsigned long n)
 {
+	const char *msg = ngettext(msgid, plu, n);
 #ifdef GETTEXT_POISON
 	if (use_gettext_poison() == poison_mode_default)
 		return GETTEXT_POISON_MAGIC;
+	else if (use_gettext_poison() == poison_mode_scrambled)
+		return gettext_scramble(msg);
 #endif
-	return ngettext(msgid, plu, n);
+	return msg;
 }
 
 /* Mark msgid for translation but do not translate it. */