diff mbox series

[09/11] GETTEXT_POISON=rot13: do compare the output in `test_i18ncmp`

Message ID 4669ccbb1ae40f0c58a2d8e3c8b3a34d82176c7a.1610441263.git.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series Introduce support for GETTEXT_POISON=rot13 | expand

Commit Message

Johannes Schindelin Jan. 12, 2021, 8:47 a.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 Makefile                |  1 +
 t/helper/test-i18n.c    | 89 +++++++++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c    |  1 +
 t/helper/test-tool.h    |  1 +
 t/test-lib-functions.sh |  8 +++-
 t/test-lib.sh           |  1 +
 6 files changed, 100 insertions(+), 1 deletion(-)
 create mode 100644 t/helper/test-i18n.c

Comments

Junio C Hamano Jan. 12, 2021, 7:47 p.m. UTC | #1
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> +static size_t unrot13(char *buf)
> +{
> +	char *p = buf, *q = buf;
> +
> +	while (*p) {
> +		const char *begin = strstr(p, "<rot13>"), *end;

AFAIR from my reading of [02/11], the encoding side did not special
case the payload that has <ebg13> or </ebg13>; if we want to make it
reversible conversion (which is excellent improvement over the
current "# GETTEXT_POISON #" obfuscation), we'd need to do something
about it, I think.

But on second thought, nobody can prevent a caller to die(_("%s", msg));
to have "<rot13>" in the msg part, so perhaps punting like this
series does is sufficient.  I dunno.

> +		if (!begin)
> +			break;
> +
> +		while (p != begin)
> +			*(q++) = *(p++);
> +
> +		p += strlen("<rot13>");
> +		end = strstr(p, "</rot13>");
> +		if (!end)
> +			BUG("could not find </rot13> in\n%s", buf);

And the user of this looks quite straightforward and nice.

>  test_i18ncmp () {
> -	! test_have_prereq C_LOCALE_OUTPUT || test_cmp "$@"
> +	if test rot13 = "$GIT_TEST_GETTEXT_POISON"
> +	then
> +		test-tool i18n cmp "$@"
> +	elif test_have_prereq C_LOCALE_OUTPUT
> +	then
> +		test_cmp "$@"
> +	fi
>  }
Johannes Schindelin Jan. 15, 2021, 3:44 p.m. UTC | #2
Hi Junio,

On Tue, 12 Jan 2021, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > +static size_t unrot13(char *buf)
> > +{
> > +	char *p = buf, *q = buf;
> > +
> > +	while (*p) {
> > +		const char *begin = strstr(p, "<rot13>"), *end;
>
> AFAIR from my reading of [02/11], the encoding side did not special
> case the payload that has <ebg13> or </ebg13>; if we want to make it
> reversible conversion (which is excellent improvement over the
> current "# GETTEXT_POISON #" obfuscation), we'd need to do something
> about it, I think.

Do you expect any message to be translated _twice_?

Ciao,
Dscho

> But on second thought, nobody can prevent a caller to die(_("%s", msg));
> to have "<rot13>" in the msg part, so perhaps punting like this
> series does is sufficient.  I dunno.
>
> > +		if (!begin)
> > +			break;
> > +
> > +		while (p != begin)
> > +			*(q++) = *(p++);
> > +
> > +		p += strlen("<rot13>");
> > +		end = strstr(p, "</rot13>");
> > +		if (!end)
> > +			BUG("could not find </rot13> in\n%s", buf);
>
> And the user of this looks quite straightforward and nice.
>
> >  test_i18ncmp () {
> > -	! test_have_prereq C_LOCALE_OUTPUT || test_cmp "$@"
> > +	if test rot13 = "$GIT_TEST_GETTEXT_POISON"
> > +	then
> > +		test-tool i18n cmp "$@"
> > +	elif test_have_prereq C_LOCALE_OUTPUT
> > +	then
> > +		test_cmp "$@"
> > +	fi
> >  }
>
>
Junio C Hamano Jan. 15, 2021, 7:58 p.m. UTC | #3
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Tue, 12 Jan 2021, Junio C Hamano wrote:
>
>> > +static size_t unrot13(char *buf)
>> > +{
>> > +	char *p = buf, *q = buf;
>> > +
>> > +	while (*p) {
>> > +		const char *begin = strstr(p, "<rot13>"), *end;
>>
>> AFAIR from my reading of [02/11], the encoding side did not special
>> case the payload that has <ebg13> or </ebg13>; if we want to make it
>> reversible conversion (which is excellent improvement over the
>> current "# GETTEXT_POISON #" obfuscation), we'd need to do something
>> about it, I think.
>
> Do you expect any message to be translated _twice_?

Not at all.

But what I had in mind, when I wrote the above, was that the
programmers are entitled to expect that they are allowed to say:

	die(_("message with <ebg13/>, <ebg13> and <rot13> in it"));

This will be rot13'd, and the entire thing will be enclosed inside
"<rot13>...</rot13>"; I would expect that somewhere inside "..." the
receiving end that attempts to parse it by relying on these markers
will be confused.

>> But on second thought, nobody can prevent a caller to die(_("%s", msg));
>> to have "<rot13>" in the msg part, so perhaps punting like this
>> series does is sufficient.  I dunno.

And this comment still stands.
Johannes Schindelin Jan. 18, 2021, 2:36 p.m. UTC | #4
Hi Junio,

On Fri, 15 Jan 2021, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > On Tue, 12 Jan 2021, Junio C Hamano wrote:
> >
> >> > +static size_t unrot13(char *buf)
> >> > +{
> >> > +	char *p = buf, *q = buf;
> >> > +
> >> > +	while (*p) {
> >> > +		const char *begin = strstr(p, "<rot13>"), *end;
> >>
> >> AFAIR from my reading of [02/11], the encoding side did not special
> >> case the payload that has <ebg13> or </ebg13>; if we want to make it
> >> reversible conversion (which is excellent improvement over the
> >> current "# GETTEXT_POISON #" obfuscation), we'd need to do something
> >> about it, I think.
> >
> > Do you expect any message to be translated _twice_?
>
> Not at all.
>
> But what I had in mind, when I wrote the above, was that the
> programmers are entitled to expect that they are allowed to say:
>
> 	die(_("message with <ebg13/>, <ebg13> and <rot13> in it"));
>
> This will be rot13'd, and the entire thing will be enclosed inside
> "<rot13>...</rot13>"; I would expect that somewhere inside "..." the
> receiving end that attempts to parse it by relying on these markers
> will be confused.

Oh, _that_ is what you meant.

Yep, I don't expect any of Git's messages to contain "<ebg13>", ever.

> >> But on second thought, nobody can prevent a caller to die(_("%s", msg));
> >> to have "<rot13>" in the msg part, so perhaps punting like this
> >> series does is sufficient.  I dunno.
>
> And this comment still stands.

Slightly pedantic: the call would be `die(_("%s"), msg)` (note that the
`msg` is not inside the `_(...)`).

Similar to my point above, I do not expect any of Git's tests to emit a
message like that, and certainly not once/if we turn on rot13 mode in the
GETTEXT_POISON job of the CI build.

A blocker would have been if there might be the possibility that we _must_
add a test in the future that wants to emit a `msg` containing `<rot13>`.
But that seems utterly unlikely. And even if: then we'd just change the
rot13 mode to use a different needle.

Ciao,
Dscho
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index 7b64106930a..d49f80cb524 100644
--- a/Makefile
+++ b/Makefile
@@ -710,6 +710,7 @@  TEST_BUILTINS_OBJS += test-genzeros.o
 TEST_BUILTINS_OBJS += test-hash-speed.o
 TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
+TEST_BUILTINS_OBJS += test-i18n.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
 TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
diff --git a/t/helper/test-i18n.c b/t/helper/test-i18n.c
new file mode 100644
index 00000000000..4b572e6efad
--- /dev/null
+++ b/t/helper/test-i18n.c
@@ -0,0 +1,89 @@ 
+#include "test-tool.h"
+#include "cache.h"
+
+static const char *usage_msg = "\n"
+"  test-tool i18n cmp <file1> <file2>\n";
+
+static inline char do_rot13(char c)
+{
+	if (c >= 'a' && c <= 'm')
+		return c + 'n' - 'a';
+	if (c >= 'n' && c <= 'z')
+		return c + 'a' - 'n';
+	if (c >= 'A' && c <= 'M')
+		return c + 'N' - 'A';
+	if (c >= 'N' && c <= 'Z')
+		return c + 'A' - 'N';
+	return c;
+}
+
+static size_t unrot13(char *buf)
+{
+	char *p = buf, *q = buf;
+
+	while (*p) {
+		const char *begin = strstr(p, "<rot13>"), *end;
+
+		if (!begin)
+			break;
+
+		while (p != begin)
+			*(q++) = *(p++);
+
+		p += strlen("<rot13>");
+		end = strstr(p, "</rot13>");
+		if (!end)
+			BUG("could not find </rot13> in\n%s", buf);
+
+		while (p != end)
+			*(q++) = do_rot13(*(p++));
+		p += strlen("</rot13>");
+	}
+
+	while (*p)
+		*(q++) = *(p++);
+
+	return q - buf;
+}
+
+static void unrot13_strbuf(struct strbuf *buf)
+{
+	size_t len = unrot13(buf->buf);
+
+	if (len == buf->len)
+		die("not ROT13'ed:\n%s", buf->buf);
+	buf->len = len;
+}
+
+static int i18n_cmp(const char **argv)
+{
+	const char *path1 = *(argv++), *path2 = *(argv++);
+	struct strbuf a = STRBUF_INIT, b = STRBUF_INIT;
+
+	if (!path1 || !path2 || *argv)
+		usage(usage_msg);
+
+	if (strbuf_read_file(&a, path1, 0) < 0)
+		die_errno("could not read %s", path1);
+	if (strbuf_read_file(&b, path2, 0) < 0)
+		die_errno("could not read %s", path2);
+	unrot13_strbuf(&b);
+
+	if (a.len != b.len || memcmp(a.buf, b.buf, a.len))
+		return 1;
+
+	return 0;
+}
+
+int cmd__i18n(int argc, const char **argv)
+{
+	argv++;
+	if (!*argv)
+		usage(usage_msg);
+	if (!strcmp(*argv, "cmp"))
+		return i18n_cmp(argv+1);
+	else
+		usage(usage_msg);
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 9d6d14d9293..7e1680ac108 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -34,6 +34,7 @@  static struct test_cmd cmds[] = {
 	{ "genzeros", cmd__genzeros },
 	{ "hashmap", cmd__hashmap },
 	{ "hash-speed", cmd__hash_speed },
+	{ "i18n", cmd__i18n },
 	{ "index-version", cmd__index_version },
 	{ "json-writer", cmd__json_writer },
 	{ "lazy-init-name-hash", cmd__lazy_init_name_hash },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index a6470ff62c4..43282a520ea 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -24,6 +24,7 @@  int cmd__genrandom(int argc, const char **argv);
 int cmd__genzeros(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
 int cmd__hash_speed(int argc, const char **argv);
+int cmd__i18n(int argc, const char **argv);
 int cmd__index_version(int argc, const char **argv);
 int cmd__json_writer(int argc, const char **argv);
 int cmd__lazy_init_name_hash(int argc, const char **argv);
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 999982fe4a9..08731bae854 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -993,7 +993,13 @@  test_cmp_bin () {
 # under GIT_TEST_GETTEXT_POISON this pretends that the command produced expected
 # results.
 test_i18ncmp () {
-	! test_have_prereq C_LOCALE_OUTPUT || test_cmp "$@"
+	if test rot13 = "$GIT_TEST_GETTEXT_POISON"
+	then
+		test-tool i18n cmp "$@"
+	elif test_have_prereq C_LOCALE_OUTPUT
+	then
+		test_cmp "$@"
+	fi
 }
 
 # Use this instead of "grep expected-string actual" to see if the
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 9fa7c1d0f6d..c9f9e2804fd 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1537,6 +1537,7 @@  then
 fi
 
 test_lazy_prereq C_LOCALE_OUTPUT '
+	test rot13 != "$GIT_TEST_GETTEXT_POISON" &&
 	! test_bool_env GIT_TEST_GETTEXT_POISON false
 '