Message ID | 4669ccbb1ae40f0c58a2d8e3c8b3a34d82176c7a.1610441263.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Introduce support for GETTEXT_POISON=rot13 | expand |
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> writes: > +static size_t unrot13(char *buf) > +{ > + char *p = buf, *q = buf; > + > + while (*p) { > + const char *begin = strstr(p, "<rot13>"), *end; AFAIR from my reading of [02/11], the encoding side did not special case the payload that has <ebg13> or </ebg13>; if we want to make it reversible conversion (which is excellent improvement over the current "# GETTEXT_POISON #" obfuscation), we'd need to do something about it, I think. But on second thought, nobody can prevent a caller to die(_("%s", msg)); to have "<rot13>" in the msg part, so perhaps punting like this series does is sufficient. I dunno. > + if (!begin) > + break; > + > + while (p != begin) > + *(q++) = *(p++); > + > + p += strlen("<rot13>"); > + end = strstr(p, "</rot13>"); > + if (!end) > + BUG("could not find </rot13> in\n%s", buf); And the user of this looks quite straightforward and nice. > test_i18ncmp () { > - ! test_have_prereq C_LOCALE_OUTPUT || test_cmp "$@" > + if test rot13 = "$GIT_TEST_GETTEXT_POISON" > + then > + test-tool i18n cmp "$@" > + elif test_have_prereq C_LOCALE_OUTPUT > + then > + test_cmp "$@" > + fi > }
Hi Junio, On Tue, 12 Jan 2021, Junio C Hamano wrote: > "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> > writes: > > > +static size_t unrot13(char *buf) > > +{ > > + char *p = buf, *q = buf; > > + > > + while (*p) { > > + const char *begin = strstr(p, "<rot13>"), *end; > > AFAIR from my reading of [02/11], the encoding side did not special > case the payload that has <ebg13> or </ebg13>; if we want to make it > reversible conversion (which is excellent improvement over the > current "# GETTEXT_POISON #" obfuscation), we'd need to do something > about it, I think. Do you expect any message to be translated _twice_? Ciao, Dscho > But on second thought, nobody can prevent a caller to die(_("%s", msg)); > to have "<rot13>" in the msg part, so perhaps punting like this > series does is sufficient. I dunno. > > > + if (!begin) > > + break; > > + > > + while (p != begin) > > + *(q++) = *(p++); > > + > > + p += strlen("<rot13>"); > > + end = strstr(p, "</rot13>"); > > + if (!end) > > + BUG("could not find </rot13> in\n%s", buf); > > And the user of this looks quite straightforward and nice. > > > test_i18ncmp () { > > - ! test_have_prereq C_LOCALE_OUTPUT || test_cmp "$@" > > + if test rot13 = "$GIT_TEST_GETTEXT_POISON" > > + then > > + test-tool i18n cmp "$@" > > + elif test_have_prereq C_LOCALE_OUTPUT > > + then > > + test_cmp "$@" > > + fi > > } > >
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > On Tue, 12 Jan 2021, Junio C Hamano wrote: > >> > +static size_t unrot13(char *buf) >> > +{ >> > + char *p = buf, *q = buf; >> > + >> > + while (*p) { >> > + const char *begin = strstr(p, "<rot13>"), *end; >> >> AFAIR from my reading of [02/11], the encoding side did not special >> case the payload that has <ebg13> or </ebg13>; if we want to make it >> reversible conversion (which is excellent improvement over the >> current "# GETTEXT_POISON #" obfuscation), we'd need to do something >> about it, I think. > > Do you expect any message to be translated _twice_? Not at all. But what I had in mind, when I wrote the above, was that the programmers are entitled to expect that they are allowed to say: die(_("message with <ebg13/>, <ebg13> and <rot13> in it")); This will be rot13'd, and the entire thing will be enclosed inside "<rot13>...</rot13>"; I would expect that somewhere inside "..." the receiving end that attempts to parse it by relying on these markers will be confused. >> But on second thought, nobody can prevent a caller to die(_("%s", msg)); >> to have "<rot13>" in the msg part, so perhaps punting like this >> series does is sufficient. I dunno. And this comment still stands.
Hi Junio, On Fri, 15 Jan 2021, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > On Tue, 12 Jan 2021, Junio C Hamano wrote: > > > >> > +static size_t unrot13(char *buf) > >> > +{ > >> > + char *p = buf, *q = buf; > >> > + > >> > + while (*p) { > >> > + const char *begin = strstr(p, "<rot13>"), *end; > >> > >> AFAIR from my reading of [02/11], the encoding side did not special > >> case the payload that has <ebg13> or </ebg13>; if we want to make it > >> reversible conversion (which is excellent improvement over the > >> current "# GETTEXT_POISON #" obfuscation), we'd need to do something > >> about it, I think. > > > > Do you expect any message to be translated _twice_? > > Not at all. > > But what I had in mind, when I wrote the above, was that the > programmers are entitled to expect that they are allowed to say: > > die(_("message with <ebg13/>, <ebg13> and <rot13> in it")); > > This will be rot13'd, and the entire thing will be enclosed inside > "<rot13>...</rot13>"; I would expect that somewhere inside "..." the > receiving end that attempts to parse it by relying on these markers > will be confused. Oh, _that_ is what you meant. Yep, I don't expect any of Git's messages to contain "<ebg13>", ever. > >> But on second thought, nobody can prevent a caller to die(_("%s", msg)); > >> to have "<rot13>" in the msg part, so perhaps punting like this > >> series does is sufficient. I dunno. > > And this comment still stands. Slightly pedantic: the call would be `die(_("%s"), msg)` (note that the `msg` is not inside the `_(...)`). Similar to my point above, I do not expect any of Git's tests to emit a message like that, and certainly not once/if we turn on rot13 mode in the GETTEXT_POISON job of the CI build. A blocker would have been if there might be the possibility that we _must_ add a test in the future that wants to emit a `msg` containing `<rot13>`. But that seems utterly unlikely. And even if: then we'd just change the rot13 mode to use a different needle. Ciao, Dscho
diff --git a/Makefile b/Makefile index 7b64106930a..d49f80cb524 100644 --- a/Makefile +++ b/Makefile @@ -710,6 +710,7 @@ TEST_BUILTINS_OBJS += test-genzeros.o TEST_BUILTINS_OBJS += test-hash-speed.o TEST_BUILTINS_OBJS += test-hash.o TEST_BUILTINS_OBJS += test-hashmap.o +TEST_BUILTINS_OBJS += test-i18n.o TEST_BUILTINS_OBJS += test-index-version.o TEST_BUILTINS_OBJS += test-json-writer.o TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o diff --git a/t/helper/test-i18n.c b/t/helper/test-i18n.c new file mode 100644 index 00000000000..4b572e6efad --- /dev/null +++ b/t/helper/test-i18n.c @@ -0,0 +1,89 @@ +#include "test-tool.h" +#include "cache.h" + +static const char *usage_msg = "\n" +" test-tool i18n cmp <file1> <file2>\n"; + +static inline char do_rot13(char c) +{ + if (c >= 'a' && c <= 'm') + return c + 'n' - 'a'; + if (c >= 'n' && c <= 'z') + return c + 'a' - 'n'; + if (c >= 'A' && c <= 'M') + return c + 'N' - 'A'; + if (c >= 'N' && c <= 'Z') + return c + 'A' - 'N'; + return c; +} + +static size_t unrot13(char *buf) +{ + char *p = buf, *q = buf; + + while (*p) { + const char *begin = strstr(p, "<rot13>"), *end; + + if (!begin) + break; + + while (p != begin) + *(q++) = *(p++); + + p += strlen("<rot13>"); + end = strstr(p, "</rot13>"); + if (!end) + BUG("could not find </rot13> in\n%s", buf); + + while (p != end) + *(q++) = do_rot13(*(p++)); + p += strlen("</rot13>"); + } + + while (*p) + *(q++) = *(p++); + + return q - buf; +} + +static void unrot13_strbuf(struct strbuf *buf) +{ + size_t len = unrot13(buf->buf); + + if (len == buf->len) + die("not ROT13'ed:\n%s", buf->buf); + buf->len = len; +} + +static int i18n_cmp(const char **argv) +{ + const char *path1 = *(argv++), *path2 = *(argv++); + struct strbuf a = STRBUF_INIT, b = STRBUF_INIT; + + if (!path1 || !path2 || *argv) + usage(usage_msg); + + if (strbuf_read_file(&a, path1, 0) < 0) + die_errno("could not read %s", path1); + if (strbuf_read_file(&b, path2, 0) < 0) + die_errno("could not read %s", path2); + unrot13_strbuf(&b); + + if (a.len != b.len || memcmp(a.buf, b.buf, a.len)) + return 1; + + return 0; +} + +int cmd__i18n(int argc, const char **argv) +{ + argv++; + if (!*argv) + usage(usage_msg); + if (!strcmp(*argv, "cmp")) + return i18n_cmp(argv+1); + else + usage(usage_msg); + + return 0; +} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 9d6d14d9293..7e1680ac108 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -34,6 +34,7 @@ static struct test_cmd cmds[] = { { "genzeros", cmd__genzeros }, { "hashmap", cmd__hashmap }, { "hash-speed", cmd__hash_speed }, + { "i18n", cmd__i18n }, { "index-version", cmd__index_version }, { "json-writer", cmd__json_writer }, { "lazy-init-name-hash", cmd__lazy_init_name_hash }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index a6470ff62c4..43282a520ea 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -24,6 +24,7 @@ int cmd__genrandom(int argc, const char **argv); int cmd__genzeros(int argc, const char **argv); int cmd__hashmap(int argc, const char **argv); int cmd__hash_speed(int argc, const char **argv); +int cmd__i18n(int argc, const char **argv); int cmd__index_version(int argc, const char **argv); int cmd__json_writer(int argc, const char **argv); int cmd__lazy_init_name_hash(int argc, const char **argv); diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 999982fe4a9..08731bae854 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -993,7 +993,13 @@ test_cmp_bin () { # under GIT_TEST_GETTEXT_POISON this pretends that the command produced expected # results. test_i18ncmp () { - ! test_have_prereq C_LOCALE_OUTPUT || test_cmp "$@" + if test rot13 = "$GIT_TEST_GETTEXT_POISON" + then + test-tool i18n cmp "$@" + elif test_have_prereq C_LOCALE_OUTPUT + then + test_cmp "$@" + fi } # Use this instead of "grep expected-string actual" to see if the diff --git a/t/test-lib.sh b/t/test-lib.sh index 9fa7c1d0f6d..c9f9e2804fd 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -1537,6 +1537,7 @@ then fi test_lazy_prereq C_LOCALE_OUTPUT ' + test rot13 != "$GIT_TEST_GETTEXT_POISON" && ! test_bool_env GIT_TEST_GETTEXT_POISON false '