Message ID | pull.836.git.1610441262.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Introduce support for GETTEXT_POISON=rot13 | expand |
On Tue, Jan 12, 2021 at 08:47:31AM +0000, Johannes Schindelin via GitGitGadget wrote: > Ævar suggested recently > [https://lore.kernel.org/git/xmqqim836v6m.fsf@gitster.c.googlers.com/T/#m6fdc43d4f1eb3f20f841096c59e985b69c84875e] > that this whole GETTEXT_POISON business is totally useless. > > I do not believe that it is useless. To back up my belief with something > tangible, I implemented a GETTEXT_POISON=rot13 mode and ran the test suite > to see whether we erroneously expect translated messages where they aren't, > and related problems. > > And the experiment delivered. It is just a demonstration (I only addressed a > handful of the test suite failures, 124 test scripts still need to be > inspected to find out why they fail), of course. Yet I think that finding > e.g. the missing translations of sha1dc-cb and parse-options show that it > would be very useful to complete this patch series and then use the rot13 > mode in our CI jobs (instead of the current GETTEXT_POISON=true mode, which > really is less useful). I'm not entirely convinced by this. The original point of the poison code was not to find opportunities to translate strings, but to make sure we did not accidentally translate a string that some script was relying on. And I don't see any fixes for the latter here (and as Ævar suggested in the linked thread, the fact that we're not combing through existing code looking for translations makes such an error a lot less likely). Which isn't to say repurposing it in the other direction might not be worthwhile. But I suspect a lot of the test failures are just false positives. Until now it was always reasonable to conservatively use test_i18ngrep for cases which could reasonably translated, even if they were not yet. Likewise, I'm not sure that one can reliably rot13 an output for test_i18ncmp. It could contain mixed translated and untranslated bits from different messages. So I dunno. You did find two spots where translations could be used. But if nobody actually saw them to care that they got translated, were they important? I may be a bit biased as somebody who would not use the translations in the first place, of course. -Peff
On Tue, Jan 12 2021, Johannes Schindelin via GitGitGadget wrote: > This patch series is incomplete because I lack the time to finish it, and I > hope that there are interested developers who can help with it. > > In https://lore.kernel.org/git/20181022153633.31757-1-pclouds@gmail.com/, > Duy proposed introducing a fake language they called "Ook" to be able to > test translations better. "Ook" would "translate" messages without any > printf format specifiers to "Eek", otherwise convert all upper-case letters > to "Ook" and lower-case letters to "ook". > > Gábor replied with a different proposal that has the advantage of being > bijective, i.e. it is possible to reconstruct the untranslated message from > the translated one. > > Ævar suggested recently > [https://lore.kernel.org/git/xmqqim836v6m.fsf@gitster.c.googlers.com/T/#m6fdc43d4f1eb3f20f841096c59e985b69c84875e] > that this whole GETTEXT_POISON business is totally useless. To clarify what I really mean, but admittedly am not always the best at articulating: I do not think GETTEXT_POISON is useless, it's useful. The interesting question is "useful enough to bother?", or in economics terms "is this investment in time/money worth the opportunity cost?". Any hoops we make people jump through when writing tests takes away time and attention from other things. My motivation here is that I feel this whole poison business is all my fault. Every time some newbie submits a patch and needs to re-roll a v2 because they used "grep" instead of "test_i18ngrep" in a case that clearly didn't involve a plumbing string I cringe a little about the technical debt. So while I still think my series (just nuke it) would be better overall, I'm secretly rooting for yours. If the consensus is that this is worth keeping and improving perhaps we haven't been mostly wasting our time :) On to discussing this series: > I do not believe that it is useless. To back up my belief with something > tangible, I implemented a GETTEXT_POISON=rot13 mode and ran the test suite > to see whether we erroneously expect translated messages where they aren't, > and related problems. I agree that rot13ing it makes it much more useful in the sense that before we could over-tag things with test_i18n* and we'd just "return 0" there in the poison mode, now we actually check if the string is poisoned. We could get most of the way there by checking e.g. if "GETTEXT POISON" is in the output, but it wouldn't assert that the message is 100% translated as this WIP series does. In our off-list discussion you maintained that "GIT_TEST_GETTEXT_POISON=false <cmd>" was an anti-pattern. Was it because you thought "test_i18n*" et al was actually doing some "is poison?" comparison as we now do, or just because you didn't conceptually like a pattern like: GIT_TEST_GETTEXT_POISON=false git cmd >out && grep string out Being changed later to: GIT_TEST_GETTEXT_POISON=false git cmd >out && grep string out && grep plumbing-string out Or whatever, as opposed to: git cmd >out && test_i18ngrep string out && test_i18ngrep plumbing-string out I get the aesthetic preference, I'm just wondering if I'm missing something else about it... > And the experiment delivered. It is just a demonstration (I only addressed a > handful of the test suite failures, 124 test scripts still need to be > inspected to find out why they fail), of course. Yet I think that finding > e.g. the missing translations of sha1dc-cb and parse-options show that it > would be very useful to complete this patch series and then use the rot13 > mode in our CI jobs (instead of the current GETTEXT_POISON=true mode, which > really is less useful). The following patch on top cuts down on the failures by more than half: ----------------- diff --git a/git-sh-i18n.sh b/git-sh-i18n.sh index 8eef60b43f..f9239d2917 100644 --- a/git-sh-i18n.sh +++ b/git-sh-i18n.sh @@ -17,7 +17,10 @@ export TEXTDOMAINDIR # First decide what scheme to use... GIT_INTERNAL_GETTEXT_SH_SCHEME=fallthrough -if test -n "$GIT_TEST_GETTEXT_POISON" && +if test -n "$GIT_TEST_GETTEXT_POISON" -a "$GIT_TEST_GETTEXT_POISON" = "rot13" +then + GIT_INTERNAL_GETTEXT_SH_SCHEME=rot13poison +elif test -n "$GIT_TEST_GETTEXT_POISON" && git env--helper --type=bool --default=0 --exit-code \ GIT_TEST_GETTEXT_POISON then @@ -63,6 +66,21 @@ gettext_without_eval_gettext) ) } ;; +rot13poison) + # Emit garbage so that tests that incorrectly rely on translatable + # strings will fail. + gettext () { + printf "%s" "# SHELL GETTEXT POISON #" + } + + eval_gettext () { + printf "%s" "# SHELL GETTEXT POISON #" + } + + eval_ngettext () { + printf "%s" "# SHELL GETTEXT POISON #" + } + ;; poison) # Emit garbage so that tests that incorrectly rely on translatable # strings will fail. diff --git a/t/helper/test-i18n.c b/t/helper/test-i18n.c index 82efc66f1f..1f0fa3d041 100644 --- a/t/helper/test-i18n.c +++ b/t/helper/test-i18n.c @@ -1,6 +1,8 @@ #include "test-tool.h" #include "cache.h" #include "grep.h" +#include "config.h" static const char *usage_msg = "\n" " test-tool i18n cmp <file1> <file2>\n" @@ -34,8 +36,12 @@ static size_t unrot13(char *buf) p += strlen("<rot13>"); end = strstr(p, "</rot13>"); - if (!end) - BUG("could not find </rot13> in\n%s", buf); + if (!end) { + if (git_env_bool("GIT_TEST_GETTEXT_POISON_PEDANTIC", 0)) + BUG("could not find </rot13> in\n%s", buf); + else + break; + } while (p != end) *(q++) = do_rot13(*(p++)); @@ -67,8 +73,12 @@ static int i18n_cmp(const char **argv) if (strbuf_read_file(&a, path1, 0) < 0) die_errno("could not read %s", path1); + if (strstr(a.buf, "# SHELL GETTEXT POISON #")) + return 0; if (strbuf_read_file(&b, path2, 0) < 0) die_errno("could not read %s", path2); + if (strstr(b.buf, "# SHELL GETTEXT POISON #")) + return 0; unrot13_strbuf(&b); if (a.len != b.len || memcmp(a.buf, b.buf, a.len)) @@ -79,7 +89,6 @@ static int i18n_cmp(const char **argv) static int i18n_grep(const char **argv) { - int dont_match = 0; const char *pattern, *path; struct grep_opt opt; @@ -87,11 +96,6 @@ static int i18n_grep(const char **argv) struct strbuf buf = STRBUF_INIT; int hit; - if (*argv && !strcmp("!", *argv)) { - dont_match = 1; - argv++; - } - pattern = *(argv++); path = *(argv++); @@ -104,13 +108,15 @@ static int i18n_grep(const char **argv) if (strbuf_read_file(&buf, path, 0) < 0) die_errno("could not read %s", path); + if (strstr(buf.buf, "# SHELL GETTEXT POISON #")) + return 0; unrot13_strbuf(&buf); grep_source_init(&source, GREP_SOURCE_BUF, path, path, path); source.buf = buf.buf; source.size = buf.len; hit = grep_source(&opt, &source); strbuf_release(&buf); - return dont_match ^ !hit; + return !hit; } int cmd__i18n(int argc, const char **argv) diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 394fd2ef5b..6c580a3000 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1019,14 +1019,12 @@ test_i18ngrep () { BUG "too few parameters to test_i18ngrep" fi - if test_have_prereq !C_LOCALE_OUTPUT + grep_cmd=grep + if test "$GIT_TEST_GETTEXT_POISON" == "rot13" + then + grep_cmd="test-tool i18n grep" + elif test_have_prereq !C_LOCALE_OUTPUT then - if test rot13 = "$GIT_TEST_GETTEXT_POISON" - then - test-tool i18n grep "$@" - return $? - fi - # pretend success return 0 fi @@ -1034,13 +1032,13 @@ test_i18ngrep () { if test "x!" = "x$1" then shift - ! grep "$@" && return 0 + ! $grep_cmd "$@" && return 0 - echo >&4 "error: '! grep $@' did find a match in:" + echo >&4 "error: '! $grep_cmd $@' did find a match in:" else - grep "$@" && return 0 + $grep_cmd "$@" && return 0 - echo >&4 "error: 'grep $@' didn't find a match in:" + echo >&4 "error: '$grep_cmd $@' didn't find a match in:" fi if test -s "$last_arg" ----------------- I.e. most of this was because you didn't add any support for this to the shellscript translations. We still punt on that here, it should ideally preserve the shell variables like the format specifiers, but I think that's not worth the effort with us actively cutting down our shell code to nothing. We still have failures e.g. due to "test_i18ngrep -F fixed file" not being supported. Wouldn't a better/simpler approach be to just have a light rot13 helper, and then call the actual "cmp"/"grep" with the munged input/output?
Jeff King <peff@peff.net> writes: > Likewise, I'm not sure that one can reliably rot13 an output for > test_i18ncmp. It could contain mixed translated and untranslated bits > from different messages. As long as <rot13>...</rot13> marking can be trusted (and it cannot be, but if we declare it is OK to punt, then by definition we can), I think the messages can be compared. > So I dunno. You did find two spots where translations could be used. > But if nobody actually saw them to care that they got translated, were > they important? I may be a bit biased as somebody who would not use the > translations in the first place, of course. I viewed the series with interest, mostly (i.e. 80%+) for its "fun value"; I tend to agree with you that I doubt its usefulness.
On Tue, Jan 12, 2021 at 11:50:52AM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Likewise, I'm not sure that one can reliably rot13 an output for > > test_i18ncmp. It could contain mixed translated and untranslated bits > > from different messages. > > As long as <rot13>...</rot13> marking can be trusted (and it cannot > be, but if we declare it is OK to punt, then by definition we can), > I think the messages can be compared. Good point. I should have looked at the implementation more carefully before responding (and I agree that while not foolproof, tagging like that would work OK in practice). -Peff
Hi Ævar, On Tue, 12 Jan 2021, Ævar Arnfjörð Bjarmason wrote: > On Tue, Jan 12 2021, Johannes Schindelin via GitGitGadget wrote: > > > I implemented a GETTEXT_POISON=rot13 mode and ran the test suite to > > see whether we erroneously expect translated messages where they > > aren't, and related problems. > > I agree that rot13ing it makes it much more useful in the sense that > before we could over-tag things with test_i18n* and we'd just "return 0" > there in the poison mode, now we actually check if the string is > poisoned. Precisely. > In our off-list discussion you maintained that > "GIT_TEST_GETTEXT_POISON=false <cmd>" was an anti-pattern. Was it > because you thought "test_i18n*" et al was actually doing some "is > poison?" [...] I maintain this view because the whole point of having a GETTEXT_POISON job is to have the entire test suite run in that mode. Using this anti-pattern says to me "the author could not make this work correctly in GETTEXT_POISON mode". Even more, _iff_ we expect certain validations to be skipped in the GETTEXT_POISON job, using `GIT_TEST_GETTEXT_POISON=false` opens the possibility that this job fails when it never should have failed. > > And the experiment delivered. It is just a demonstration (I only addressed a > > handful of the test suite failures, 124 test scripts still need to be > > inspected to find out why they fail), of course. Yet I think that finding > > e.g. the missing translations of sha1dc-cb and parse-options show that it > > would be very useful to complete this patch series and then use the rot13 > > mode in our CI jobs (instead of the current GETTEXT_POISON=true mode, which > > really is less useful). > > The following patch on top cuts down on the failures by more than half: Interesting! I forgot the shell half... Curious that the scripted parts _still_ affect so much of our test suite :-( > ----------------- > diff --git a/git-sh-i18n.sh b/git-sh-i18n.sh > index 8eef60b43f..f9239d2917 100644 > --- a/git-sh-i18n.sh > +++ b/git-sh-i18n.sh > @@ -17,7 +17,10 @@ export TEXTDOMAINDIR > > # First decide what scheme to use... > GIT_INTERNAL_GETTEXT_SH_SCHEME=fallthrough > -if test -n "$GIT_TEST_GETTEXT_POISON" && > +if test -n "$GIT_TEST_GETTEXT_POISON" -a "$GIT_TEST_GETTEXT_POISON" = "rot13" > +then > + GIT_INTERNAL_GETTEXT_SH_SCHEME=rot13poison > +elif test -n "$GIT_TEST_GETTEXT_POISON" && > git env--helper --type=bool --default=0 --exit-code \ > GIT_TEST_GETTEXT_POISON > then > @@ -63,6 +66,21 @@ gettext_without_eval_gettext) > ) > } > ;; > +rot13poison) > + # Emit garbage so that tests that incorrectly rely on translatable > + # strings will fail. > + gettext () { > + printf "%s" "# SHELL GETTEXT POISON #" > + } > + > + eval_gettext () { > + printf "%s" "# SHELL GETTEXT POISON #" > + } > + > + eval_ngettext () { > + printf "%s" "# SHELL GETTEXT POISON #" > + } Right, and for that, I'd rather use the `test-tool i18n` helper (it is appropriate to expect `test-tool` to be in the `PATH` in GETTEXT_POISON mode, right?) > + ;; > poison) > # Emit garbage so that tests that incorrectly rely on translatable > # strings will fail. > diff --git a/t/helper/test-i18n.c b/t/helper/test-i18n.c > index 82efc66f1f..1f0fa3d041 100644 > --- a/t/helper/test-i18n.c > +++ b/t/helper/test-i18n.c > @@ -1,6 +1,8 @@ > #include "test-tool.h" > #include "cache.h" > #include "grep.h" > +#include "config.h" > > static const char *usage_msg = "\n" > " test-tool i18n cmp <file1> <file2>\n" > @@ -34,8 +36,12 @@ static size_t unrot13(char *buf) > > p += strlen("<rot13>"); > end = strstr(p, "</rot13>"); > - if (!end) > - BUG("could not find </rot13> in\n%s", buf); > + if (!end) { > + if (git_env_bool("GIT_TEST_GETTEXT_POISON_PEDANTIC", 0)) > + BUG("could not find </rot13> in\n%s", buf); > + else > + break; > + } > > while (p != end) > *(q++) = do_rot13(*(p++)); > @@ -67,8 +73,12 @@ static int i18n_cmp(const char **argv) > > if (strbuf_read_file(&a, path1, 0) < 0) > die_errno("could not read %s", path1); > + if (strstr(a.buf, "# SHELL GETTEXT POISON #")) > + return 0; > if (strbuf_read_file(&b, path2, 0) < 0) > die_errno("could not read %s", path2); > + if (strstr(b.buf, "# SHELL GETTEXT POISON #")) > + return 0; > unrot13_strbuf(&b); > > if (a.len != b.len || memcmp(a.buf, b.buf, a.len)) > @@ -79,7 +89,6 @@ static int i18n_cmp(const char **argv) > > static int i18n_grep(const char **argv) > { > - int dont_match = 0; > const char *pattern, *path; > > struct grep_opt opt; > @@ -87,11 +96,6 @@ static int i18n_grep(const char **argv) > struct strbuf buf = STRBUF_INIT; > int hit; > > - if (*argv && !strcmp("!", *argv)) { > - dont_match = 1; > - argv++; > - } > - > pattern = *(argv++); > path = *(argv++); > > @@ -104,13 +108,15 @@ static int i18n_grep(const char **argv) > > if (strbuf_read_file(&buf, path, 0) < 0) > die_errno("could not read %s", path); > + if (strstr(buf.buf, "# SHELL GETTEXT POISON #")) > + return 0; > unrot13_strbuf(&buf); > grep_source_init(&source, GREP_SOURCE_BUF, path, path, path); > source.buf = buf.buf; > source.size = buf.len; > hit = grep_source(&opt, &source); > strbuf_release(&buf); > - return dont_match ^ !hit; > + return !hit; > } > > int cmd__i18n(int argc, const char **argv) > diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh > index 394fd2ef5b..6c580a3000 100644 > --- a/t/test-lib-functions.sh > +++ b/t/test-lib-functions.sh > @@ -1019,14 +1019,12 @@ test_i18ngrep () { > BUG "too few parameters to test_i18ngrep" > fi > > - if test_have_prereq !C_LOCALE_OUTPUT > + grep_cmd=grep > + if test "$GIT_TEST_GETTEXT_POISON" == "rot13" > + then > + grep_cmd="test-tool i18n grep" > + elif test_have_prereq !C_LOCALE_OUTPUT > then > - if test rot13 = "$GIT_TEST_GETTEXT_POISON" > - then > - test-tool i18n grep "$@" > - return $? > - fi > - > # pretend success > return 0 > fi > @@ -1034,13 +1032,13 @@ test_i18ngrep () { > if test "x!" = "x$1" > then > shift > - ! grep "$@" && return 0 > + ! $grep_cmd "$@" && return 0 > > - echo >&4 "error: '! grep $@' did find a match in:" > + echo >&4 "error: '! $grep_cmd $@' did find a match in:" > else > - grep "$@" && return 0 > + $grep_cmd "$@" && return 0 > > - echo >&4 "error: 'grep $@' didn't find a match in:" > + echo >&4 "error: '$grep_cmd $@' didn't find a match in:" > fi > > if test -s "$last_arg" > ----------------- > > I.e. most of this was because you didn't add any support for this to the > shellscript translations. > > We still punt on that here, it should ideally preserve the shell > variables like the format specifiers, but I think that's not worth the > effort with us actively cutting down our shell code to nothing. I'm not sure how fast we'll get there, what with `git submodule` being slow to get over the finish line, and with `mergetool`/`difftool` still being scripted, and probably for a long time, too. > We still have failures e.g. due to "test_i18ngrep -F fixed file" not > being supported. Wouldn't a better/simpler approach be to just have a > light rot13 helper, and then call the actual "cmp"/"grep" with the > munged input/output? Hmm. I hoped that this work would open the door to introducing more C helpers in the test suite, thereby accelerating it on Windows. We could also potentially move the `cmp`/`grep` parts into separate helpers that give more useful output in case of a failure. (Right now, if anything fails in CI involving `grep` or `cmp`, the user investigating this pretty much _has_ to try to reproduce the issue locally because the output is so not helpful, and reproducing it might be a challenge e.g. when the failure is macOS-specific and the developer does not have access to a macOS setup). Ciao, Dscho
Junio C Hamano <gitster@pobox.com> writes: > Jeff King <peff@peff.net> writes: > ... >> So I dunno. You did find two spots where translations could be used. >> But if nobody actually saw them to care that they got translated, were >> they important? I may be a bit biased as somebody who would not use the >> translations in the first place, of course. > > I viewed the series with interest, mostly (i.e. 80%+) for its "fun > value"; I tend to agree with you that I doubt its usefulness. Actually, "I doubt its usefulness" is way too strong than what I really wanted to say. I was just skeptical, and I still somewhat am. I'd just need a bit more convincing ;-)