
[v3,0/9] Incremental po/git.pot update and new l10n workflow

Message ID 20220523012531.4505-1-worldhello.net@gmail.com

Message

Jiang Xin May 23, 2022, 1:25 a.m. UTC
From: Jiang Xin <zhiyou.jx@alibaba-inc.com>

A workflow change for translators is being proposed.

Changes since v2:

 1. Patch 1/9: reword.
 2. Patch 2/9: reword.
 3. Patch 3/9: reword, and add "FORCE" to prerequisites of "po/git.pot".
 4. Patch 6/9: remove "FORCE" from prerequisites of "po/git.pot".
 5. Patch 8/9: reword, and reuse "$(gen_pot_header)" to prepare pot
               header for "po/git-core.pot".
 6. Patch 9/9: various updates on po/README.md.
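
Item 5 refers to "$(gen_pot_header)", the macro added in patch 3/9 that
fixes up the header xgettext generates from /dev/null. A minimal sketch
of its sed post-processing follows; the sed expressions are copied from
the patch, but the function name gen_pot_header_sed is hypothetical and
the input is an assumed canned copy of xgettext's placeholder lines
rather than a real xgettext run.

```shell
#!/bin/sh
# Sketch of the sed step in "gen_pot_header" (patch 3/9).
# Assumption: a canned header stands in for "xgettext -o - /dev/null".

gen_pot_header_sed() {
	sed -e 's|charset=CHARSET|charset=UTF-8|' \
	    -e 's|\(Last-Translator: \)FULL NAME <.*>|\1make by the Makefile|' \
	    -e 's|\(Language-Team: \)LANGUAGE <.*>|\1Git Mailing List <git@vger.kernel.org>|'
}

# Placeholder lines as they appear in a freshly generated POT header.
gen_pot_header_sed <<'EOF'
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Content-Type: text/plain; charset=CHARSET\n"
EOF

# The Makefile appends this line with echo; printf is used here because
# echo's treatment of backslashes differs between shells.
printf '%s\n' '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\n"'
```

Running it prints the four fixed-up header lines on standard output; the
macro in the patch writes them to "$@" instead.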


Range-diff vs v2:

 1:  c45f34f233 !  1:  362cd0cbe1 Makefile: sort "po/git.pot" by file location
    @@ Metadata
      ## Commit message ##
         Makefile: sort "po/git.pot" by file location
     
    -    Before feeding xgettext with more C souce files which may be ignored
    -    by various compiler conditions, add new option "--sort-by-file" to
    -    xgettext program to create stable message template file "po/git.pot".
    +    We will feed xgettext with more C souce files and in different order in
    +    subsequent commit. To generate a stable "po/git.pot" regardless of the
    +    number and order of input source files, we add a new option
    +    "--sort-by-file" to xgettext program.
     
         With this update, the newly generated "po/git.pot" will has the same
    -    entries while in a different order. We won't checkin the newly generated
    -    "po/git.pot", because we will remove it from tree in a later commit.
    +    entries while in a different order.
    +
    +    With the help of a custom diff driver as shown below,
    +
    +        git config --global diff.gettext-fmt.textconv \
    +            "msgcat --no-location --sort-by-file"
    +
    +    and appending a new entry "*.po diff=gettext-fmt" to git attributes,
    +    we can see that there are no substantial changes in "po/git.pot".
    +
    +    We won't checkin the newly generated "po/git.pot", because we will
    +    remove it from tree in a later commit.
     
         Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
 2:  7b6e4d6b59 !  2:  096e700171 Makefile: generate "po/git.pot" from stable LOCALIZED_C
    @@ Commit message
              endif
     
         But it is much simpler to use variables "$(FOUND_C_SOURCES)" and
    -    "$(FOUND_C_SOURCES)" to form a stable "LOCALIZED_C".
    +    "$(FOUND_C_SOURCES)" to form a stable "LOCALIZED_C". We also add
    +    "$(SCALAR_SOURCES)" files, which are part of C_OBJ but not included in
    +    "$(FOUND_C_SOURCES)" because they are in the "contrib/" directory.
     
         With this update, the newly generated "po/git.pot" will have 30 new
         entries coming from the following C source files:
    @@ Makefile: XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \
      XGETTEXT_FLAGS_PERL = $(XGETTEXT_FLAGS) --language=Perl \
      	--keyword=__ --keyword=N__ --keyword="__n:1,2"
     -LOCALIZED_C = $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H)
    -+LOCALIZED_C = $(FOUND_C_SOURCES) $(SCALAR_SOURCES) \
    -+	      $(FOUND_H_SOURCES) $(GENERATED_H)
    ++LOCALIZED_C = $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) $(SCALAR_SOURCES) \
    ++	      $(GENERATED_H)
      LOCALIZED_SH = $(SCRIPT_SH)
      LOCALIZED_SH += git-sh-setup.sh
      LOCALIZED_PERL = $(SCRIPT_PERL)
 3:  868a631c2f !  3:  dff3751260 Makefile: have "make pot" not "reset --hard"
    @@ Commit message
         That out of the way, the main logic change here is getting rid of the
         "reset --hard":
     
    -    We'll generate intermediate .build/pot/po/%.po files from %, which is
    -    handy to see at a glance what strings (if any) in a given file are
    +    We'll generate intermediate ".build/pot/po/%.po" files from "%", which
    +    is handy to see at a glance what strings (if any) in a given file are
         marked for translation:
     
                 $ make .build/pot/po/pretty.c.po
    @@ Commit message
                 $
     
         For these C source files which contain the PRItime macros, we will
    -    create temporary munged *.c files in a tree in ".build/pot/po"
    +    create temporary munged "*.c" files in a tree in ".build/pot/po"
         corresponding to our source tree, and have "xgettext" consider those.
         The rule needs to be careful to "(cd .build/pot/po && ...)", because
         otherwise the comments in the po/git.pot file wouldn't refer to the
         correct source locations (they'd be prefixed with ".build/pot/po").
    +    These temporary munged "*.c” files will be removed immediately after
    +    the corresponding po files are generated, because some development tools
    +    cannot ignore the duplicate source files in the ".build" directory
    +    according to the ".gitignore" file, and that may cause trouble.
     
    -    This changes the output of the generated po/git.pot file in one minor
    +    The output of the generated po/git.pot file is changed in one minor
         way: Because we're using msgcat(1) instead of xgettext(1) to
         concatenate the output we'll now disambiguate where "TRANSLATORS"
         comments come from, in cases where a message is the same in N files,
    @@ Commit message
         understandable, as msgcat(1) is better at handling these edge cases
         than xgettext(1)'s previously used "--join-existing" flag.
     
    -    While we could rename the "pot" snippets without the ".po" extention
    -    to use more intuitive filenames in the comments, but that will
    -    confuse the IDE with lots of invalid C or perl source files in
    +    But filenames in the above disambiguation lines of extracted-comments
    +    have an extra ".po" extension compared to the filenames at the file
    +    locations. While we could rename the intermediate ".build/pot/po/%.po"
    +    files without the ".po" extension to use more intuitive filenames in
    +    the disambiguation lines of extracted-comments, but that will confuse
    +    developer tools with lots of invalid C or other source files in
         ".build/pot/po" directory.
     
         The addition of "--omit-header" option for xgettext makes the "pot"
    -    snippets in ".build/pot/po/*.po" smaller. For the pot header of
    -    "po/git.pot", we use xgettext to generate a "pot" header file
    -    ".build/pot/git.header" from an empty file at runtime, and use this
    +    snippets in ".build/pot/po/*.po" smaller. But as we'll see in a
    +    subsequent commit this header behavior has been hiding an
    +    encoding-related bug from us, so let's carry it forward instead of
    +    re-generating it with xgettext(1).
    +
    +    The "po/git.pot" file should have a header entry, because a proper
    +    header entry will increase the speed of creating a new po file using
    +    msginit and set a proper "POT-Creation-Date:" field in the header
    +    entry of a "po/XX.po" file. We use xgettext to generate a separate
    +    header file at ".build/pot/git.header" from "/dev/null", and use this
         header to assemble "po/git.pot".
     
    -    But as we'll see in a subsequent commit this header behavior has been
    -    hiding an encoding-related bug from us, so let's carry it forward
    -    instead of re-generating it with xgettext(1).
    -
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
         Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
    @@ Makefile: XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \
      XGETTEXT_FLAGS_PERL = $(XGETTEXT_FLAGS) --language=Perl \
      	--keyword=__ --keyword=N__ --keyword="__n:1,2"
     +MSGCAT_FLAGS = --sort-by-file
    - LOCALIZED_C = $(FOUND_C_SOURCES) $(SCALAR_SOURCES) \
    - 	      $(FOUND_H_SOURCES) $(GENERATED_H)
    + LOCALIZED_C = $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) $(SCALAR_SOURCES) \
    + 	      $(GENERATED_H)
      LOCALIZED_SH = $(SCRIPT_SH)
     @@ Makefile: LOCALIZED_SH += t/t0200/test.sh
      LOCALIZED_PERL += t/t0200/test.perl
    @@ Makefile: LOCALIZED_SH += t/t0200/test.sh
     +	$(call mkdir_p_parent_template)
     +	$(QUIET_XGETTEXT)$(XGETTEXT) --omit-header \
     +		-o$@ $(XGETTEXT_FLAGS_PERL) $<
    ++
    ++define gen_pot_header
    ++$(XGETTEXT) $(XGETTEXT_FLAGS_C) \
    ++	-o - /dev/null | \
    ++sed -e 's|charset=CHARSET|charset=UTF-8|' \
    ++    -e 's|\(Last-Translator: \)FULL NAME <.*>|\1make by the Makefile|' \
    ++    -e 's|\(Language-Team: \)LANGUAGE <.*>|\1Git Mailing List <git@vger.kernel.org>|' \
    ++    >$@ && \
    ++echo '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\\n"' >>$@
    ++endef
      
     -	$(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ $(XGETTEXT_FLAGS_C) $(LOCALIZED_C)
     -	$(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ --join-existing $(XGETTEXT_FLAGS_SH) \
    @@ Makefile: LOCALIZED_SH += t/t0200/test.sh
     -		$(LOCALIZED_PERL)
     +.build/pot/git.header: $(LOCALIZED_ALL_GEN_PO)
     +	$(call mkdir_p_parent_template)
    -+	$(QUIET_XGETTEXT)$(XGETTEXT) $(XGETTEXT_FLAGS_C) \
    -+		-o - /dev/null | \
    -+	sed -e 's|charset=CHARSET|charset=UTF-8|g' >$@ && \
    -+	echo '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\\n"' >>$@
    ++	$(QUIET_GEN)$(gen_pot_header)
      
     -	# Reverting the munged source, leaving only the updated $@
     -	git reset --hard
     -	mv $@+ $@
    -+po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_PO)
    -+	$(QUIET_GEN)$(MSGCAT) $(MSGCAT_FLAGS) $^ >$@
    ++po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_PO) FORCE
    ++	$(QUIET_GEN)$(MSGCAT) $(MSGCAT_FLAGS) $(filter-out FORCE,$^) >$@
      
      .PHONY: pot
      pot: po/git.pot
 4:  31aa6ed373 !  4:  1b7efb21ae i18n CI: stop allowing non-ASCII source messages in po/git.pot
    @@ Makefile: XGETTEXT_FLAGS = \
      XGETTEXT_FLAGS_C = $(XGETTEXT_FLAGS) --language=C \
      	--keyword=_ --keyword=N_ --keyword="Q_:1,2"
      XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \
    -@@ Makefile: po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_PO)
    +@@ Makefile: po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_PO) FORCE
      .PHONY: pot
      pot: po/git.pot
      
 5:  a9e4840571 !  5:  8ce274b31f po/git.pot: this is now a generated file
    @@ Commit message
     
         We no longer keep track of the contents of this file.
     
    +    Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
    +
      ## po/git.pot (deleted) ##
     @@
     -# SOME DESCRIPTIVE TITLE.
 6:  1f59007114 !  6:  4585be63f7 po/git.pot: don't check in result of "make pot"
    @@ Commit message
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      ## Makefile ##
    +@@ Makefile: endef
    + 	$(call mkdir_p_parent_template)
    + 	$(QUIET_GEN)$(gen_pot_header)
    + 
    +-po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_PO) FORCE
    +-	$(QUIET_GEN)$(MSGCAT) $(MSGCAT_FLAGS) $(filter-out FORCE,$^) >$@
    ++po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_PO)
    ++	$(QUIET_GEN)$(MSGCAT) $(MSGCAT_FLAGS) $^ >$@
    + 
    + .PHONY: pot
    + pot: po/git.pot
     @@ Makefile: dist-doc: git$X
      
      distclean: clean
 7:  cb31a4001e !  7:  b8f43b520c Makefile: add "po-update" rule to update po/XX.po
    @@ Makefile: XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \
      	--keyword=__ --keyword=N__ --keyword="__n:1,2"
      MSGCAT_FLAGS = --sort-by-file
     +MSGMERGE_FLAGS = --add-location --backup=off --update
    - LOCALIZED_C = $(FOUND_C_SOURCES) $(SCALAR_SOURCES) \
    - 	      $(FOUND_H_SOURCES) $(GENERATED_H)
    + LOCALIZED_C = $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) $(SCALAR_SOURCES) \
    + 	      $(GENERATED_H)
      LOCALIZED_SH = $(SCRIPT_SH)
     @@ Makefile: po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_PO)
      .PHONY: pot
 8:  f4c58f6184 !  8:  019633c7a4 Makefile: add "po-init" rule to initialize po/XX.po
    @@ Commit message
         To help new l10n contributors to initialized their "po/XX.pot" from
         "po/git-core.pot", we also add new rules "po-init":
     
    -        make po-init POT_FILE=po/XX.po
    +        make po-init PO_FILE=po/XX.po
     
         [^1]: https://github.com/git-l10n/git-po-helper/
     
    @@ Makefile: po-update: po/git.pot
     +
     +.build/pot/git-core.header: $(LOCALIZED_C_CORE_GEN_PO)
     +	$(call mkdir_p_parent_template)
    -+	$(QUIET_XGETTEXT)$(XGETTEXT) $(XGETTEXT_FLAGS_C) \
    -+		-o - /dev/null | \
    -+	sed -e 's|charset=CHARSET|charset=UTF-8|g' >$@ && \
    -+	echo '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\\n"' >>$@
    ++	$(QUIET_GEN)$(gen_pot_header)
     +
     +po/git-core.pot: .build/pot/git-core.header $(LOCALIZED_C_CORE_GEN_PO)
     +	$(QUIET_GEN)$(MSGCAT) $(MSGCAT_FLAGS) $^ >$@
 9:  809128ce21 !  9:  334117bf48 l10n: Document the new l10n workflow
    @@ Commit message
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      ## po/README.md ##
    -@@ po/README.md: for a new language translation. Because there are more than 5000 messages
    - in the template message file "po/git.pot" that need to be translated,
    - this is not a piece of cake for the contributor for a new language.
    +@@ po/README.md: coordinates our localization effort in the l10 coordinator repository:
    + 
    +     https://github.com/git-l10n/git-po/
    + 
    +-The two character language translation codes are defined by ISO\_639-1, as
    +-stated in the gettext(1) full manual, appendix A.1, Usual Language Codes.
    ++We will use XX as an alias to refer to the language translation code in
    ++the following paragraphs, for example we use "po/XX.po" to refer to the
    ++translation file for a specific language. But this doesn't mean that
    ++the language code has only two letters. The language code can be in one
    ++of two forms: "ll" or "ll\_CC". Here "ll" is the ISO 639 two-letter
    ++language code and "CC" is the ISO 3166 two-letter code for country names
    ++and subdivisions. For example: "de" for German language code, "zh\_CN"
    ++for Simplified Chinese language code.
    + 
    + 
    + ## Contributing to an existing translation
    +@@ po/README.md: language, so that the l10n coordinator only needs to interact with one
    + person per language.
    + 
    + 
    +-## Core translation
    ++## Translation Process Flow
    + 
    +-The core translation is the smallest set of work that must be completed
    +-for a new language translation. Because there are more than 5000 messages
    +-in the template message file "po/git.pot" that need to be translated,
    +-this is not a piece of cake for the contributor for a new language.
    ++The overall data-flow looks like this:
      
     -The core template message file which contains a small set of messages
     -will be generated in "po-core/core.pot" automatically by running a helper
     -program named "git-po-helper" (described later).
    -+The "core" set of messages can be generated at "po/git-core.pot" by
    -+running:
    ++    +-------------------+             +------------------+
    ++    | Git source code   | ----(2)---> | L10n coordinator |
    ++    | repository        | <---(5)---- | repository       |
    ++    +-------------------+             +------------------+
    ++                    |                     |    ^
    ++                   (1)                   (3)  (4)
    ++                    V                     v    |
    ++               +----------------------------------+
    ++               |        Language Team XX          |
    ++               +----------------------------------+
      
    - ```shell
    +-```shell
     -git-po-helper init --core XX.po
    -+make core-pot
    - ```
    +-```
    ++- Translatable strings are marked in the source file.
    ++- Language teams can start translation iterations at any time, even
    ++  before the l10n window opens:
      
     -After translating the generated "po-core/XX.po", you can merge it to
     -"po/XX.po" using the following commands:
    --
    ++  + Pull from the master branch of the source (1)
    ++  + Update the message file by running "make po-update PO\_FILE=po/XX.po"
    ++  + Translate the message file "po/XX.po"
    + 
     -```shell
     -msgcat po-core/XX.po po/XX.po -s -o /tmp/XX.po
     -mv /tmp/XX.po po/XX.po
     -git-po-helper update XX.po
     -```
    --
    ++- The L10n coordinator pulls from source and announces the l10n window
    ++  open (2)
    ++- Language team pulls from the l10n coordinator, starts another
    ++  translation iteration against the l10n coordinator's tree (3)
    + 
     -Edit "po/XX.po" by hand to fix "fuzzy" messages, which may have misplaced
     -translated messages and duplicate messages.
    -+And then proceeding with the rest of these instructions on the new
    -+generated "po/git-core.pot" file.
    ++  + Run "git pull --rebase" from the l10n coordinator
    ++  + Update the message file by running "make po-update PO\_FILE=po/XX.po"
    ++  + Translate the message file "po/XX.po"
    ++  + Squash trivial l10n git commits using "git rebase -i"
      
    ++- Language team sends pull request to the l10n coordinator (4)
    ++- L10n coordinator checks and merges
    ++- L10n coordinator asks the result to be pulled (5).
      
    - ## Translation Process Flow
    +-## Translation Process Flow
      
    - The overall data-flow looks like this:
    +-The overall data-flow looks like this:
    ++## Dynamically generated POT files
      
     -    +-------------------+            +------------------+
     -    | Git source code   | ---(1)---> | L10n coordinator |
    @@ po/README.md: for a new language translation. Because there are more than 5000 m
     -                                     +------------------+
     -                                     | Language Team XX |
     -                                     +------------------+
    -+    +-------------------+             +------------------+
    -+    | Git source code   | ----(2)---> | L10n coordinator |
    -+    | repository        | <---(5)---- | repository       |
    -+    +-------------------+             +------------------+
    -+                    |                     |    ^
    -+                   (1)                   (3)  (4)
    -+                    V                     v    |
    -+               +----------------------------------+
    -+               |        Language Team XX          |
    -+               +----------------------------------+
    ++POT files are templates for l10n contributors to create or update their
    ++translation files. We used to have the "po/git.pot" file which was
    ++generated by the l10n coordinator, but this file had been removed from
    ++the tree.
      
    - - Translatable strings are marked in the source file.
    +-- Translatable strings are marked in the source file.
     -- L10n coordinator pulls from the source (1)
     -- L10n coordinator updates the message template "po/git.pot"
     -- Language team pulls from L10n coordinator (2)
     -- Language team updates the message file "po/XX.po"
     -- L10n coordinator pulls from Language team (3)
     -- L10n coordinator asks the result to be pulled (4).
    -+- Language teams can start translation iterations at any time, even
    -+  before the l10n window opens:
    -+
    -+  + Pull from the source (1)
    -+  + Update the message file by running "make po-update PO\_FILE=po/XX.po"
    -+  + Translate the message file "po/XX.po"
    ++The two POT files "po/git.pot" and "po/git-core.pot" can be created
    ++dynamically when necessary.
      
    -+- The L10n coordinator pulls from source and announces the l10n window
    -+  open (2)
    -+- Language team pulls from the l10n coordinator, starts another
    -+  translation iteration against the l10n coordinator's tree (3)
    ++L10n contributors use "po/git.pot" to prepare translations for their
    ++languages, but they are not expected to modify it. The "po/git.pot" file
    ++can be generated manually with the following command:
      
     -## Maintaining the "po/git.pot" file
    -+  + Run "git pull --rebase" from the l10n coordinator
    -+  + Update the message file by running "make po-update PO\_FILE=po/XX.po"
    -+  + Translate the message file "po/XX.po"
    -+  + Squash trivial l10n git commits using "git rebase -i"
    ++```shell
    ++make po/git.pot
    ++```
      
     -(This is done by the l10n coordinator).
    -+- Language team sends pull request to the l10n coordinator (4)
    -+- L10n coordinator checks and merges
    -+- L10n coordinator asks the result to be pulled (5).
    ++The "po/git-core.pot" file is the template for core translations. A core
    ++translation is the minimum set of work necessary to complete a
    ++translation of a new language. Since there are more than 5000 messages
    ++in the full set of template message file "po/git.pot" that need to be
    ++translated, this is not a piece of cake for new language contributors.
      
     -The "po/git.pot" file contains a message catalog extracted from Git's
     -sources. The l10n coordinator maintains it by adding new translations with
    @@ po/README.md: for a new language translation. Because there are more than 5000 m
     -expected to pull from the main git repository at strategic point in
     -history (e.g. when a major release and release candidates are tagged),
     -and then run "make pot" at the top-level directory.
    ++The "core" template file "po/git-core.pot" can be generated manually
    ++by running:
      
     -Language contributors use this file to prepare translations for their
     -language, but they are not expected to modify it.
    -+## Creating the "po/git.pot" file
    -+
    -+The "po/git.pot" file, once generated by the the l10n coordinator had
    -+been removed from the tree. L10n contributors can generated it at
    -+runtime using command:
    -+
     +```shell
    -+make pot
    ++make po/git-core.pot
     +```
    -+
    -+Then language contributors use this file to prepare translations for
    -+their language, but they are not expected to modify it.
      
      
      ## Initializing a "XX.po" file
    @@ po/README.md: If your language XX does not have translated message file "po/XX.p
      
      ```shell
     -msginit --locale=XX
    --```
    --
    ++make po-init PO_FILE=po/XX.po
    + ```
    + 
     -in the "po/" directory, where XX is the locale, e.g. "de", "is", "pt\_BR",
     -"zh\_CN", etc.
     -
    @@ -1,6 +1,6 @@
     -+# Copyright (C) 2010 Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     -+# This file is distributed under the same license as the Git package.
     - # Ævar Arnfjörð Bjarmason <avarab@gmail.com>, 2010.
    -+make po-init PO_FILE=po/XX.po
    - ```
    - 
    +-```
    +-
     -And change references to PACKAGE VERSION in the PO Header Entry to
     -just "Git":
    -+Where XX is the locale, e.g. "de", "is", "pt\_BR", "zh\_CN", etc.
    ++where XX is the locale, e.g. "de", "is", "pt\_BR", "zh\_CN", etc.
      
     -```shell
     -perl -pi -e 's/(?<="Project-Id-Version: )PACKAGE VERSION/Git/' XX.po
    @@ po/README.md: and ask the l10n coordinator to pull from you.
      
     -Once you are done testing the translation (see below), commit the result
     -and ask the l10n coordinator to pull from you.
    -+- Call "make pot" at runtime to generate new "po/git.pot" file
    ++- Call "make po/git.pot" to generate new "po/git.pot" file
     +- Call "msgmerge --add-location --backup=off -U po/XX.po po/git.pot"
     +  to update your "po/XX.po"
     +- The "--add-location" option for msgmerge will add location lines,
    -+  and these location lines will help translation tools to location
    ++  and these location lines will help translation tools to locate
     +  translation context easily.
     +
     +Once you are done testing the translation (see below), it's better
    @@ po/README.md: and ask the l10n coordinator to pull from you.
     +To save a location-less "po/XX.po" automatically in repository, you
     +can:
     +
    -+- Define new attribute for "po/XX.po" by adding new line in
    -+  ".git/info/attributes":
    ++First define a new attribute for "po/XX.po" by appending the following
    ++line in ".git/info/attributes":
     +
    -+        /po/XX.po filter=gettext-no-location
    ++```
    ++/po/XX.po filter=gettext-no-location
    ++```
     +
    -+- Define driver for "gettext-no-location" filter:
    ++Then define the driver for the "gettext-no-location" clean filter to
    ++strip out both filenames and locations from the contents as follows:
     +
    -+        $ git config --global filter.gettext-no-location.clean \
    -+              "msgcat --no-location -"
    ++```shell
    ++git config --global filter.gettext-no-location.clean \
    ++           "msgcat --no-location -"
    ++```
    ++
    ++For users who have gettext version 0.20 or higher, it is also possible
    ++to define a clean filter to preserve filenames but not locations:
    ++
    ++```shell
    ++git config --global filter.gettext-no-location.clean \
    ++           "msgcat --add-location=file -"
    ++```
     +
     +You're now ready to ask the l10n coordinator to pull from you.
      

---

Jiang Xin (4):
  Makefile: sort "po/git.pot" by file location
  Makefile: generate "po/git.pot" from stable LOCALIZED_C
  po/git.pot: this is now a generated file
  Makefile: add "po-update" rule to update po/XX.po

Ævar Arnfjörð Bjarmason (5):
  Makefile: have "make pot" not "reset --hard"
  i18n CI: stop allowing non-ASCII source messages in po/git.pot
  po/git.pot: don't check in result of "make pot"
  Makefile: add "po-init" rule to initialize po/XX.po
  l10n: Document the new l10n workflow

 .gitignore                  |     1 +
 Makefile                    |   148 +-
 builtin/submodule--helper.c |     2 +-
 ci/run-static-analysis.sh   |     2 +
 po/.gitignore               |     2 +
 po/README.md                |   230 +-
 po/git.pot                  | 25151 ----------------------------------
 shared.mak                  |     2 +
 8 files changed, 250 insertions(+), 25288 deletions(-)
 delete mode 100644 po/git.pot

Comments

Ævar Arnfjörð Bjarmason May 23, 2022, 7:15 a.m. UTC | #1
On Mon, May 23 2022, Jiang Xin wrote:

> From: Jiang Xin <zhiyou.jx@alibaba-inc.com>
>
> A workflow change for translators is being proposed.
>
> Changes since v2:
>
>  1. Patch 1/9: reword.
>  2. Patch 2/9: reword.
>  3. Patch 3/9: reword, and add "FORCE" to prerequisites of "po/git.pot".
>  4. Patch 6/9: remove "FORCE" from prerequisites of "po/git.pot".
>  5. Patch 8/9: reword, and reuse "$(gen_pot_header)" to prepare pot
>                header for "po/git-core.pot".
>  6. Patch 9/9: various updates on po/README.md.

From skimming this, the *.c.po vs. *.c extension is still left in
comments. I'm not saying you need to go for my suggestions, but it would
be very useful in CLs to note things that were suggested but not
changed, such as that.

Right now I haven't paged that v2 discussion back into my brain, so I
don't know if that was the only thing; it's the only thing I remember
right now...

But let's read on:

> Range-diff vs v2:
>
>  1:  c45f34f233 !  1:  362cd0cbe1 Makefile: sort "po/git.pot" by file location
>     @@ Metadata
>       ## Commit message ##
>          Makefile: sort "po/git.pot" by file location
>      
>     -    Before feeding xgettext with more C souce files which may be ignored
>     -    by various compiler conditions, add new option "--sort-by-file" to
>     -    xgettext program to create stable message template file "po/git.pot".
>     +    We will feed xgettext with more C souce files and in different order in
>     +    subsequent commit. To generate a stable "po/git.pot" regardless of the
>     +    number and order of input source files, we add a new option
>     +    "--sort-by-file" to xgettext program.
>      
>          With this update, the newly generated "po/git.pot" will has the same
>     -    entries while in a different order. We won't checkin the newly generated
>     -    "po/git.pot", because we will remove it from tree in a later commit.
>     +    entries while in a different order.
>     +
>     +    With the help of a custom diff driver as shown below,
>     +
>     +        git config --global diff.gettext-fmt.textconv \
>     +            "msgcat --no-location --sort-by-file"
>     +
>     +    and appending a new entry "*.po diff=gettext-fmt" to git attributes,
>     +    we can see that there are no substantial changes in "po/git.pot".
>     +
>     +    We won't checkin the newly generated "po/git.pot", because we will
>     +    remove it from tree in a later commit.


Does this actually work? This seems to suggest adding a driver for *.po,
but using it against the *.pot file. Isn't that a typo? (I haven't tested
it.)
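
The question can be checked with "git check-attr". A minimal sketch,
assuming git is on PATH and using a throwaway repository created with
"mktemp -d" (the paths po/XX.po and po/git.pot need not exist for
check-attr). A gitattributes pattern without a slash is matched against
the whole file name, so "*.po" does not cover "git.pot".

```shell
#!/bin/sh
# Sketch: which paths does the attribute line "*.po diff=gettext-fmt" cover?
# Runs in a throwaway repository so no real configuration is touched.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Only the pattern quoted in the commit message of patch 1/9.
printf '%s\n' '*.po diff=gettext-fmt' >.gitattributes
git check-attr diff -- po/XX.po po/git.pot
# po/XX.po: diff: gettext-fmt
# po/git.pot: diff: unspecified

# A second pattern is needed for the template file itself.
printf '%s\n' '*.pot diff=gettext-fmt' >>.gitattributes
git check-attr diff -- po/XX.po po/git.pot
# po/XX.po: diff: gettext-fmt
# po/git.pot: diff: gettext-fmt
```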

>          But it is much simpler to use variables "$(FOUND_C_SOURCES)" and
>     -    "$(FOUND_C_SOURCES)" to form a stable "LOCALIZED_C".
>     +    "$(FOUND_C_SOURCES)" to form a stable "LOCALIZED_C". We also add
>     +    "$(SCALAR_SOURCES)" files, which are part of C_OBJ but not included in
>     +    "$(FOUND_C_SOURCES)" because they are in the "contrib/" directory.

Thanks, good to note that.

[snipped the rest, will re-read individual commits]

Ævar Arnfjörð Bjarmason May 23, 2022, 8:12 a.m. UTC | #2
On Mon, May 23 2022, Ævar Arnfjörð Bjarmason wrote:

> On Mon, May 23 2022, Jiang Xin wrote:
>
>> From: Jiang Xin <zhiyou.jx@alibaba-inc.com>
>>
>> A workflow change for translators is being proposed.
>>
>> Changes since v2:
>>
>>  1. Patch 1/9: reword.
>>  2. Patch 2/9: reword.
>>  3. Patch 3/9: reword, and add "FORCE" to prerequisites of "po/git.pot".
>>  4. Patch 6/9: remove "FORCE" from prerequisites of "po/git.pot".
>>  5. Patch 8/9: reword, and reuse "$(gen_pot_header)" to prepare pot
>>                header for "po/git-core.pot".
>>  6. Patch 9/9: various updates on po/README.md.
>
> From skimming this, the *.c.po vs. *.c extension is still left in
> comments. I'm not saying you need to go for my suggestions, but it would
> be very useful in CLs to note things that were suggested but not
> changed, such as that.
>
> Right now I haven't paged that v2 discussion back into my brain, so I
> don't know if that was the only thing; it's the only thing I remember
> right now...

The fix-up below implements what I suggested on v2, so the comments
in the generated file are now correct and don't refer to our intermediate
files:
	
	$ grep '#-#' po/git.pot
	#. #-#-#-#-#  git-add--interactive.perl  #-#-#-#-#
	#. #-#-#-#-#  add-patch.c  #-#-#-#-#
	#. #-#-#-#-#  git-add--interactive.perl  #-#-#-#-#
	#. #-#-#-#-#  branch.c  #-#-#-#-#
	#. #-#-#-#-#  object-name.c  #-#-#-#-#
	#. #-#-#-#-#  grep.c  #-#-#-#-#

I gather that you preferred the whole "grep -q PRItime" approach
because you wanted to mitigate the effects of your IDE discovering these
files.

With the change below, you can define AGGRESSIVE_INTERMEDIATE, and when
you run "make pot" the generated *.c files will only exist for as long as
they're needed to generate the next step.

But a subsequent "make pot" will then be slower, as we'll of course
need to generate them again.
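
The munging step in the fix-up below comes down to one sed substitution,
copied from its ".build/pot/po-munged/%" rule. A minimal sketch, with a
hypothetical helper munge_pritime and an illustrative input line modeled
on how Git uses PRItime in translatable strings (not copied from a
particular file):

```shell
#!/bin/sh
# Sketch of the munging step: gettext tools don't know Git's custom
# PRItime macro, so each C source is copied with PRItime replaced by
# PRIuMAX before xgettext reads it.

munge_pritime() {
	sed -e 's|PRItime|PRIuMAX|g'
}

printf '%s\n' 'Q_("%"PRItime" second ago", "%"PRItime" seconds ago", diff)' |
	munge_pritime
# Q_("%"PRIuMAX" second ago", "%"PRIuMAX" seconds ago", diff)
```

The "g" flag matters because a single line can carry several PRItime
uses, as in the singular/plural pair above.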

I think it's better to go in this direction, and rename that
AGGRESSIVE_INTERMEDIATE to something like
MAKE_AVOID_REAL_EXTENSIONS_IN_GITIGNORED_FILES or whatever.

I.e., our correctness shouldn't suffer because we're trying to work
around some issue in specific (and optional) developer tooling.

There's also a fix in there for the "header" dependency, but as noted in
another reply, it should probably be dropped altogether...
	
diff --git a/Makefile b/Makefile
index d3eae150de9..0b96b55b63f 100644
--- a/Makefile
+++ b/Makefile
@@ -2736,6 +2736,7 @@ endif
 ## "po/git.pot" file.
 LOCALIZED_ALL_GEN_PO =
 
+LOCALIZED_C_GEN_C = $(LOCALIZED_C:%=.build/pot/po-munged/%)
 LOCALIZED_C_GEN_PO = $(LOCALIZED_C:%=.build/pot/po/%.po)
 LOCALIZED_ALL_GEN_PO += $(LOCALIZED_C_GEN_PO)
 
@@ -2745,26 +2746,19 @@ LOCALIZED_ALL_GEN_PO += $(LOCALIZED_SH_GEN_PO)
 LOCALIZED_PERL_GEN_PO = $(LOCALIZED_PERL:%=.build/pot/po/%.po)
 LOCALIZED_ALL_GEN_PO += $(LOCALIZED_PERL_GEN_PO)
 
-## Gettext tools cannot work with our own custom PRItime type, so
-## we replace PRItime with PRIuMAX.  We need to update this to
-## PRIdMAX if we switch to a signed type later.
-$(LOCALIZED_C_GEN_PO): .build/pot/po/%.po: %
+ifdef AGGRESSIVE_INTERMEDIATE
+.INTERMEDIATE: $(LOCALIZED_C_GEN_C)
+endif
+$(LOCALIZED_C_GEN_C): .build/pot/po-munged/%: %
 	$(call mkdir_p_parent_template)
-	$(QUIET_XGETTEXT) \
-	    if grep -q PRItime $<; then \
-		(\
-			sed -e 's|PRItime|PRIuMAX|g' <$< \
-				>.build/pot/po/$< && \
-			cd .build/pot/po && \
-			$(XGETTEXT) --omit-header \
-				-o $(@:.build/pot/po/%=%) \
-				$(XGETTEXT_FLAGS_C) $< && \
-			rm $<; \
-		); \
-	    else \
-		$(XGETTEXT) --omit-header \
-			-o $@ $(XGETTEXT_FLAGS_C) $<; \
-	    fi
+	$(QUIET_GEN)sed -e 's|PRItime|PRIuMAX|g' <$< >$@
+
+$(LOCALIZED_C_GEN_PO): .build/pot/po/%.po: .build/pot/po-munged/%
+	$(call mkdir_p_parent_template)
+	$(QUIET_XGETTEXT)( \
+		cd $(<D) && \
+		$(XGETTEXT) $(XGETTEXT_FLAGS_C) --omit-header -o - $(<F) \
+	) >$@
 
 $(LOCALIZED_SH_GEN_PO): .build/pot/po/%.po: %
 	$(call mkdir_p_parent_template)
@@ -2786,11 +2780,24 @@ sed -e 's|charset=CHARSET|charset=UTF-8|' \
 echo '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\\n"' >>$@
 endef
 
-.build/pot/git.header: $(LOCALIZED_ALL_GEN_PO)
+.build/pot/git.header:
 	$(call mkdir_p_parent_template)
 	$(QUIET_GEN)$(gen_pot_header)
 
-po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_PO)
+# We go through this dance of having a prepared
+# e.g. .build/pot/po/grep.c.po and copying it to
+# .build/pot/to-cat/grep.c only because some IDEs (e.g. VSCode) pick
+# up on the "real" extension for the purposes of auto-completion, even
+# if the .build directory is in .gitignore.
+LOCALIZED_ALL_GEN_TO_CAT = $(LOCALIZED_ALL_GEN_PO:.build/pot/po/%.po=.build/pot/to-cat/%)
+ifdef AGGRESSIVE_INTERMEDIATE
+.INTERMEDIATE: $(LOCALIZED_ALL_GEN_TO_CAT)
+endif
+$(LOCALIZED_ALL_GEN_TO_CAT): .build/pot/to-cat/%: .build/pot/po/%.po
+	$(call mkdir_p_parent_template)
+	$(QUIET_GEN)cat $< >$@
+
+po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_TO_CAT)
 	$(QUIET_GEN)$(MSGCAT) $(MSGCAT_FLAGS) $^ >$@
 
 .PHONY: pot
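A minimal POSIX shell sketch of what the new ".build/pot/po-munged/%: %" rule in this fix-up does for a single file, assuming it is run from the top of the source tree. The helper name "munge_pritime" is hypothetical and not part of git.git; in the Makefile the same "sed" call is the whole recipe, and the next rule then runs xgettext from inside the munged directory.

```shell
#!/bin/sh
# Hypothetical helper mirroring the ".build/pot/po-munged/%: %" rule:
# copy a C source file into a mirror tree, rewriting PRItime (git's own
# format macro, which the gettext tools do not understand) to PRIuMAX.
munge_pritime () {
	src=$1
	dest=.build/pot/po-munged/$src
	mkdir -p "$(dirname "$dest")" &&
	sed -e 's|PRItime|PRIuMAX|g' <"$src" >"$dest"
}

# Example: "munge_pritime date.c" creates .build/pot/po-munged/date.c;
# the original date.c is left untouched.
```

Unlike the "grep -q PRItime" variant, this copies every C file unconditionally, which is what makes the rule simple but also what produces the second, identical-looking set of *.c files under .build/.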
Jiang Xin May 23, 2022, 8:26 a.m. UTC | #3
On Mon, May 23, 2022 at 3:25 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, May 23 2022, Jiang Xin wrote:
>
> > From: Jiang Xin <zhiyou.jx@alibaba-inc.com>
> >
> > A workflow change for translators are being proposed.
> >
> > Changes since v2:
> >
> >  1. Patch 1/9: reword.
> >  2. Patch 2/9: reword.
> >  3. Patch 3/9: reword, and add "FORCE" to prerequisites of "po/git.pot".
> >  4. Patch 6/9: remove "FORCE" from prerequisites of "po/git.pot".
> >  5. Patch 8/9: reword, and reuse "$(gen_pot_header)" to prepare pot
> >                header for "po/git-core.pot".
> >  6. Patch 9/9: various updates on po/README.md.
>
> From skimming this, the *.c.po vs. *.c extension is still left in the
> comments. I'm not saying you need to go for my suggestions, but it would
> be very useful in CLs to note things that were suggested but not
> changed, such as that.

I've tried to improve some commit logs to make my point that we should
name the po files in ".build/po/" with the ".po" extension instead of the
".c" extension. We can choose plan A: move forward with this patch series
and start using the new workflow in 2.37. If you want, we can try plan B
during the next release cycle.

> Right now I haven't paged that v2 discussion into my brain again, so I
> don't know if that was the only thing; it's the only thing I remember
> right now...
>
> But let's read on:
>
> > Range-diff vs v2:
> >
> >  1:  c45f34f233 !  1:  362cd0cbe1 Makefile: sort "po/git.pot" by file location
> >     @@ Metadata
> >       ## Commit message ##
> >          Makefile: sort "po/git.pot" by file location
> >
> >     -    Before feeding xgettext with more C souce files which may be ignored
> >     -    by various compiler conditions, add new option "--sort-by-file" to
> >     -    xgettext program to create stable message template file "po/git.pot".
> >     +    We will feed xgettext with more C souce files and in different order in
> >     +    subsequent commit. To generate a stable "po/git.pot" regardless of the
> >     +    number and order of input source files, we add a new option
> >     +    "--sort-by-file" to xgettext program.
> >
> >          With this update, the newly generated "po/git.pot" will has the same
> >     -    entries while in a different order. We won't checkin the newly generated
> >     -    "po/git.pot", because we will remove it from tree in a later commit.
> >     +    entries while in a different order.
> >     +
> >     +    With the help of a custom diff driver as shown below,
> >     +
> >     +        git config --global diff.gettext-fmt.textconv \
> >     +            "msgcat --no-location --sort-by-file"
> >     +
> >     +    and appending a new entry "*.po diff=gettext-fmt" to git attributes,
> >     +    we can see that there are no substantial changes in "po/git.pot".
> >     +
> >     +    We won't checkin the newly generated "po/git.pot", because we will
> >     +    remove it from tree in a later commit.
>
>
> Does this actually work? This seems to suggest adding a driver for *.po,
> but using it against the *.pot file. Isn't that a typo? (I haven't tested
> it.)

Thanks, that is indeed a typo: s/*.po/*.pot/
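A small POSIX shell sketch of why the pattern matters: a gitattributes pattern without a slash is matched against the basename of each path, so "*.po" never selects "po/git.pot". The helper below approximates that matching with a shell "case" glob; this is an assumption that only holds for simple patterns like these, and the function name is hypothetical.

```shell
#!/bin/sh
# Corrected setup for the custom diff driver (note "*.pot", not "*.po"):
#   git config --global diff.gettext-fmt.textconv \
#       "msgcat --no-location --sort-by-file"
#   echo '*.pot diff=gettext-fmt' >>.gitattributes

# Hypothetical helper: approximate a slash-less gitattributes pattern by
# glob-matching it against the basename of the given path.
attr_pattern_matches () {
	pattern=$1
	base=${2##*/}
	case $base in
	$pattern) return 0 ;;
	*) return 1 ;;
	esac
}

# attr_pattern_matches '*.po' po/git.pot    -> no match (status 1)
# attr_pattern_matches '*.pot' po/git.pot   -> match (status 0)
```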

>
> >          But it is much simpler to use variables "$(FOUND_C_SOURCES)" and
> >     -    "$(FOUND_C_SOURCES)" to form a stable "LOCALIZED_C".
> >     +    "$(FOUND_C_SOURCES)" to form a stable "LOCALIZED_C". We also add
> >     +    "$(SCALAR_SOURCES)" files, which are part of C_OBJ but not included in
> >     +    "$(FOUND_C_SOURCES)" because they are in the "contrib/" directory.
>
> Thanks, good to note that.
>
> [snipped the rest, will re-read individual commits]
Jiang Xin May 23, 2022, 1:42 p.m. UTC | #4
On Mon, May 23, 2022 at 4:19 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, May 23 2022, Ævar Arnfjörð Bjarmason wrote:
>
> > On Mon, May 23 2022, Jiang Xin wrote:
> >
> >> From: Jiang Xin <zhiyou.jx@alibaba-inc.com>
> >>
> >> A workflow change for translators are being proposed.
> >>
> >> Changes since v2:
> >>
> >>  1. Patch 1/9: reword.
> >>  2. Patch 2/9: reword.
> >>  3. Patch 3/9: reword, and add "FORCE" to prerequisites of "po/git.pot".
> >>  4. Patch 6/9: remove "FORCE" from prerequisites of "po/git.pot".
> >>  5. Patch 8/9: reword, and reuse "$(gen_pot_header)" to prepare pot
> >>                header for "po/git-core.pot".
> >>  6. Patch 9/9: various updates on po/README.md.
> >
> > From skimming this, the *.c.po vs. *.c extension is still left in the
> > comments. I'm not saying you need to go for my suggestions, but it would
> > be very useful in CLs to note things that were suggested but not
> > changed, such as that.
> >
> > Right now I haven't paged that v2 discussion into my brain again, so I
> > don't know if that was the only thing; it's the only thing I remember
> > right now...
>
> This fix-up below implements what I suggested on v2, so now the comments
> in the generated file are correct, and don't refer to our intermediate
> files:
>
>         $ grep '#-#' po/git.pot
>         #. #-#-#-#-#  git-add--interactive.perl  #-#-#-#-#
>         #. #-#-#-#-#  add-patch.c  #-#-#-#-#
>         #. #-#-#-#-#  git-add--interactive.perl  #-#-#-#-#
>         #. #-#-#-#-#  branch.c  #-#-#-#-#
>         #. #-#-#-#-#  object-name.c  #-#-#-#-#
>         #. #-#-#-#-#  grep.c  #-#-#-#-#
>
> I gathered that the reason you preferred the whole "grep -q PRItime" was
> because you wanted to mitigate the effects of your IDE discovering these
> files.
>
> With the below you can define AGGRESSIVE_INTERMEDIATE and when you "make
> pot" the generated *.c files will only exist for as long as they're
> needed to generate the next step.
>
> But a subsequent "make pot" will be slower, as we'll of course
> need to generate them again.
>
> I think it's better to go in this direction, and rename that
> AGGRESSIVE_INTERMEDIATE to something like
> MAKE_AVOID_REAL_EXTENSIONS_IN_GITIGNORED_FILES or whatever.
>
> I.e. our correctness shouldn't suffer because we're trying to work
> around some issue in a specific (and optional) developer tooling.
>
> There's also the fix there for the "header" dependency, but as noted in
> another reply it should probably be dropped altogether...
>
> diff --git a/Makefile b/Makefile
> index d3eae150de9..0b96b55b63f 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2736,6 +2736,7 @@ endif
>  ## "po/git.pot" file.
>  LOCALIZED_ALL_GEN_PO =
>
> +LOCALIZED_C_GEN_C = $(LOCALIZED_C:%=.build/pot/po-munged/%)

Intermediate C source files copied from their original locations, with
any occurrence of PRItime replaced by PRIuMAX.

>  LOCALIZED_C_GEN_PO = $(LOCALIZED_C:%=.build/pot/po/%.po)
>  LOCALIZED_ALL_GEN_PO += $(LOCALIZED_C_GEN_PO)
>
> @@ -2745,26 +2746,19 @@ LOCALIZED_ALL_GEN_PO += $(LOCALIZED_SH_GEN_PO)
>  LOCALIZED_PERL_GEN_PO = $(LOCALIZED_PERL:%=.build/pot/po/%.po)
>  LOCALIZED_ALL_GEN_PO += $(LOCALIZED_PERL_GEN_PO)
>
> -## Gettext tools cannot work with our own custom PRItime type, so
> -## we replace PRItime with PRIuMAX.  We need to update this to
> -## PRIdMAX if we switch to a signed type later.
> -$(LOCALIZED_C_GEN_PO): .build/pot/po/%.po: %
> +ifdef AGGRESSIVE_INTERMEDIATE
> +.INTERMEDIATE: $(LOCALIZED_C_GEN_C)

Intermediate files "$(LOCALIZED_C_GEN_C)" will be removed automatically.

> +endif
> +$(LOCALIZED_C_GEN_C): .build/pot/po-munged/%: %
>         $(call mkdir_p_parent_template)
> -       $(QUIET_XGETTEXT) \
> -           if grep -q PRItime $<; then \
> -               (\
> -                       sed -e 's|PRItime|PRIuMAX|g' <$< \
> -                               >.build/pot/po/$< && \
> -                       cd .build/pot/po && \
> -                       $(XGETTEXT) --omit-header \
> -                               -o $(@:.build/pot/po/%=%) \
> -                               $(XGETTEXT_FLAGS_C) $< && \
> -                       rm $<; \
> -               ); \
> -           else \
> -               $(XGETTEXT) --omit-header \
> -                       -o $@ $(XGETTEXT_FLAGS_C) $<; \
> -           fi
> +       $(QUIET_GEN)sed -e 's|PRItime|PRIuMAX|g' <$< >$@

Copy each source file in $(LOCALIZED_C) to the corresponding intermediate
file ($(LOCALIZED_C_GEN_C)) in ".build/pot/po-munged/", replacing any
PRItime with PRIuMAX.

> +
> +$(LOCALIZED_C_GEN_PO): .build/pot/po/%.po: .build/pot/po-munged/%
> +       $(call mkdir_p_parent_template)
> +       $(QUIET_XGETTEXT)( \
> +               cd $(<D) && \
> +               $(XGETTEXT) $(XGETTEXT_FLAGS_C) --omit-header -o - $(<F) \
> +       ) >$@

For each intermediate C source file in ".build/pot/po-munged/", call
xgettext to create the corresponding po file in the ".build/pot/po/"
directory.

>  $(LOCALIZED_SH_GEN_PO): .build/pot/po/%.po: %
>         $(call mkdir_p_parent_template)
> @@ -2786,11 +2780,24 @@ sed -e 's|charset=CHARSET|charset=UTF-8|' \
>  echo '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\\n"' >>$@
>  endef
>
> -.build/pot/git.header: $(LOCALIZED_ALL_GEN_PO)
> +.build/pot/git.header:

No. We should rebuild the pot header if any po file needs to be updated,
because we want to refresh the timestamp in the "POT-Creation-Date:"
field of the pot header.

>         $(call mkdir_p_parent_template)
>         $(QUIET_GEN)$(gen_pot_header)
>
> -po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_PO)
> +# We go through this dance of having a prepared
> +# e.g. .build/pot/po/grep.c.po and copying it to
> +# .build/pot/to-cat/grep.c only because some IDEs (e.g. VSCode) pick
> +# up on the "real" extension for the purposes of auto-completion, even
> +# if the .build directory is in .gitignore.
> +LOCALIZED_ALL_GEN_TO_CAT = $(LOCALIZED_ALL_GEN_PO:.build/pot/po/%.po=.build/pot/to-cat/%)
> +ifdef AGGRESSIVE_INTERMEDIATE
> +.INTERMEDIATE: $(LOCALIZED_ALL_GEN_TO_CAT)
> +endif
> +$(LOCALIZED_ALL_GEN_TO_CAT): .build/pot/to-cat/%: .build/pot/po/%.po
> +       $(call mkdir_p_parent_template)
> +       $(QUIET_GEN)cat $< >$@

Copy each po file in ".build/pot/po/" to another location
".build/pot/to-cat/", but without the ".po" extension.

Let's take "date.c" as an example:

1. Copy "date.c" to an intermediate C source file
".build/pot/po-munged/date.c" and replace PRItime with PRIuMAX in it.

2. Call xgettext to create ".build/pot/po/date.c.po" from the
intermediate C source file ".build/pot/po-munged/date.c".

3. Optionally remove intermediate C source files like
".build/pot/po-munged/date.c". Having two identical C source files in
the same worktree is not good, as some software may break, so I choose
to remove them.

4. Copy the po file (".build/pot/po/date.c.po") created in step 2 to
an intermediate fake C source file ".build/pot/to-cat/date.c", which is
a file without the ".po" extension. Please note that this intermediate
fake C source file ".build/pot/to-cat/date.c" is not a valid C file, but
a PO file.

5. Call msgcat to create "po/git.pot" from all the intermediate fake C
source files, including ".build/pot/to-cat/date.c".

6. Optionally remove all the intermediate fake C source files in
".build/pot/to-cat/". I choose to remove them, because leaving lots of
invalid C source files in the worktree is not good.

For example, ".build/pot/po/date.c.po" was created from
> +
> +po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_TO_CAT)
>         $(QUIET_GEN)$(MSGCAT) $(MSGCAT_FLAGS) $^ >$@

7. "po/git.pot" depends on the intermediate fake C source files. If
any single C source file has changed, step 4 will be run again to copy
all po files in ".build/pot/po" to the corresponding fake C source files
in ".build/pot/to-cat/", if we choose to remove these intermediate fake
C source files.
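The file naming in this walk-through can be summarized with a POSIX shell sketch that derives each intermediate path from a source file name. The helper names are hypothetical; the directory prefixes are the ones used in the fix-up, and "to_cat_path" performs the same mapping as the Make substitution "$(LOCALIZED_ALL_GEN_PO:.build/pot/po/%.po=.build/pot/to-cat/%)".

```shell
#!/bin/sh
# Hypothetical helpers printing the intermediate paths used for one
# localized file in the fix-up's pipeline.

# Step 1 (C sources only): munged copy with PRItime -> PRIuMAX.
munged_path () { echo ".build/pot/po-munged/$1"; }

# Step 2: the per-file PO output of xgettext.
po_path () { echo ".build/pot/po/$1.po"; }

# Step 4: same PO content, copied to a name without the ".po" suffix.
to_cat_path () {
	rel=${1#.build/pot/po/}
	echo ".build/pot/to-cat/${rel%.po}"
}

# Example:
#   to_cat_path "$(po_path date.c)"   # prints .build/pot/to-cat/date.c
```

Because the to-cat copy carries the original file name, the "#-#-#-#-#" comments that msgcat writes into "po/git.pot" name "date.c" rather than "date.c.po"; that is the only thing the extra copy buys.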

This implementation is too heavy for such a trivial issue. I think we
can move forward with this patch series and leave these comments in
"po/git.pot":

>         $ grep '#-#' po/git.pot
>         #. #-#-#-#-#  git-add--interactive.perl.po  #-#-#-#-#
>         #. #-#-#-#-#  add-patch.c.po  #-#-#-#-#
>         #. #-#-#-#-#  git-add--interactive.perl.po  #-#-#-#-#
>         #. #-#-#-#-#  branch.c.po  #-#-#-#-#
>         #. #-#-#-#-#  object-name.c.po  #-#-#-#-#
>         #. #-#-#-#-#  grep.c.po  #-#-#-#-#
Ævar Arnfjörð Bjarmason May 23, 2022, 2:38 p.m. UTC | #5
On Mon, May 23 2022, Jiang Xin wrote:

> On Mon, May 23, 2022 at 4:19 PM Ævar Arnfjörð Bjarmason
>>  $(LOCALIZED_SH_GEN_PO): .build/pot/po/%.po: %
>>         $(call mkdir_p_parent_template)
>> @@ -2786,11 +2780,24 @@ sed -e 's|charset=CHARSET|charset=UTF-8|' \
>>  echo '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\\n"' >>$@
>>  endef
>>
>> -.build/pot/git.header: $(LOCALIZED_ALL_GEN_PO)
>> +.build/pot/git.header:
>
> No. We should rebuild the pot header if any po file needs to be updated,
> because we want to refresh the timestamp in the "POT-Creation-Date:"
> field of the pot header.

Okay, I did leave a question about this in an earlier e-mail, though:
does anything actually rely on this, or on the header at all, or is
this just cargo-culting?

I haven't found anything in our toolchain that cares about the header at
all (for the *.pot, not *.po!), let alone the update timestamp.

Except insofar as e.g. Emacs will add a timestamp or update it if it
finds a header already.

>>         $(call mkdir_p_parent_template)
>>         $(QUIET_GEN)$(gen_pot_header)
>>
>> -po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_PO)
>> +# We go through this dance of having a prepared
>> +# e.g. .build/pot/po/grep.c.po and copying it to
>> +# .build/pot/to-cat/grep.c only because some IDEs (e.g. VSCode) pick
>> +# up on the "real" extension for the purposes of auto-completion, even
>> +# if the .build directory is in .gitignore.
>> +LOCALIZED_ALL_GEN_TO_CAT = $(LOCALIZED_ALL_GEN_PO:.build/pot/po/%.po=.build/pot/to-cat/%)
>> +ifdef AGGRESSIVE_INTERMEDIATE
>> +.INTERMEDIATE: $(LOCALIZED_ALL_GEN_TO_CAT)
>> +endif
>> +$(LOCALIZED_ALL_GEN_TO_CAT): .build/pot/to-cat/%: .build/pot/po/%.po
>> +       $(call mkdir_p_parent_template)
>> +       $(QUIET_GEN)cat $< >$@
>
> Copy each po file in ".build/pot/po/" to another location
> ".build/pot/to-cat/", but without the ".po" extension.
>
> Let's take "date.c" as an example:
>
> 1. Copy "date.c" to an intermediate C source file
> ".build/pot/po-munged/date.c" and replace PRItime with PRIuMAX in it.
>
> 2. Call xgettext to create ".build/pot/po/date.c.po" from the
> intermediate C source file ".build/pot/po-munged/date.c".
>
> 3. Optionally remove intermediate C source files like
> ".build/pot/po-munged/date.c". Having two identical C source files in
> the same worktree is not good, as some software may break, so I choose
> to remove them.
>
> 4. Copy the po file (".build/pot/po/date.c.po") created in step 2 to
> an intermediate fake C source file ".build/pot/to-cat/date.c", which is
> a file without the ".po" extension. Please note that this intermediate
> fake C source file ".build/pot/to-cat/date.c" is not a valid C file, but
> a PO file.
>
> 5. Call msgcat to create "po/git.pot" from all the intermediate fake C
> source files, including ".build/pot/to-cat/date.c".
>
> 6. Optionally remove all the intermediate fake C source files in
> ".build/pot/to-cat/". I choose to remove them, because leaving lots of
> invalid C source files in the worktree is not good.
>
> For example, ".build/pot/po/date.c.po" was created from
>> +
>> +po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_TO_CAT)
>>         $(QUIET_GEN)$(MSGCAT) $(MSGCAT_FLAGS) $^ >$@
>
> 7. "po/git.pot" depends on the intermediate fake C source files. If
> any single C source file has changed, step 4 will be run again to copy
> all po files in ".build/pot/po" to the corresponding fake C source files
> in ".build/pot/to-cat/", if we choose to remove these intermediate fake
> C source files.
>
> This implementation is too heavy for such a trivial issue. I think we
> can move forward with this patch series and leave these comments in
> "po/git.pot":

If you find it too "heavy" & are trying to optimize it for some reason
then that whole extra special-dance can be made conditional on
MAKE_AVOID_REAL_EXTENSIONS_IN_GITIGNORED_FILES.

But really, .build/pot is 15MB in my local HEAD with this fix-up and
1.4MB without it, and this whole thing just seems like premature
optimization, especially given:
    
    $ git hyperfine -r 3 -L rev origin/master,HEAD~,HEAD,avar/Makefile-incremental-po-git-pot-rule~,avar/Makefile-incremental-po-git-pot-rule -p 'git clean -dxf; git reset --hard' 'make pot' --warmup 1
    Benchmark 1: make pot' in 'origin/master
      Time (mean ± σ):      1.970 s ±  0.014 s    [User: 1.683 s, System: 0.353 s]
      Range (min … max):    1.955 s …  1.982 s    3 runs
    
    Benchmark 2: make pot' in 'HEAD~
      Time (mean ± σ):     931.3 ms ±   4.7 ms    [User: 3358.5 ms, System: 1088.7 ms]
      Range (min … max):   927.0 ms … 936.3 ms    3 runs
    
    Benchmark 3: make pot' in 'HEAD
      Time (mean ± σ):      1.506 s ±  0.389 s    [User: 4.655 s, System: 1.363 s]
      Range (min … max):    1.257 s …  1.955 s    3 runs
    
    Benchmark 4: make pot' in 'avar/Makefile-incremental-po-git-pot-rule~
      Time (mean ± σ):      1.015 s ±  0.002 s    [User: 3.615 s, System: 1.224 s]
      Range (min … max):    1.013 s …  1.017 s    3 runs
    
    Benchmark 5: make pot' in 'avar/Makefile-incremental-po-git-pot-rule
      Time (mean ± σ):      1.014 s ±  0.008 s    [User: 3.540 s, System: 1.068 s]
      Range (min … max):    1.007 s …  1.023 s    3 runs
    
    Summary
      'make pot' in 'HEAD~' ran
        1.09 ± 0.01 times faster than 'make pot' in 'avar/Makefile-incremental-po-git-pot-rule'
        1.09 ± 0.01 times faster than 'make pot' in 'avar/Makefile-incremental-po-git-pot-rule~'
        1.62 ± 0.42 times faster than 'make pot' in 'HEAD'
        2.12 ± 0.02 times faster than 'make pot' in 'origin/master'

I.e. all of this is much faster than what we have on "master" now. My
22434ef36ae (Makefile: avoid "sed" on C files that don't need it,
2022-04-08) (avar/Makefile-incremental-po-git-pot-rule) is then only
about 10% slower than the "grep or xgettext" version, and its "~" is
the corresponding unoptimized version.

The HEAD here is with my fix-up, and HEAD~ is your series here.

Anyway, if you really feel strongly about it let's go with your way of
doing it.

It just sounded like you weren't actually trying to optimize anything,
but to work around your editor. So if we had a method to do that....

...except it seems you also care about making it much faster than
"master" (or care about <20MB of disk space), which to be blunt seems a
bit crazy to me :) Last I checked "make test" ended up creating ~1GB of
data (not all at once, but in parallel testing a lot more than 10MB is
often in play at once).

As this is a pretty obscure target that I only expect CI, you,
translators and me to run in practice, a small difference in the initial
run didn't seem to matter, especially as it's all an improvement over
"master".

Anyway, you do whatever you think is best with that :)

>>         $ grep '#-#' po/git.pot
>>         #. #-#-#-#-#  git-add--interactive.perl.po  #-#-#-#-#
>>         #. #-#-#-#-#  add-patch.c.po  #-#-#-#-#
>>         #. #-#-#-#-#  git-add--interactive.perl.po  #-#-#-#-#
>>         #. #-#-#-#-#  branch.c.po  #-#-#-#-#
>>         #. #-#-#-#-#  object-name.c.po  #-#-#-#-#
>>         #. #-#-#-#-#  grep.c.po  #-#-#-#-#
Jiang Xin May 23, 2022, 4:13 p.m. UTC | #6
On Mon, May 23, 2022 at 10:56 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, May 23 2022, Jiang Xin wrote:
>
> > On Mon, May 23, 2022 at 4:19 PM Ævar Arnfjörð Bjarmason
> >>  $(LOCALIZED_SH_GEN_PO): .build/pot/po/%.po: %
> >>         $(call mkdir_p_parent_template)
> >> @@ -2786,11 +2780,24 @@ sed -e 's|charset=CHARSET|charset=UTF-8|' \
> >>  echo '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\\n"' >>$@
> >>  endef
> >>
> >> -.build/pot/git.header: $(LOCALIZED_ALL_GEN_PO)
> >> +.build/pot/git.header:
> >
> > No. We should rebuild the pot header if any po file needs to be updated,
> > because we want to refresh the timestamp in the "POT-Creation-Date:"
> > field of the pot header.
>
> Okay, I did leave a question about this in an earlier e-mail, though:
> does anything actually rely on this, or on the header at all, or is
> this just cargo-culting?
>
> I haven't found anything in our toolchain that cares about the header at
> all (for the *.pot, not *.po!), let alone the update timestamp.

When a new po/XX.po is created manually with msginit, the resulting
header differs depending on whether the POT file has a header or not:

  $ msginit -i po/git.pot -o po/XX-with-header.po \
      --locale=ja --no-translator
  $ msginit -i po/git-headless.pot -o po/XX-without-header.po \
      --locale=ja --no-translator
  $ diff po/XX-with-header.po po/XX-without-header.po
  1,5d0
  < # Japanese translations for Git package.
  < # Copyright (C) 2022 THE Git'S COPYRIGHT HOLDER
  < # This file is distributed under the same license as the Git package.
  < # Automatically generated, 2022.
  < #
  8,11c3
  < "Project-Id-Version: Git\n"
  < "Report-Msgid-Bugs-To: Git Mailing List <git@vger.kernel.org>\n"
  < "POT-Creation-Date: 2022-05-23 23:27+0800\n"
  < "PO-Revision-Date: 2022-05-23 23:27+0800\n"
  ---
  > "Project-Id-Version: git 2.36.0.7.g31429651cf.dirty\n"
  16c8
  < "Content-Type: text/plain; charset=UTF-8\n"
  ---
  > "Content-Type: text/plain; charset=ASCII\n"

Should we ignore this change?
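To check which kind of POT file one is looking at, a POSIX shell sketch like the following can pull the "POT-Creation-Date" field out of a PO/POT header entry, and fails when there is none (as with per-file output produced by "xgettext --omit-header"). The helper name is hypothetical, and the sed expression assumes each header field sits on its own quoted line, as in the msginit output above.

```shell
#!/bin/sh
# Hypothetical helper: print the POT-Creation-Date value from the header
# entry of a PO/POT file; exit non-zero if no such field exists.
pot_creation_date () {
	sed -n 's/^"POT-Creation-Date: \(.*\)\\n"$/\1/p' "$1" |
	head -n 1 |
	grep .
}

# Example:
#   pot_creation_date po/git.pot || echo "no header in po/git.pot"
```

The pipeline's exit status is that of the final "grep .", which fails on empty input, so a headless file is reported as an error rather than printing an empty date.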

>
> >>         $(call mkdir_p_parent_template)
> >>         $(QUIET_GEN)$(gen_pot_header)
> >>
> >> -po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_PO)
> >> +# We go through this dance of having a prepared
> >> +# e.g. .build/pot/po/grep.c.po and copying it to
> >> +# .build/pot/to-cat/grep.c only because some IDEs (e.g. VSCode) pick
> >> +# up on the "real" extension for the purposes of auto-completion, even
> >> +# if the .build directory is in .gitignore.
> >> +LOCALIZED_ALL_GEN_TO_CAT = $(LOCALIZED_ALL_GEN_PO:.build/pot/po/%.po=.build/pot/to-cat/%)
> >> +ifdef AGGRESSIVE_INTERMEDIATE
> >> +.INTERMEDIATE: $(LOCALIZED_ALL_GEN_TO_CAT)
> >> +endif
> >> +$(LOCALIZED_ALL_GEN_TO_CAT): .build/pot/to-cat/%: .build/pot/po/%.po
> >> +       $(call mkdir_p_parent_template)
> >> +       $(QUIET_GEN)cat $< >$@
> >
> > Copy each po file in ".build/pot/po/" to another location
> > ".build/pot/to-cat/", but without the ".po" extension.
> >
> > Let's take "date.c" as an example:
> >
> > 1. Copy "date.c" to an intermediate C source file
> > ".build/pot/po-munged/date.c" and replace PRItime with PRIuMAX in it.
> >
> > 2. Call xgettext to create ".build/pot/po/date.c.po" from the
> > intermediate C source file ".build/pot/po-munged/date.c".
> >
> > 3. Optionally remove intermediate C source files like
> > ".build/pot/po-munged/date.c". Having two identical C source files in
> > the same worktree is not good, as some software may break, so I choose
> > to remove them.
> >
> > 4. Copy the po file (".build/pot/po/date.c.po") created in step 2 to
> > an intermediate fake C source file ".build/pot/to-cat/date.c", which is
> > a file without the ".po" extension. Please note that this intermediate
> > fake C source file ".build/pot/to-cat/date.c" is not a valid C file, but
> > a PO file.
> >
> > 5. Call msgcat to create "po/git.pot" from all the intermediate fake C
> > source files, including ".build/pot/to-cat/date.c".
> >
> > 6. Optionally remove all the intermediate fake C source files in
> > ".build/pot/to-cat/". I choose to remove them, because leaving lots of
> > invalid C source files in the worktree is not good.
> >
> > For example, ".build/pot/po/date.c.po" was created from
> >> +
> >> +po/git.pot: .build/pot/git.header $(LOCALIZED_ALL_GEN_TO_CAT)
> >>         $(QUIET_GEN)$(MSGCAT) $(MSGCAT_FLAGS) $^ >$@
> >
> > 7. "po/git.pot" depends on the intermediate fake C source files. If
> > any single C source file has changed, step 4 will be run again to copy
> > all po files in ".build/pot/po" to the corresponding fake C source files
> > in ".build/pot/to-cat/", if we choose to remove these intermediate fake
> > C source files.
> >
> > This implementation is too heavy for such a trivial issue. I think we
> > can move forward with this patch series and leave these comments in
> > "po/git.pot":
>
> If you find it too "heavy" & are trying to optimize it for some reason
> then that whole extra special-dance can be made conditional on
> MAKE_AVOID_REAL_EXTENSIONS_IN_GITIGNORED_FILES.
>
> But really, .build/pot is 15MB in my local HEAD with this fix-up and
> 1.4MB without it, and this whole thing just seems like premature
> optimization, especially given:
>
>     $ git hyperfine -r 3 -L rev origin/master,HEAD~,HEAD,avar/Makefile-incremental-po-git-pot-rule~,avar/Makefile-incremental-po-git-pot-rule -p 'git clean -dxf; git reset --hard' 'make pot' --warmup 1
>     Benchmark 1: make pot' in 'origin/master
>       Time (mean ± σ):      1.970 s ±  0.014 s    [User: 1.683 s, System: 0.353 s]
>       Range (min … max):    1.955 s …  1.982 s    3 runs
>
>     Benchmark 2: make pot' in 'HEAD~
>       Time (mean ± σ):     931.3 ms ±   4.7 ms    [User: 3358.5 ms, System: 1088.7 ms]
>       Range (min … max):   927.0 ms … 936.3 ms    3 runs
>
>     Benchmark 3: make pot' in 'HEAD
>       Time (mean ± σ):      1.506 s ±  0.389 s    [User: 4.655 s, System: 1.363 s]
>       Range (min … max):    1.257 s …  1.955 s    3 runs
>
>     Benchmark 4: make pot' in 'avar/Makefile-incremental-po-git-pot-rule~
>       Time (mean ± σ):      1.015 s ±  0.002 s    [User: 3.615 s, System: 1.224 s]
>       Range (min … max):    1.013 s …  1.017 s    3 runs
>
>     Benchmark 5: make pot' in 'avar/Makefile-incremental-po-git-pot-rule
>       Time (mean ± σ):      1.014 s ±  0.008 s    [User: 3.540 s, System: 1.068 s]
>       Range (min … max):    1.007 s …  1.023 s    3 runs
>
>     Summary
>       'make pot' in 'HEAD~' ran
>         1.09 ± 0.01 times faster than 'make pot' in 'avar/Makefile-incremental-po-git-pot-rule'
>         1.09 ± 0.01 times faster than 'make pot' in 'avar/Makefile-incremental-po-git-pot-rule~'
>         1.62 ± 0.42 times faster than 'make pot' in 'HEAD'
>         2.12 ± 0.02 times faster than 'make pot' in 'origin/master'
>
> I.e. all of this is much faster than what we have on "master" now. My
> 22434ef36ae (Makefile: avoid "sed" on C files that don't need it,
> 2022-04-08) (avar/Makefile-incremental-po-git-pot-rule) is then only
> about 10% slower than the "grep or xgettext" version, and its "~" is
> the corresponding unoptimized version.
>
> The HEAD here is with my fix-up, and HEAD~ is your series here.
>
> Anyway, if you really feel strongly about it let's go with your way of
> doing it.
>
> It just sounded like you weren't actually trying to optimize anything,
> but to work around your editor. So if we had a method to do that....

But users who prefer to delete all the intermediate files in
".build/pot/to-cat" and ".build/pot/po-munged/" will pay a
performance penalty.

> ...except it seems you also care about making it much faster than
> "master" (or care about <20MB of disk space), which to be blunt seems a
> bit crazy to me :) Last I checked "make test" ended up creating ~1GB of
> data (not all at once, but in parallel testing a lot more than 10MB is
> often in play at once).
>
> As this is a pretty obscure target that I only expect CI, you,
> translators and me to run in practice, a small difference in the initial
> run didn't seem to matter, especially as it's all an improvement over
> "master".
>
> Anyway, you do whatever you think is best with that :)
>
> >>         $ grep '#-#' po/git.pot
> >>         #. #-#-#-#-#  git-add--interactive.perl.po  #-#-#-#-#
> >>         #. #-#-#-#-#  add-patch.c.po  #-#-#-#-#
> >>         #. #-#-#-#-#  git-add--interactive.perl.po  #-#-#-#-#
> >>         #. #-#-#-#-#  branch.c.po  #-#-#-#-#
> >>         #. #-#-#-#-#  object-name.c.po  #-#-#-#-#
> >>         #. #-#-#-#-#  grep.c.po  #-#-#-#-#
>