diff mbox series

Makefile: dedup git-ls-files output to prevent duplicate targets

Message ID 20220526021540.2812-1-worldhello.net@gmail.com (mailing list archive)
State New
Headers show
Series Makefile: dedup git-ls-files output to prevent duplicate targets | expand

Commit Message

Jiang Xin May 26, 2022, 2:15 a.m. UTC
From: Jiang Xin <zhiyou.jx@alibaba-inc.com>

If there are unresolved conflicts left in the working tree, "make" may
report warnings as follows:

    Makefile:xxxx: target '.build/pot/po/FOO.c.po' given more than once
                   in the same rule

The duplicate targets are introduced by the following pattern rule we
added in the preceding commit for incremental build of "po/git.pot",

    $(LOCALIZED_C_GEN_PO): .build/pot/po/%.po: %

and the duplicate entries in $(LOCALIZED_C_GEN_PO) come from the
"git ls-files" command in SOURCES_CMD.

We can pass the option "--deduplicate" to git-ls-files to suppress
duplicate entries for unresolved conflicts.

Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Junio C Hamano May 26, 2022, 4:02 a.m. UTC | #1
Jiang Xin <worldhello.net@gmail.com> writes:

> From: Jiang Xin <zhiyou.jx@alibaba-inc.com>
>
> If there are unresolved conflicts left in the working tree, "make" may
> report warnings as follows:
>
>     Makefile:xxxx: target '.build/pot/po/FOO.c.po' given more than once
>                    in the same rule
>
> The duplicate targets are introduced by the following pattern rule we
> added in the preceding commit for incremental build of "po/git.pot",
>
>     $(LOCALIZED_C_GEN_PO): .build/pot/po/%.po: %
>
> and the duplicate entries in $(LOCALIZED_C_GEN_PO) come from the
> "git ls-files" command in SOURCES_CMD.
>
> We can pass the option "--deduplicate" to git-ls-files to suppress
> duplicate entries for unresolved conflicts.

Thanks for a quick response.

We certainly can say "your SOURCES_CMD MUST NOT produce duplicates"
and passing the --deduplicate option is one valid way to fix this
specific case.

But I wonder if a more future-proof solution is to dedup the output
of the SOURCES_CMD ourselves on the Makefile side.  That way, even
if we update SOURCES_CMD in a way that could contain duplicates, we
won't have to worry about duplicates.

---

It feels way overkill to "sort" the list just to dedup its elements,
but that is what GNU Make documentation info page recommends us to
do, and we already do use it for deduplication in our Makefile
twice.

'$(sort LIST)'
     Sorts the words of LIST in lexical order, removing duplicate words.
     The output is a list of words separated by single spaces.  Thus,

          $(sort foo bar lose)

     returns the value 'bar foo lose'.

     Incidentally, since 'sort' removes duplicate words, you can use it
     for this purpose even if you don't care about the sort order.


diff --git i/Makefile w/Makefile
index 2b61f66259..1d3d3deba1 100644
--- i/Makefile
+++ w/Makefile
@@ -860,7 +860,7 @@ SOURCES_CMD = ( \
 		-o \( -name '*.sh' -type f -print \) \
 		| sed -e 's|^\./||' \
 	)
-FOUND_SOURCE_FILES := $(shell $(SOURCES_CMD))
+FOUND_SOURCE_FILES := $(sort $(shell $(SOURCES_CMD)))
 
 FOUND_C_SOURCES = $(filter %.c,$(FOUND_SOURCE_FILES))
 FOUND_H_SOURCES = $(filter %.h,$(FOUND_SOURCE_FILES))
Jiang Xin May 26, 2022, 6:06 a.m. UTC | #2
On Thu, May 26, 2022 at 12:03 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jiang Xin <worldhello.net@gmail.com> writes:
>
> > From: Jiang Xin <zhiyou.jx@alibaba-inc.com>
> >
> > If there are unresolved conflicts left in the working tree, "make" may
> > report warnings as follows:
> >
> >     Makefile:xxxx: target '.build/pot/po/FOO.c.po' given more than once
> >                    in the same rule
> >
> > The duplicate targets are introduced by the following pattern rule we
> > added in the preceding commit for incremental build of "po/git.pot",
> >
> >     $(LOCALIZED_C_GEN_PO): .build/pot/po/%.po: %
> >
> > and the duplicate entries in $(LOCALIZED_C_GEN_PO) come from the
> > "git ls-files" command in SOURCES_CMD.
> >
> > We can pass the option "--deduplicate" to git-ls-files to suppress
> > duplicate entries for unresolved conflicts.
>
> Thanks for a quick response.
>
> We certainly can say "your SOURCES_CMD MUST NOT produce duplicates"
> and passing the --deduplicate option is one valid way to fix this
> specific case.
>
> But I wonder if a more future-proof solution is to dedup the output
> of the SOURCES_CMD ourselves on the Makefile side.  That way, even
> if we update SOURCES_CMD in a way that could contain duplicates, we
> won't have to worry about duplicates.
>
> ---
>
> It feels way overkill to "sort" the list just to dedup its elements,
> but that is what GNU Make documentation info page recommends us to
> do, and we already do use it for deduplication in our Makefile
> twice.
>
> '$(sort LIST)'
>      Sorts the words of LIST in lexical order, removing duplicate words.
>      The output is a list of words separated by single spaces.  Thus,
>
>           $(sort foo bar lose)
>
>      returns the value 'bar foo lose'.
>
>      Incidentally, since 'sort' removes duplicate words, you can use it
>      for this purpose even if you don't care about the sort order.
>
>
> diff --git i/Makefile w/Makefile
> index 2b61f66259..1d3d3deba1 100644
> --- i/Makefile
> +++ w/Makefile
> @@ -860,7 +860,7 @@ SOURCES_CMD = ( \
>                 -o \( -name '*.sh' -type f -print \) \
>                 | sed -e 's|^\./||' \
>         )
> -FOUND_SOURCE_FILES := $(shell $(SOURCES_CMD))
> +FOUND_SOURCE_FILES := $(sort $(shell $(SOURCES_CMD)))
>
>  FOUND_C_SOURCES = $(filter %.c,$(FOUND_SOURCE_FILES))
>  FOUND_H_SOURCES = $(filter %.h,$(FOUND_SOURCE_FILES))
>

If I disabled the git-ls-files command like below,

    @@ -846,7 +846,7 @@ generated-hdrs: $(GENERATED_H)
     ## Exhaustive lists of our source files, either dynamically generated,
     ## or hardcoded.
     SOURCES_CMD = ( \
    -       git ls-files --deduplicate \
    +       git bad-ls-files --deduplicate \
                    '*.[hcS]' \
                    '*.sh' \
                    ':!*[tp][0-9][0-9][0-9][0-9]*' \

, and run "make", will display the following warnings:

    Makefile:2751: target `.build/pot/po/command-list.h.po' given more
than once in the same rule.
    Makefile:2751: target `.build/pot/po/config-list.h.po' given more
than once in the same rule.
    Makefile:2751: target `.build/pot/po/hook-list.h.po' given more
than once in the same rule.

This is because the three generated header files (defined in
$(GENERATED_H)) are also included in the result of "SOURCES_CMD". We
can fix this by sorting LOCALIZED_C:

    -LOCALIZED_C = $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) $(SCALAR_SOURCES) \
    -             $(GENERATED_H)
    +LOCALIZED_C = $(sort $(FOUND_C_SOURCES) $(FOUND_H_SOURCES)
$(SCALAR_SOURCES) \
    +               $(GENERATED_H))

Will send v2 patch.
Junio C Hamano May 26, 2022, 6:23 a.m. UTC | #3
Jiang Xin <worldhello.net@gmail.com> writes:

>>         )
>> -FOUND_SOURCE_FILES := $(shell $(SOURCES_CMD))
>> +FOUND_SOURCE_FILES := $(sort $(shell $(SOURCES_CMD)))
>>
>>  FOUND_C_SOURCES = $(filter %.c,$(FOUND_SOURCE_FILES))
>>  FOUND_H_SOURCES = $(filter %.h,$(FOUND_SOURCE_FILES))
>>
>
> If I disabled the git-ls-files command like below,
>
>     @@ -846,7 +846,7 @@ generated-hdrs: $(GENERATED_H)
>...
> This is because the three generated header files (defined in
> $(GENERATED_H)) are also included in the result of "SOURCES_CMD". We
> can fix this by sorting LOCALIZED_C:
>
>     -LOCALIZED_C = $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) $(SCALAR_SOURCES) \
>     -             $(GENERATED_H)
>     +LOCALIZED_C = $(sort $(FOUND_C_SOURCES) $(FOUND_H_SOURCES)
> $(SCALAR_SOURCES) \
>     +               $(GENERATED_H))

If you make FOUND_SOURCE_FILES unique upfront, the at least there
wouldn't be any duplicates there.  Do you mean that some of what is
in FOUND_SOURCE_FILES appear in either SCALAR_SOURCES or GENERATED_H?

If not, I think deduplicating near the source of the issue, i.e.

  FOUND_SOURCE_FILES := $(sort $(shell $(SOURCES_CMD)))

may be sufficient.  Deduplicating near the consumer, like
LOCALIZED_C, may force us to dedup all the consumers of it (e.g.
LOCALIZED_C is not the sole consumer of FOUND_C_SOURCES; you'd need
to sort the input to COCCI_SOURCES, for example).

Thanks.
Jiang Xin May 26, 2022, 7:04 a.m. UTC | #4
On Thu, May 26, 2022 at 2:23 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jiang Xin <worldhello.net@gmail.com> writes:
>
> >>         )
> >> -FOUND_SOURCE_FILES := $(shell $(SOURCES_CMD))
> >> +FOUND_SOURCE_FILES := $(sort $(shell $(SOURCES_CMD)))
> >>
> >>  FOUND_C_SOURCES = $(filter %.c,$(FOUND_SOURCE_FILES))
> >>  FOUND_H_SOURCES = $(filter %.h,$(FOUND_SOURCE_FILES))
> >>
> >
> > If I disabled the git-ls-files command like below,
> >
> >     @@ -846,7 +846,7 @@ generated-hdrs: $(GENERATED_H)
> >...
> > This is because the three generated header files (defined in
> > $(GENERATED_H)) are also included in the result of "SOURCES_CMD". We
> > can fix this by sorting LOCALIZED_C:
> >
> >     -LOCALIZED_C = $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) $(SCALAR_SOURCES) \
> >     -             $(GENERATED_H)
> >     +LOCALIZED_C = $(sort $(FOUND_C_SOURCES) $(FOUND_H_SOURCES)
> > $(SCALAR_SOURCES) \
> >     +               $(GENERATED_H))
>
> If you make FOUND_SOURCE_FILES unique upfront, the at least there
> wouldn't be any duplicates there.  Do you mean that some of what is
> in FOUND_SOURCE_FILES appear in either SCALAR_SOURCES or GENERATED_H?

Yes. If find source files using "git-ls-files", the generated headers
are not included in FOUND_SOURCE_FILES, but this is not the case for
the find command.

> If not, I think deduplicating near the source of the issue, i.e.
>
>   FOUND_SOURCE_FILES := $(sort $(shell $(SOURCES_CMD)))

If we pass the "--deduplicate" option to git-ls-files, we can make
sure files are unique in FOUND_SOURCE_FILES. So sorting
FOUND_SOURCE_FILES becomes unnecessary. But FOUND_SOURCE_FILES may
contain generated files if using find instead of git-ls-files in
SOURCES_CMD, that means sort FOUND_SOURCE_FILES is not enough. Instead
of sorting it, we may want to filter-out the generated files from it,
like:

    FOUND_SOURCE_FILES := $(filter-out $(GENERATED_H),$(shell $(SOURCES_CMD)))

Exclude the several generated headers by add extra fixed options to
find command is not good, but we may need to exclude the ".build"
directory from the find command in SOURCES_CMD, in case we may
duplicate C files in it in future version.


    --- a/Makefile
    +++ b/Makefile
    @@ -846,7 +846,7 @@ generated-hdrs: $(GENERATED_H)
    ## Exhaustive lists of our source files, either dynamically generated,
    ## or hardcoded.
    SOURCES_CMD = ( \
    -       git ls-files \
    +       git ls-files --deduplicate \
                   '*.[hcS]' \
                   '*.sh' \
                   ':!*[tp][0-9][0-9][0-9][0-9]*' \
    @@ -857,12 +857,13 @@ SOURCES_CMD = ( \
                   -o \( -name '[tp][0-9][0-9][0-9][0-9]*' -prune \) \
                   -o \( -name contrib -type d -prune \) \
                   -o \( -name build -type d -prune \) \
    +               -o \( -name .build -type d -prune \) \
                   -o \( -name 'trash*' -type d -prune \) \
                   -o \( -name '*.[hcS]' -type f -print \) \
                   -o \( -name '*.sh' -type f -print \) \
                   | sed -e 's|^\./||' \
           )
    -FOUND_SOURCE_FILES := $(shell $(SOURCES_CMD))
    +FOUND_SOURCE_FILES := $(filter-out $(GENERATED_H),$(shell $(SOURCES_CMD)))

    FOUND_C_SOURCES = $(filter %.c,$(FOUND_SOURCE_FILES))
    FOUND_H_SOURCES = $(filter %.h,$(FOUND_SOURCE_FILES))

> may be sufficient.  Deduplicating near the consumer, like
> LOCALIZED_C, may force us to dedup all the consumers of it (e.g.
> LOCALIZED_C is not the sole consumer of FOUND_C_SOURCES; you'd need
> to sort the input to COCCI_SOURCES, for example).

If we apply the above patch, sorting LOCALIZED_C is not necessary.

--
Jiang Xin
Ævar Arnfjörð Bjarmason May 26, 2022, 10 a.m. UTC | #5
On Thu, May 26 2022, Jiang Xin wrote:

> On Thu, May 26, 2022 at 2:23 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Jiang Xin <worldhello.net@gmail.com> writes:
>>
>> >>         )
>> >> -FOUND_SOURCE_FILES := $(shell $(SOURCES_CMD))
>> >> +FOUND_SOURCE_FILES := $(sort $(shell $(SOURCES_CMD)))
>> >>
>> >>  FOUND_C_SOURCES = $(filter %.c,$(FOUND_SOURCE_FILES))
>> >>  FOUND_H_SOURCES = $(filter %.h,$(FOUND_SOURCE_FILES))
>> >>
>> >
>> > If I disabled the git-ls-files command like below,
>> >
>> >     @@ -846,7 +846,7 @@ generated-hdrs: $(GENERATED_H)
>> >...
>> > This is because the three generated header files (defined in
>> > $(GENERATED_H)) are also included in the result of "SOURCES_CMD". We
>> > can fix this by sorting LOCALIZED_C:
>> >
>> >     -LOCALIZED_C = $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) $(SCALAR_SOURCES) \
>> >     -             $(GENERATED_H)
>> >     +LOCALIZED_C = $(sort $(FOUND_C_SOURCES) $(FOUND_H_SOURCES)
>> > $(SCALAR_SOURCES) \
>> >     +               $(GENERATED_H))
>>
>> If you make FOUND_SOURCE_FILES unique upfront, the at least there
>> wouldn't be any duplicates there.  Do you mean that some of what is
>> in FOUND_SOURCE_FILES appear in either SCALAR_SOURCES or GENERATED_H?
>
> Yes. If find source files using "git-ls-files", the generated headers
> are not included in FOUND_SOURCE_FILES, but this is not the case for
> the find command.
>
>> If not, I think deduplicating near the source of the issue, i.e.
>>
>>   FOUND_SOURCE_FILES := $(sort $(shell $(SOURCES_CMD)))
>
> If we pass the "--deduplicate" option to git-ls-files, we can make
> sure files are unique in FOUND_SOURCE_FILES. So sorting
> FOUND_SOURCE_FILES becomes unnecessary. But FOUND_SOURCE_FILES may
> contain generated files if using find instead of git-ls-files in
> SOURCES_CMD, that means sort FOUND_SOURCE_FILES is not enough. Instead
> of sorting it, we may want to filter-out the generated files from it,
> like:
>
>     FOUND_SOURCE_FILES := $(filter-out $(GENERATED_H),$(shell $(SOURCES_CMD)))
>
> Exclude the several generated headers by add extra fixed options to
> find command is not good, but we may need to exclude the ".build"
> directory from the find command in SOURCES_CMD, in case we may
> duplicate C files in it in future version.
>
>
>     --- a/Makefile
>     +++ b/Makefile
>     @@ -846,7 +846,7 @@ generated-hdrs: $(GENERATED_H)
>     ## Exhaustive lists of our source files, either dynamically generated,
>     ## or hardcoded.
>     SOURCES_CMD = ( \
>     -       git ls-files \
>     +       git ls-files --deduplicate \
>                    '*.[hcS]' \
>                    '*.sh' \
>                    ':!*[tp][0-9][0-9][0-9][0-9]*' \
>     @@ -857,12 +857,13 @@ SOURCES_CMD = ( \
>                    -o \( -name '[tp][0-9][0-9][0-9][0-9]*' -prune \) \
>                    -o \( -name contrib -type d -prune \) \
>                    -o \( -name build -type d -prune \) \
>     +               -o \( -name .build -type d -prune \) \

This change is good and something we should do in any case, I think (and
this is my fault) that due to this we'd have broken other things,
e.g. included these in the TAGS file if we don't have "git ls-files".

>                    -o \( -name 'trash*' -type d -prune \) \
>                    -o \( -name '*.[hcS]' -type f -print \) \
>                    -o \( -name '*.sh' -type f -print \) \
>                    | sed -e 's|^\./||' \
>            )
>     -FOUND_SOURCE_FILES := $(shell $(SOURCES_CMD))
>     +FOUND_SOURCE_FILES := $(filter-out $(GENERATED_H),$(shell $(SOURCES_CMD)))
>
>     FOUND_C_SOURCES = $(filter %.c,$(FOUND_SOURCE_FILES))
>     FOUND_H_SOURCES = $(filter %.h,$(FOUND_SOURCE_FILES))
>
>> may be sufficient.  Deduplicating near the consumer, like
>> LOCALIZED_C, may force us to dedup all the consumers of it (e.g.
>> LOCALIZED_C is not the sole consumer of FOUND_C_SOURCES; you'd need
>> to sort the input to COCCI_SOURCES, for example).
>
> If we apply the above patch, sorting LOCALIZED_C is not necessary.

Per earlier feedback in
https://lore.kernel.org/git/220519.86tu9l6fw4.gmgdl@evledraar.gmail.com/
this all seesm a bit too complex, especially now.

I pointed out then that with --sort-by-file added we:

 * Don't group the translations by C/SH/Perl anymore
 * Change the sort order within files, to be line/sorted instead of
   line/order (i.e. first occurring translations first)

I suggested then to just use $(sort) on the respective lists.

So why not just:

 1. Switch to the $(FOUND_C_SOURCES) (good)
 2. Filter that by C/Perl/SH as before (just a simple $(filter)
 3. $(sort) that (which as noted, also de-dupes it)

Then we don't have any of the behavior change of --sort-by-file, and we
don't have to carefully curate the ls-files/find commands to not include
duplicates (although as seen here that seems to have been a useful
canary in the "find" case).
Jiang Xin May 26, 2022, 11:06 a.m. UTC | #6
On Thu, May 26, 2022 at 6:04 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> > If we apply the above patch, sorting LOCALIZED_C is not necessary.
>
> Per earlier feedback in
> https://lore.kernel.org/git/220519.86tu9l6fw4.gmgdl@evledraar.gmail.com/
> this all seesm a bit too complex, especially now.
>
> I pointed out then that with --sort-by-file added we:
>
>  * Don't group the translations by C/SH/Perl anymore

I missed this point in the previous discussion. Will take this into
account in next reroll.

>  * Change the sort order within files, to be line/sorted instead of
>    line/order (i.e. first occurring translations first)
>
> I suggested then to just use $(sort) on the respective lists.
>
> So why not just:
>
>  1. Switch to the $(FOUND_C_SOURCES) (good)
>  2. Filter that by C/Perl/SH as before (just a simple $(filter)
>  3. $(sort) that (which as noted, also de-dupes it)

Will try this direction.

> Then we don't have any of the behavior change of --sort-by-file, and we
> don't have to carefully curate the ls-files/find commands to not include
> duplicates (although as seen here that seems to have been a useful
> canary in the "find" case).
Junio C Hamano May 26, 2022, 5:18 p.m. UTC | #7
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I pointed out then that with --sort-by-file added we:
>
>  * Don't group the translations by C/SH/Perl anymore
>  * Change the sort order within files, to be line/sorted instead of
>    line/order (i.e. first occurring translations first)
>
> I suggested then to just use $(sort) on the respective lists.
>
> So why not just:
>
>  1. Switch to the $(FOUND_C_SOURCES) (good)
>  2. Filter that by C/Perl/SH as before (just a simple $(filter)
>  3. $(sort) that (which as noted, also de-dupes it)
>
> Then we don't have any of the behavior change of --sort-by-file, and we
> don't have to carefully curate the ls-files/find commands to not include
> duplicates (although as seen here that seems to have been a useful
> canary in the "find" case).

Does "--sort-by-file" really mean that?

The option is documented to sort output by file location, but does
it mean without the option (i.e. default), there is no guarantee in
the output order?  Or are we sure that the output is sorted by the
order of input files, and that is guaranteed to hold in the future?

If we are depending on certain ordering of the output produced by
gettext suite of programs, I would keep the option, regardless of
what we do to the input to them, if I were running the i18n part of
this project.

But I am not, so I would not complain if --sort-by-file is dropped
against my advice ;-)
Ævar Arnfjörð Bjarmason May 26, 2022, 6:25 p.m. UTC | #8
On Thu, May 26 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> I pointed out then that with --sort-by-file added we:
>>
>>  * Don't group the translations by C/SH/Perl anymore
>>  * Change the sort order within files, to be line/sorted instead of
>>    line/order (i.e. first occurring translations first)
>>
>> I suggested then to just use $(sort) on the respective lists.
>>
>> So why not just:
>>
>>  1. Switch to the $(FOUND_C_SOURCES) (good)
>>  2. Filter that by C/Perl/SH as before (just a simple $(filter)
>>  3. $(sort) that (which as noted, also de-dupes it)
>>
>> Then we don't have any of the behavior change of --sort-by-file, and we
>> don't have to carefully curate the ls-files/find commands to not include
>> duplicates (although as seen here that seems to have been a useful
>> canary in the "find" case).
>
> Does "--sort-by-file" really mean that?
>
> The option is documented to sort output by file location, but does
> it mean without the option (i.e. default), there is no guarantee in
> the output order?  Or are we sure that the output is sorted by the
> order of input files, and that is guaranteed to hold in the future?
>
> If we are depending on certain ordering of the output produced by
> gettext suite of programs, I would keep the option, regardless of
> what we do to the input to them, if I were running the i18n part of
> this project.
>
> But I am not, so I would not complain if --sort-by-file is dropped
> against my advice ;-)

The gettext docs are pretty light on the subject, but the default "sort
order" is none at all. I.e. it'll just inhale source and spew out
translations in the order you feed them to xgettext.

So in order of input files, and then in order they're seen in the
program.

I don't think that's ever going to change.

The --sort-output and and --sort-by-file then re-sort that.

AFAICT the --sort-by-file could be more accurately named
--sort-by-file-then-line-number-then-by-msgid.

I.e. the semantics seem to be to sort by file, then emit messages in the
order they appear in the file *by line*, but within a line act as though
--sort-output was given. I don't know if that's intentional or not.
Junio C Hamano May 26, 2022, 7 p.m. UTC | #9
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Does "--sort-by-file" really mean that?
>>
>> The option is documented to sort output by file location, but does
>> it mean without the option (i.e. default), there is no guarantee in
>> the output order?  Or are we sure that the output is sorted by the
>> order of input files, and that is guaranteed to hold in the future?
>>
>> If we are depending on certain ordering of the output produced by
>> gettext suite of programs, I would keep the option, regardless of
>> what we do to the input to them, if I were running the i18n part of
>> this project.
>>
>> But I am not, so I would not complain if --sort-by-file is dropped
>> against my advice ;-)
>
> The gettext docs are pretty light on the subject, but the default "sort
> order" is none at all. I.e. it'll just inhale source and spew out
> translations in the order you feed them to xgettext.
>
> So in order of input files, and then in order they're seen in the
> program.
>
> I don't think that's ever going to change.

OK, so as long as make's notion of $(sort) and gettext suite's
notion of --sort-by-file are the same, we didn't make any change,
and even if they were different, since there is no version of Git
that uses "--sort-by-file" while preparing the po and pot files, it
still is OK.  As long as make's $(sort) is as stable as gettext
suite's "--sort-by-file" across developer locales (and our filenames
are ascii-only and hopefully will stay that way), everybody will get
the messages in the same order either way (or we would have the same
problem so switching from --sort-by-file to $(sort) is not making
anything worse).
Ævar Arnfjörð Bjarmason May 26, 2022, 7:17 p.m. UTC | #10
On Thu, May 26 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>> Does "--sort-by-file" really mean that?
>>>
>>> The option is documented to sort output by file location, but does
>>> it mean without the option (i.e. default), there is no guarantee in
>>> the output order?  Or are we sure that the output is sorted by the
>>> order of input files, and that is guaranteed to hold in the future?
>>>
>>> If we are depending on certain ordering of the output produced by
>>> gettext suite of programs, I would keep the option, regardless of
>>> what we do to the input to them, if I were running the i18n part of
>>> this project.
>>>
>>> But I am not, so I would not complain if --sort-by-file is dropped
>>> against my advice ;-)
>>
>> The gettext docs are pretty light on the subject, but the default "sort
>> order" is none at all. I.e. it'll just inhale source and spew out
>> translations in the order you feed them to xgettext.
>>
>> So in order of input files, and then in order they're seen in the
>> program.
>>
>> I don't think that's ever going to change.
>
> OK, so as long as make's notion of $(sort) and gettext suite's
> notion of --sort-by-file are the same.

They're not, I mean $(sort) and xgettext's *default* behavior are the
same, but the --sort-by-file is not only a sort by file, it also affects
intra-line-number sorting, although that's an admittedly obscure case
(we have <10 messages where it matters, I think).

> , we didn't make any change, and even if they were different, since
> there is no version of Git that uses "--sort-by-file" while preparing
> the po and pot files, it still is OK.

Well, it would be OK in any case, this is just how we prepare the
git.pot file, but it does affect diff churn eventually in the *.po
files.

> As long as make's $(sort) is as stable as gettext
> suite's "--sort-by-file" across developer locales (and our filenames
> are ascii-only and hopefully will stay that way), everybody will get
> the messages in the same order either way (or we would have the same
> problem so switching from --sort-by-file to $(sort) is not making
> anything worse).

FWIW we can't in the general case rely on the Makefile's $(sort) working
the same way across our platforms, but I don't think it matters in this
case. IIRC there's versions of one Windows setup or another (this was
via CI) where "-" at least (the purely ASCII one) sorts differently.

I ran into that in some Makefile experiments where I wanted to use
$(sort) to assert that our various hardcoded lists were in sorted order
already.
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index a3f8446de9..a8880ed482 100644
--- a/Makefile
+++ b/Makefile
@@ -846,7 +846,7 @@  generated-hdrs: $(GENERATED_H)
 ## Exhaustive lists of our source files, either dynamically generated,
 ## or hardcoded.
 SOURCES_CMD = ( \
-	git ls-files \
+	git ls-files --deduplicate \
 		'*.[hcS]' \
 		'*.sh' \
 		':!*[tp][0-9][0-9][0-9][0-9]*' \