Message ID | patch-1.1-9420448e74f-20210622T141100Z-avarab@gmail.com (mailing list archive)
---|---
State | Superseded
Series | Makefile: add and use the ".DELETE_ON_ERROR" flag
On Tue, Jun 22, 2021 at 04:13:13PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Use the GNU make ".DELETE_ON_ERROR" flag in our main Makefile, as we
> already do in the Documentation/Makefile since db10fc6c09f (doc:
> simplify Makefile using .DELETE_ON_ERROR, 2021-05-21).

This all looks quite reasonable to me. I would be happy to see us use
.DELETE_ON_ERROR instead of the "write to a temporary file and then
move it into place" pattern, but my only reservation is nicely
summarized by Peff in [1].

I do think that .DELETE_ON_ERROR is less robust when running "make" in
one terminal and inspecting the result in another, but I'm also not
sure how much we should be concerned about that. On the other hand, we
lose a nice property of our existing Makefile, which is that you can
always run ./git and get a working binary. The new state is that you
might see a half-written binary.

That makes me a little sad, but it does leave us with a much cleaner
Makefile as a result. So, I'm not really sure how to feel about it. I
think in general I would be happy overall to see this picked up.

[1]: https://lore.kernel.org/git/YGQdpkHAcFR%2FzNOx@coredump.intra.peff.net/

> Before this change we'd leave the file in place under this scenario.
>
> As in db10fc6c09f this allows us to remove patterns of removing
> leftover $@ files at the start of rules, since previous failing runs
> of the Makefile won't have left those littered around anymore.
>
> I'm not as confident that we should be replacing the "mv $@+ $@"
> pattern entirely, since that means that external programs or one of
> our other Makefiles might race and get partial content.

I agree with all of this. Everything below looks good, too.

Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor
On Tue, Jun 22 2021, Taylor Blau wrote:

> On Tue, Jun 22, 2021 at 04:13:13PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Use the GNU make ".DELETE_ON_ERROR" flag in our main Makefile, as we
>> already do in the Documentation/Makefile since db10fc6c09f (doc:
>> simplify Makefile using .DELETE_ON_ERROR, 2021-05-21).
>
> This all looks quite reasonable to me. I would be happy to see us use
> .DELETE_ON_ERROR instead of the "write to a temporary file and then
> move it into place" pattern, but my only reservation is nicely
> summarized by Peff in [1].
>
> I do think that .DELETE_ON_ERROR is less robust when running "make" in
> one terminal and inspecting the result in another, but I'm also not
> sure how much we should be concerned about that. On the other hand, we
> lose a nice property of our existing Makefile, which is that you can
> always run ./git and get a working binary. The new state is that you
> might see a half-written binary.

I think that here you're responding not to this patch but to something
more like how Felipe used .DELETE_ON_ERROR in db10fc6c09f (doc:
simplify Makefile using .DELETE_ON_ERROR, 2021-05-21).

The rules this patch touches are all part of "mv $@+ $@" patterns, so
the change is:

1) We don't rm $@ at the beginning, so the window where you have a
   working binary is extended. There's no point now where it disappears
   while the rule is midway through running.

2) If the rule dies midway and we don't get to the "mv" part, the error
   isn't persistent; your halfway-written "foo" gets removed by make
   itself.

I don't think the way Felipe used it in his patch is an unambiguous
improvement; it would need to be some combination of reverted/adjusted
if we went for the "anything you make must always have a 100% working
copy" general approach in:

https://lore.kernel.org/git/cover-0.6-00000000000-20210329T161723Z-avarab@gmail.com/

But as discussed below I think Junio wants to actively not have fixes
in that area, so the point is moot.
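The two behaviors being contrasted here are easy to see with a throwaway Makefile. Everything below (the temp directory, the `out` target) is invented purely for illustration, and it assumes GNU make is on PATH:

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# A rule whose recipe writes partial output to $@ and then fails.
printf 'out:\n\techo partial >$@\n\tfalse\n' >Makefile
make out >/dev/null 2>&1
test -f out && echo "default: partial 'out' left behind"

# The same rule with .DELETE_ON_ERROR: make removes the target itself
# when the recipe exits non-zero.
rm -f out
printf '.DELETE_ON_ERROR:\nout:\n\techo partial >$@\n\tfalse\n' >Makefile
make out >/dev/null 2>&1
test -f out || echo ".DELETE_ON_ERROR: partial 'out' removed"
```

Both messages print: the default leaves the half-written target on disk, while `.DELETE_ON_ERROR` cleans it up.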
> That makes me a little sad, but it does leave us with a much cleaner
> Makefile as a result. So, I'm not really sure how to feel about it. I
> think in general I would be happy overall to see this picked up.
>
> [1]: https://lore.kernel.org/git/YGQdpkHAcFR%2FzNOx@coredump.intra.peff.net/

Yes, it makes me sad too, but as noted above I think you're right about
the general case, and so is Jeff in that e-mail you linked, but it
doesn't apply to these patches, or my earlier patches.

I'd like us to always have a working binary, but from my understanding
of Jeff and Junio's position on it, it's something they'd like to
actively prevent; see the discussion around my earlier series.

I.e. from what I gather they view this "your thing is clobbered as it
builds" as a feature: it serves to throw a monkey wrench into any use
of git that may overlap with said build, and they think that's a
feature.

So my reading of that thread is that they wouldn't agree that's a "nice
property", but something we should, if anything, more actively prevent,
say by having a global lock around our "make" invocations that git
programs started from the build directory would detect and BUG() on.

Whereas I think we don't have any practical problem with that, and
insisting that we can't fix that "nice property" makes improving actual
test failures that matter on AIX so tedious that I mostly stopped doing
it as a result.
On Tue, Jun 22, 2021 at 07:34:13PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > That makes me a little sad, but it does leave us with a much cleaner
> > Makefile as a result. So, I'm not really sure how to feel about it. I
> > think in general I would be happy overall to see this picked up.
> >
> > [1]: https://lore.kernel.org/git/YGQdpkHAcFR%2FzNOx@coredump.intra.peff.net/
>
> Yes, it makes me sad too, but as noted above I think you're right about
> the general case, and so is Jeff in that e-mail you linked, but it
> doesn't apply to these patches, or my earlier patches.
>
> I'd like us to always have a working binary, but from my understanding
> of Jeff and Junio's position on it, it's something they'd like to
> actively prevent; see the discussion around my earlier series.
>
> I.e. from what I gather they view this "your thing is clobbered as it
> builds" as a feature: it serves to throw a monkey wrench into any use
> of git that may overlap with said build, and they think that's a
> feature.

Just to be clear, I would be happy to drop the "oops, the tests barf if
you recompile halfway through" feature if it made things more robust
overall (i.e., if we always did an atomic rename-into-place). I just
consider the fact that we do clobber to be an accidental feature that is
not really worth "fixing". But if we care about "oops, make was
interrupted and now you have a stale build artifact with a bogus
timestamp" type of robustness, and "the tests barf" goes away as a side
effect, I won't complain.

I'd like it a lot more if we didn't have to add "mv $@+ $@" to every
rule to get there. In some other projects I've worked on, compilation
happens with a script, like:

    %.o: %.c
    	./compile $@

and then that "compile" script is generated by the Makefile, and encodes
all of the dependencies of what's in $(CC), $(CFLAGS), and so on (we'd
probably have an update-if-changed rule like we do for GIT-CFLAGS).
And it also becomes an easy single spot to do that kind of "let's
replace the output atomically" trick.

That's a pretty big departure from our current Makefile style, though.
And I don't feel like it buys us a lot. Having a pretty generic and
typical Makefile is nice for people coming to the project (I have
noticed that most people are not well versed in "make" arcana).

-Peff
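As an illustration of that wrapper idea (not git's actual build setup): the file names, the `compile` function, and the flags below are all invented, and it assumes a `cc` on PATH. The atomic rename lives in exactly one spot:

```shell
tmp=$(mktemp -d)
printf 'int main(void) { return 0; }\n' >"$tmp/hello.c"

# Toy stand-in for a "compile" wrapper: build into "$obj+", then
# rename into place, so a reader of "$obj" only ever sees a complete
# object file.
compile() {
	obj=$1
	src=${obj%.o}.c
	cc -c -o "$obj+" "$src" && mv "$obj+" "$obj"
}

compile "$tmp/hello.o" && echo built
```

If the compiler dies midway, only the `+` file is affected and the previous object survives untouched.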
Ævar Arnfjörð Bjarmason wrote:

> As in db10fc6c09f this allows us to remove patterns of removing
> leftover $@ files at the start of rules, since previous failing runs
> of the Makefile won't have left those littered around anymore.
>
> I'm not as confident that we should be replacing the "mv $@+ $@"
> pattern entirely, since that means that external programs or one of
> our other Makefiles might race and get partial content.

The reason I did it in db10fc6c09 is because both asciidoctor and
asciidoc should deal with temporary files by themselves (like gcc). If
you interrupt the build nothing gets generated.

However, other scripts like build-docdep.perl would indeed generate
partial output.

In my opinion it's the scripts themselves that should be fixed, and not
the Makefile, *if* we care about this at all.
Taylor Blau wrote:

> I do think that .DELETE_ON_ERROR is less robust when running "make" in
> one terminal and inspecting the result in another, but I'm also not sure
> how much we should be concerned about that. On the other hand, we lose
> a nice property of our existing Makefile which is that you can always
> run ./git and get a working binary. The new state is that you might see
> a half-written binary.

How would that happen? First, the git target isn't changed, and second,
gcc (and presumably other compilers/linkers) wouldn't write the binary
if interrupted.
Ævar Arnfjörð Bjarmason wrote:

> I don't think the way Felipe used it in his patch is an unambiguous
> improvement, it would need to be some combination of reverted/adjusted
> if we went for the "anything you make must always have a 100% working
> copy" general approach in:
> https://lore.kernel.org/git/cover-0.6-00000000000-20210329T161723Z-avarab@gmail.com/

My view is that it's the tool itself (gcc, asciidoctor, asciidoc) that
shouldn't write the output if interrupted, and generally that's the
case.

If we do want our scripts to not generate output if interrupted, that
should be fixed in the scripts themselves.
Ævar Arnfjörð Bjarmason wrote:

> @@ -2243,7 +2253,6 @@ SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
> 	$(gitwebdir_SQ):$(PERL_PATH_SQ):$(SANE_TEXT_GREP):$(PAGER_ENV):\
> 	$(perllibdir_SQ)
> define cmd_munge_script
> -$(RM) $@ $@+ && \
> sed -e '1s|#!.*/sh|#!$(SHELL_PATH_SQ)|' \
> 	-e 's|@SHELL_PATH@|$(SHELL_PATH_SQ)|' \
> 	-e 's|@@DIFF@@|$(DIFF_SQ)|' \

Any reason why the same isn't done for the $(BUILT_INS) target?

> @@ -2514,7 +2522,6 @@ endif
> ifeq ($(GENERATE_COMPILATION_DATABASE),yes)
> all:: compile_commands.json
> compile_commands.json:
> -	@$(RM) $@
> 	$(QUIET_GEN)sed -e '1s/^/[/' -e '$$s/,$$/]/' $(compdb_dir)/*.o.json > $@+
> 	@if test -s $@+; then mv $@+ $@; else $(RM) $@+; fi
> endif

What about these?

$(REMOTE_CURL_ALIASES):
$(LIB_FILE):
$(XDIFF_LIB):
$(ETAGS_TARGET):
tags:
cscope:
On Wed, Jun 23 2021, Felipe Contreras wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> As in db10fc6c09f this allows us to remove patterns of removing
>> leftover $@ files at the start of rules, since previous failing runs
>> of the Makefile won't have left those littered around anymore.
>>
>> I'm not as confident that we should be replacing the "mv $@+ $@"
>> pattern entirely, since that means that external programs or one of
>> our other Makefiles might race and get partial content.
>
> The reason I did it in db10fc6c09 is because both asciidoctor and
> asciidoc should deal with temporary files by themselves (like gcc). If
> you interrupt the build nothing gets generated.

If you interrupt the build, default make behavior without
.DELETE_ON_ERROR kicks in.

My gcc 8.3.0 just does an unlink()/openat(..., O_RDWR|O_CREAT|O_TRUNC)
dance followed by chmod() when I do e.g.:

    gcc -o main main.c

So no in-place atomic renaming; does yours do something different?

But yes, some tools do this themselves. I think in general it's less
annoying to deal with it yourself in a case like git's, because if they
do it, their idea of an in-tree tempfile may not jibe with your
.gitignore, so you'll racily see ghost files during the build, or those
files getting left behind if the tool dies hard.

> However, other scripts like build-docdep.perl would indeed generate
> partial output.
>
> In my opinion it's the scripts themselves that should be fixed, and not
> the Makefile, *if* we care about this at all.

I don't think default tool/make/*nix semantics are broken, I just think
it's neat to do that rename dance yourself; it's a cheap way to
guarantee that we always have working tools for use by other concurrent
scripts. Many build systems or modes of running them don't care about
that use-case.
On Tue, Jun 22 2021, Jeff King wrote:

> On Tue, Jun 22, 2021 at 07:34:13PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > That makes me a little sad, but it does leave us with a much cleaner
>> > Makefile as a result. So, I'm not really sure how to feel about it. I
>> > think in general I would be happy overall to see this picked up.
>> >
>> > [1]: https://lore.kernel.org/git/YGQdpkHAcFR%2FzNOx@coredump.intra.peff.net/
>>
>> Yes, it makes me sad too, but as noted above I think you're right about
>> the general case, and so is Jeff in that e-mail you linked, but it
>> doesn't apply to these patches, or my earlier patches.
>>
>> I'd like us to always have a working binary, but from my understanding
>> of Jeff and Junio's position on it, it's something they'd like to
>> actively prevent; see the discussion around my earlier series.
>>
>> I.e. from what I gather they view this "your thing is clobbered as it
>> builds" as a feature: it serves to throw a monkey wrench into any use
>> of git that may overlap with said build, and they think that's a
>> feature.
>
> Just to be clear, I would be happy to drop the "oops, the tests barf if
> you recompile halfway through" feature if it made things more robust
> overall (i.e., if we always did an atomic rename-into-place). I just
> consider the fact that we do clobber to be an accidental feature that is
> not really worth "fixing". But if we care about "oops, make was
> interrupted and now you have a stale build artifact with a bogus
> timestamp" type of robustness, and "the tests barf" goes away as a side
> effect, I won't complain.

..and "this behavior is really annoying on one platform we target, and
the fix is rather trivial".

> I'd like it a lot more if we didn't have to add "mv $@+ $@" to every
> rule to get there. In some other projects I've worked on, compilation
> happens with a script, like:
>
>     %.o: %.c
>     	./compile $@

I'd think that supporting e.g.
"-o" in the middle of an argument list in such a tool would be more annoying than on the order of a dozen callsites I needed to add this to in the linked series. But yes, we could do it in some helper script too; I actually wrote one that does almost that a while ago for a related use-case, simplifying the "use cmp(1) and replace if different" we have copy/pasted in various places. > and then that "compile" script is generated by the Makefile, and encodes > all of the dependencies of what's in $(CC), $(CFLAGS), and so on (we'd > probably have an update-if-changed rule like we do for GIT-CFLAGS). > > And it also becomes an easy single spot to do that kind of "let's > replaced the output atomically" trick. > > That's a pretty big departure from our current Makefile style, though. > And I don't feel like it buys us a lot. Having a pretty generic and > typical Makefile is nice for people coming to the project (I have > noticed that most people are not well versed in "make" arcana). I still think just doing "&& mv $@+ $@" is the simplest in this case, we already have that in a dozen places in the Makefile, I wanted to add it to a dozen or so more. It's a common pattern already, I'd think if anything applying it uniformly would make things easier to read, even if we didn't get more portability & the ability to run stuff concurrently when you have "make" active as bonus.
On Wed, Jun 23 2021, Felipe Contreras wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> @@ -2243,7 +2253,6 @@ SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
>> 	$(gitwebdir_SQ):$(PERL_PATH_SQ):$(SANE_TEXT_GREP):$(PAGER_ENV):\
>> 	$(perllibdir_SQ)
>> define cmd_munge_script
>> -$(RM) $@ $@+ && \
>> sed -e '1s|#!.*/sh|#!$(SHELL_PATH_SQ)|' \
>> 	-e 's|@SHELL_PATH@|$(SHELL_PATH_SQ)|' \
>> 	-e 's|@@DIFF@@|$(DIFF_SQ)|' \
>
> Any reason why the same isn't done for the $(BUILT_INS) target?
>
>> @@ -2514,7 +2522,6 @@ endif
>> ifeq ($(GENERATE_COMPILATION_DATABASE),yes)
>> all:: compile_commands.json
>> compile_commands.json:
>> -	@$(RM) $@
>> 	$(QUIET_GEN)sed -e '1s/^/[/' -e '$$s/,$$/]/' $(compdb_dir)/*.o.json > $@+
>> 	@if test -s $@+; then mv $@+ $@; else $(RM) $@+; fi
>> endif
>
> What about these?
>
> $(REMOTE_CURL_ALIASES):

Uses a chain of ln/ln -s/cp, which would need "-f" flags added. I've got
another series I'm planning to (re-)submit to fix those ln/ln -s/cp
patterns.

> $(LIB_FILE):

Can we rely on $(AR) happily clobbering things everywhere? Not knowing
is why I skipped it.

> $(XDIFF_LIB):

Ditto.

> $(ETAGS_TARGET):
> tags:
> cscope:

Addressed in the related:

https://lore.kernel.org/git/YNH+zsXDnRsT3uvZ@nand.local/T/#t

But yeah, the implicit point of "note that in the commit message" is
taken.
Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Jun 23 2021, Felipe Contreras wrote:
>
> > Ævar Arnfjörð Bjarmason wrote:
> >> As in db10fc6c09f this allows us to remove patterns of removing
> >> leftover $@ files at the start of rules, since previous failing runs
> >> of the Makefile won't have left those littered around anymore.
> >>
> >> I'm not as confident that we should be replacing the "mv $@+ $@"
> >> pattern entirely, since that means that external programs or one of
> >> our other Makefiles might race and get partial content.
> >
> > The reason I did it in db10fc6c09 is because both asciidoctor and
> > asciidoc should deal with temporary files by themselves (like gcc). If
> > you interrupt the build nothing gets generated.
>
> If you interrupt the build, default make behavior without
> .DELETE_ON_ERROR kicks in.

Generally yes, but it's possible the program traps the interrupt signal,
in which case make never receives it.

> My gcc 8.3.0 just does an unlink()/openat(..., O_RDWR|O_CREAT|O_TRUNC)
> dance followed by chmod() when I do e.g.:
>
>     gcc -o main main.c
>
> So no in-place atomic renaming; does yours do something different?

It doesn't rename the file, but if interrupted the file is unlinked.

> > However, other scripts like build-docdep.perl would indeed generate
> > partial output.
> >
> > In my opinion it's the scripts themselves that should be fixed, and not
> > the Makefile, *if* we care about this at all.
>
> I don't think default tool/make/*nix semantics are broken, I just think
> it's neat to do that rename dance yourself; it's a cheap way to
> guarantee that we always have working tools for use by other concurrent
> scripts.

It is cheap in the sense that it doesn't cost the computer much, but it
makes the code less maintainable and harder to read. To me it's a
layering violation.
If the tool is already dealing with interrupted builds, and on top of
that make is doing the same, not only for interrupted builds but also
for failures, then it makes little sense to add even more safeties on
top of that in the Makefile.

If this was really an important feature, it should be part of make
itself, or ninja, or whatever. IMO the whole point of .DELETE_ON_ERROR
is to avoid everyone doing the exact same dance in their Makefiles.

Cheers.
Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Jun 23 2021, Felipe Contreras wrote:
>
> > Ævar Arnfjörð Bjarmason wrote:
> >> @@ -2243,7 +2253,6 @@ SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
> >> 	$(gitwebdir_SQ):$(PERL_PATH_SQ):$(SANE_TEXT_GREP):$(PAGER_ENV):\
> >> 	$(perllibdir_SQ)
> >> define cmd_munge_script
> >> -$(RM) $@ $@+ && \
> >> sed -e '1s|#!.*/sh|#!$(SHELL_PATH_SQ)|' \
> >> 	-e 's|@SHELL_PATH@|$(SHELL_PATH_SQ)|' \
> >> 	-e 's|@@DIFF@@|$(DIFF_SQ)|' \
> >
> > Any reason why the same isn't done for the $(BUILT_INS) target?
> >
> >> @@ -2514,7 +2522,6 @@ endif
> >> ifeq ($(GENERATE_COMPILATION_DATABASE),yes)
> >> all:: compile_commands.json
> >> compile_commands.json:
> >> -	@$(RM) $@
> >> 	$(QUIET_GEN)sed -e '1s/^/[/' -e '$$s/,$$/]/' $(compdb_dir)/*.o.json > $@+
> >> 	@if test -s $@+; then mv $@+ $@; else $(RM) $@+; fi
> >> endif
> >
> > What about these?
> >
> > $(REMOTE_CURL_ALIASES):
>
> Uses a chain of ln/ln -s/cp, which would need "-f" flags added.

Why? Isn't "x && a || b || c" the same as "a || b || c" if x is always
true?

> > $(LIB_FILE):
>
> Can we rely on $(AR) happily clobbering things everywhere? Not knowing
> is why I skipped it.

We have c (create) in ARFLAGS, so presumably yes.

> > $(ETAGS_TARGET):
> > tags:
> > cscope:
>
> Addressed in the related:
> https://lore.kernel.org/git/YNH+zsXDnRsT3uvZ@nand.local/T/#t

I think ideally this patch should remove the $(RM) and the other patch
should focus on the rest of the changes, but given the difficulty of
landing chained patch series in git I understand the decision to clump
them together.

Cheers.
On Wed, Jun 23, 2021 at 09:54:06PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Just to be clear, I would be happy to drop the "oops, the tests barf if
> > you recompile halfway through" feature if it made things more robust
> > overall (i.e., if we always did an atomic rename-into-place). I just
> > consider the fact that we do clobber to be an accidental feature that is
> > not really worth "fixing". But if we care about "oops, make was
> > interrupted and now you have a stale build artifact with a bogus
> > timestamp" type of robustness, and "the tests barf" goes away as a side
> > effect, I won't complain.
>
> ..and "this behavior is really annoying on one platform we target, and
> the fix is rather trivial".

Yeah, that's a fine reason, too. I'm not entirely clear on what the
problem is, though, or why this is the best solution (I expect you
probably explained it in an earlier thread/series, but if so it went in
one ear and out the other on my end).

> > That's a pretty big departure from our current Makefile style, though.
> > And I don't feel like it buys us a lot. Having a pretty generic and
> > typical Makefile is nice for people coming to the project (I have
> > noticed that most people are not well versed in "make" arcana).
>
> I still think just doing "&& mv $@+ $@" is the simplest in this case; we
> already have that in a dozen places in the Makefile, and I wanted to add
> it to a dozen or so more.
>
> It's a common pattern already; I'd think if anything applying it
> uniformly would make things easier to read, even if we didn't get more
> portability & the ability to run stuff concurrently when you have "make"
> active as a bonus.

Yeah, and I'm OK with that direction, too.

-Peff
On Wed, Jun 23 2021, Jeff King wrote:

> On Wed, Jun 23, 2021 at 09:54:06PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > Just to be clear, I would be happy to drop the "oops, the tests barf if
>> > you recompile halfway through" feature if it made things more robust
>> > overall (i.e., if we always did an atomic rename-into-place). I just
>> > consider the fact that we do clobber to be an accidental feature that is
>> > not really worth "fixing". But if we care about "oops, make was
>> > interrupted and now you have a stale build artifact with a bogus
>> > timestamp" type of robustness, and "the tests barf" goes away as a side
>> > effect, I won't complain.
>>
>> ..and "this behavior is really annoying on one platform we target, and
>> the fix is rather trivial".
>
> Yeah, that's a fine reason, too. I'm not entirely clear on what the
> problem is, though, or why this is the best solution (I expect you
> probably explained it in an earlier thread/series, but if so it went in
> one ear and out the other on my end).

On *nix systems you can open a file and unlink() it in another process;
on Windows this is an error.

AIX has its own variant of this annoying behavior: you can't clobber (or
open for writing) binaries that are currently being run, though you can
unlink and rename them.

So without that "mv $@+ $@" trick you can't recompile if you have a
concurrent test (that you don't care about failing) running, and we have
bugs in our tests where e.g. "git-daemon" gets lost and won't get killed
on that platform, so you can't recompile and test without tracking it
down and killing it.
On Thu, Jun 24, 2021 at 03:53:51PM +0200, Ævar Arnfjörð Bjarmason wrote:

> >> ..and "this behavior is really annoying on one platform we target, and
> >> the fix is rather trivial".
> >
> > Yeah, that's a fine reason, too. I'm not entirely clear on what the
> > problem is, though, or why this is the best solution (I expect you
> > probably explained it in an earlier thread/series, but if so it went in
> > one ear and out the other on my end).
>
> On *nix systems you can open a file and unlink() it in another process;
> on Windows this is an error.
>
> AIX has its own variant of this annoying behavior: you can't clobber (or
> open for writing) binaries that are currently being run, though you can
> unlink and rename them.

Ah, right. Thanks for refreshing me.

TBH, I don't find this that serious a problem. Your compile will fail.
But is rebuilding in the middle of a test run something it's important
to support seamlessly? It seems like a bad practice in the first place.

It would likewise be a problem if you were running regular git commands
straight out of your build directory. And we do support that, but IMHO
it is not that important a use case.

So again, I'm not all that opposed to atomic rename-into-place
generation. But the use case doesn't seem important to me.

> So without that "mv $@+ $@" trick you can't recompile if you have a
> concurrent test (that you don't care about failing) running, and we have
> bugs in our tests where e.g. "git-daemon" gets lost and won't get killed
> on that platform, so you can't recompile and test without tracking it
> down and killing it.

The "git-daemon" thing sounds like a separate bug that is maybe
exacerbating things. But we'd want to fix it anyway, since even without
blocking compilation, it will cause a re-run of the tests to use the
wrong version (and either fail, or hit the auto-skip behavior).
I've run into this with apache hanging around after tests were killed
(we do try to clean up, but depending on how the script is killed, that
may or may not succeed).

-Peff
On Thu, Jun 24 2021, Jeff King wrote:

> On Thu, Jun 24, 2021 at 03:53:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> >> ..and "this behavior is really annoying on one platform we target, and
>> >> the fix is rather trivial".
>> >
>> > Yeah, that's a fine reason, too. I'm not entirely clear on what the
>> > problem is, though, or why this is the best solution (I expect you
>> > probably explained it in an earlier thread/series, but if so it went in
>> > one ear and out the other on my end).
>>
>> On *nix systems you can open a file and unlink() it in another process;
>> on Windows this is an error.
>>
>> AIX has its own variant of this annoying behavior: you can't clobber (or
>> open for writing) binaries that are currently being run, though you can
>> unlink and rename them.
>
> Ah, right. Thanks for refreshing me.
>
> TBH, I don't find this that serious a problem. Your compile will fail.
> But is rebuilding in the middle of a test run something it's important
> to support seamlessly? It seems like a bad practice in the first place.

Yeah, I think so, and I think it's good practice: it enables development
workflows that you and Junio clearly don't use (which is fine), but
which are useful to others. For me it would result in more total
testing, and earlier catching of bugs, not less.

Quoting an earlier mail of yours[1]:

    I think having the test suite loudly complain is a good way to
    remind you that you have not in fact run the whole suite on a given
    build.

It's useful: as you're programming you save/compile, have the tests
running in a loop in the background, and are alerted if they start
failing.

That's not really possible with git currently without having that loop
continually push to another workdir; making it work in one checkout
helps for some workflows.
Yes, it could allow you to run format-patch and send-email while you're
50% through a full loop, or to not run the full tests then, but at some
point I think we've got to assume some basic competency from people. We
also have CI running the full tests, and I could have just run tests
0000-5000, compiled, and then run 5001-9999.

As a mechanism to prevent this it's not even reliable; it won't always
prevent this due to races. You'd need to e.g. issue a "git version" at
the start of a run, then "Bail Out!" if you detect it being different at
the end.

> It would likewise be a problem if you were running regular git commands
> straight out of your build directory. And we do support that, but IMHO
> it is not that important a use case.
>
> So again, I'm not all that opposed to atomic rename-into-place
> generation. But the use case doesn't seem important to me.

I guess because we use computers differently. I often have, say, a full
test run in one window, see a failure scroll by, re-make in another
window, and test in a third; it's annoying to have to go back & forth
and stop/start things. I typically run the "main" one as a while-loop.

>> So without that "mv $@+ $@" trick you can't recompile if you have a
>> concurrent test (that you don't care about failing) running, and we have
>> bugs in our tests where e.g. "git-daemon" gets lost and won't get killed
>> on that platform, so you can't recompile and test without tracking it
>> down and killing it.
>
> The "git-daemon" thing sounds like a separate bug that is maybe
> exacerbating things. But we'd want to fix it anyway, since even without
> blocking compilation, it will cause a re-run of the tests to use the
> wrong version (and either fail, or hit the auto-skip behavior). I've run
> into this with apache hanging around after tests were killed (we do try
> to clean up, but depending on how the script is killed, that may or may
> not succeed).

Yes, it's a bug that should be fixed.
Right now if I try to log in and fix it (and numerous other bugs) my
second full test run is guaranteed to be impeded by having to track down
and kill things.

The in-place move works around that perfectly, so as a chicken-and-egg
problem of getting to a point where it's not so annoying to fix things
that I give up, I suggested this rather trivial "&& mv $@+ $@" change as
something worth carrying.

1. https://lore.kernel.org/git/YEZsON5OxUiDkqPG@coredump.intra.peff.net/
On Fri, Jun 25, 2021 at 11:49:14AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Ah, right. Thanks for refreshing me.
> >
> > TBH, I don't find this that serious a problem. Your compile will fail.
> > But is rebuilding in the middle of a test run something it's important
> > to support seamlessly? It seems like a bad practice in the first place.
>
> Yeah, I think so, and I think it's good practice: it enables development
> workflows that you and Junio clearly don't use (which is fine), but
> which are useful to others. For me it would result in more total
> testing, and earlier catching of bugs, not less.
>
> Quoting an earlier mail of yours[1]:
>
>     I think having the test suite loudly complain is a good way to
>     remind you that you have not in fact run the whole suite on a given
>     build.
>
> It's useful: as you're programming you save/compile, have the tests
> running in a loop in the background, and are alerted if they start
> failing.
>
> That's not really possible with git currently without having that loop
> continually push to another workdir; making it work in one checkout
> helps for some workflows.

I actually _do_ have a workflow like this, but as you might guess, it
involves a separate workdir. I have a loop going in one terminal waiting
for new commits, which it then checks out on a detached HEAD and tests.

So I'm definitely sympathetic to this kind of continuous testing. My
questioning was really about doing it in a hacky way where you're not
actually sure what's been tested and what hasn't.

If you're not willing to test just committed states, then it gets a lot
harder (OTOH, I'd expect a ton of spurious failures there as you have
half-finished states). But if you are, then I think making sure you've
tested each commit fully has huge value, because then you can cache the
results.

I'm not sure if the "only test committed states" thing is a deal-breaker
for you or not.
If not, the script I use is at: https://github.com/peff/git/blob/meta/ci which is really just a glorified infinite loop with some inotify hackery. It relies on this program to do the caching: https://github.com/mhagger/git-test A nice extra thing you can do is use the same cache with "git rebase -x" as a final check of each patch in a series before you send it out. If the CI loop was running continuously in the background, then it's just a noop of "yes, we already tested this". > Yes it could allow you to run format-patch and send-email while you're > 50% through a full loop, or not just run the full tests then, but at > some point I think we've got to assume some basic competency from > people. We also have CI running the full tests, and I could have just > run tests 0000-5000, compiled, and then run 5001-9999. Yeah, I can see the view that running the test suite as a basic sanity check may have value, if it's backed by more careful testing later (and certainly while I'm developing, I wouldn't hesitate to run a subset of the test suite to see how my work is progressing). My main point was that I don't see much reason to do work to make that kind of continuous "make test" work with simultaneous recompiles and test-runs, if we can encourage people to do it more robustly with a single compile-and-test-run loop. Maybe adding in the extra workdir there makes it too heavy-weight? (Certainly my "ci" script is overkill, but it seems like a loop of "reset to the current branch tip, compile, run" in a worktree would be the minimal thing). -Peff
Jeff King <peff@peff.net> writes: > Yeah, I can see the view that running the test suite as a basic sanity > check may have value, if it's backed by more careful testing later (and > certainly while I'm developing, I wouldn't hesitate to run a subset of > the test suite to see how my work is progressing). > > My main point was that I don't see much reason to do work to make that > kind of continuous "make test" work with simultaneous recompiles and > test-runs, if we can encourage people to do it more robustly with a > single compile-and-test-run loop. Maybe adding in the extra workdir > there makes it too heavy-weight? (Certainly my "ci" script is overkill, > but it seems like a loop of "reset to the current branch tip, compile, > run" in a worktree would be the minimal thing). I actually do use such a "runs tests in the background while I am not watching", so I am sympathetic to the higher-level goal, but I find any execution of the idea that requires "let's reduce the window where freshly built 'git' or any other things are not ready by forcing 'mv $@+ $@' trick for added atomicity" simply insane and not worth supporting. Tests are run to find cases where things go wrong, and it is a waste of cycles if that background task is not being run in isolation and on a stable state. A separate working tree is so easy to set up these days, I do not see a point in complicating the build procedure to avoid using it.
On Wed, Jun 23 2021, Felipe Contreras wrote: > Ævar Arnfjörð Bjarmason wrote: >> >> On Wed, Jun 23 2021, Felipe Contreras wrote: >> >> > Ævar Arnfjörð Bjarmason wrote: >> >> As in db10fc6c09f this allows us to remove patterns of removing >> >> leftover $@ files at the start of rules, since previous failing runs >> >> of the Makefile won't have left those littered around anymore. >> >> >> >> I'm not as confident that we should be replacing the "mv $@+ $@" >> >> pattern entirely, since that means that external programs or one of >> >> our other Makefiles might race and get partial content. >> > >> > The reason I did it in db10fc6c09 is because both asciidoctor and >> > asciidoc should deal with temporary files by themselves (like gcc). If >> > you interrupt the build nothing gets generated. >> >> If you interrupt the build, default make behavior without >> .DELETE_ON_ERROR kicks in. > > Generally yes, but it's possible the program traps the interrupt signal, > in which case make never receives it. Okay, so by "should deal with [it]" you meant that would be ideal, not that it's something they're doing now. I misunderstood you there. >> My gcc 8.3.0 just does an unlink()/openat(..., O_RDWR|O_CREAT|O_TRUNC) >> dance followed by chmod() when I do e.g.: >> >> gcc -o main main.c >> >> So no in-place atomic renaming, does yours do something different? > > It doesn't rename the file, but if interrupted the file is unlinked. Right, and with .DELETE_ON_ERROR that "interrupted" is extended to "interrupted, or errors", but bringing this discussion around, that's why I was confident in replacing the "rm" pattern at the start (which really is 100% replaced by .DELETE_ON_ERROR), but not the "mv" at the end (which isn't, and is an orthogonal feature). >> > However, other scripts like build-docdep.perl would indeed generate >> > partial output. >> > >> > In my opinion it's the scripts themselves that should be fixed, and not >> > the Makefile, *if* we care about this at all. 
>> >> I don't think default tool/make/*nix semantics are broken, I just think >> it's neat to do that rename dance yourself, it's a cheap way to >> guarantee that we always have working tools for use by other concurrent >> scripts. > It is cheap in the sense that it doesn't cost the computer much, but it > makes the code less maintainable and harder to read. > > To me it's a layering violation. If the tool is already dealing with > interrupted builds, and on top of that make is doing the same, not only > for interrupted builds but also failures, then it makes little sense to > add even more safeties on top of that in the Makefile. I agree for interrupted builds, but we're talking about in-place-renaming, which is orthogonal. > If this was really an important feature, it should be part of make > itself, or ninja, or whatever. > > IMO the whole point of DELETE_ON_ERROR is to avoid everyone doing the > exact same dance in their Makefiles. I agree it would be an interesting make feature, but something pretty far from what it's doing now. In general "make" has been intentionally sloppy about this sort of thing. When you make a file "foo" it doesn't enforce that you fsync it either, or that if it's being created the directory it's inserted into is fsync'd. In a POSIXly-strict sense it can't assume that it can operate properly without those things happening, but in practice modern OS's deal with it just fine, so "make" leaves that to the rule itself. It would be nice to have a make feature to e.g. have individual rules say "I emit on stdout, put it into $@ for me", then it could in-place rename, fsync, display progress through "pv(1)" or whatever.
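The distinction drawn above (the upfront "rm" that .DELETE_ON_ERROR replaces, versus the final "mv" that it doesn't) can be sketched in plain shell; "gen_fail" and "foo" are hypothetical stand-ins for a stdout-emitting script like build-docdep.perl and its target:

```shell
#!/bin/sh
# Hypothetical generator: emits some output, then fails midway.
gen_fail() {
	printf 'partial output\n'
	return 1
}

dir=$(mktemp -d) && cd "$dir" || exit 1

# Plain redirection: a failure leaves a half-written "foo" behind.
# This is the leftover that the upfront "rm" (or .DELETE_ON_ERROR,
# which removes the target when its rule fails) protects against.
gen_fail >foo
test -s foo && echo 'plain: partial foo left behind'
rm -f foo

# The "mv $@+ $@" dance: "foo" only ever appears fully written, so
# concurrent readers never see partial content; this is the half
# that .DELETE_ON_ERROR does not replace.
if gen_fail >foo+
then
	mv foo+ foo
else
	rm -f foo+
fi
test -e foo || echo 'mv dance: no foo at all, partial or otherwise'
```

This is only a sketch of the two recipe shapes being compared, not code from the patch.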
On Mon, Jun 28 2021, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > >> Yeah, I can see the view that running the test suite as a basic sanity >> check may have value, if it's backed by more careful testing later (and >> certainly while I'm developing, I wouldn't hesitate to run a subset of >> the test suite to see how my work is progressing). >> >> My main point was that I don't see much reason to do work to make that >> kind of continuous "make test" work with simultaneous recompiles and >> test-runs, if we can encourage people to do it more robustly with a >> single compile-and-test-run loop. Maybe adding in the extra workdir >> there makes it too heavy-weight? (Certainly my "ci" script is overkill, >> but it seems like a loop of "reset to the current branch tip, compile, >> run" in a worktree would be the minimal thing). > > I actually do use such a "runs tests in the background while I am > not watching", so I am sympathetic to the higher-level goal, but I > find any execution of the idea that requires "let's reduce the > window where freshly built 'git' or any other things are not ready > by forcing 'mv $@+ $@' trick for added atomicity" simply insane and > not worth supporting. Do you think upgrading git on your system without having to stop the world is worth supporting? Ultimately they're the same problem, and I had some patches in the works to make "make install" work like that, and wanted to eventually make the normal compilation use the same helper(s). Ensuring that tests don't fail due to re-compilation either is also a nice way to dogfood/smoketest whether the installer is keeping that promise. > Tests are run to find cases where things go wrong, and it is a waste > of cycles if that background task is not being run in isolation and > on a stable state. Sure, at this point it's clear you won't take the patch. Just a note that this reply addresses one of the two reasons I wanted this, i.e. not the AIX FS/behavior portability issue. 
> A separate working tree is so easy to set up these days,[...] I also test git on e.g. gcc farm boxes where I run out of disk space if I have a .git, a checkout directory with compiled files, and add a second checkout directory with compiled files, and on others where a compile/test cycle takes 30-40 minutes. If I had to do compilation twice, things would slow to a crawl, and no, I'm not going to try to install ccache or whatever on some $OBSCURE_PLATFORM/$ANCIENT_OS/$ODD_TOOLCHAIN. > I do not see a point in complicating the build procedure to avoid > using it. I'd really understand your and Jeff's concerns if I was proposing some really complex workaround, but it's just extending & making consistent the "mv" dance we already use for 1/2 our rules already. Even if you don't care about the end result or making git easier to hack on for people who don't share your setup, I'd think that making those rules consistent across the board makes things less complex, not more. Anyway, let's not discuss this forever. We're clearly getting nowhere. Just for the record I'm quite miffed that the bar for "I don't care about this area/platform/use-case, but this person actively sending me patches in the area says it's helpful to send more patches" is so low. For comparison we have >1000 lines of CMake duplicating the entire Makefile now, all just to make things easier on Windows. It doesn't even work on *nix, so when the CI breaks because I updated the Makefile I need to push to some Windows box on GitHub and twiddle my thumbs hoping it'll pass this time around. Maybe that's all worth it, and I'd be willing to take the Windows devs at their word that dealing with the make dependency was really *that* painful. But compare that to carrying a few lines of "mv $@+ $@" to, I daresay, make the same or larger relative improvement on AIX. 1. https://lore.kernel.org/git/cover-0.6-00000000000-20210329T161723Z-avarab@gmail.com/
On Wed, Jun 23 2021, Felipe Contreras wrote: > Ævar Arnfjörð Bjarmason wrote: >> >> On Wed, Jun 23 2021, Felipe Contreras wrote: >> >> > Ævar Arnfjörð Bjarmason wrote: >> >> @@ -2243,7 +2253,6 @@ SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\ >> >> $(gitwebdir_SQ):$(PERL_PATH_SQ):$(SANE_TEXT_GREP):$(PAGER_ENV):\ >> >> $(perllibdir_SQ) >> >> define cmd_munge_script >> >> -$(RM) $@ $@+ && \ >> >> sed -e '1s|#!.*/sh|#!$(SHELL_PATH_SQ)|' \ >> >> -e 's|@SHELL_PATH@|$(SHELL_PATH_SQ)|' \ >> >> -e 's|@@DIFF@@|$(DIFF_SQ)|' \ >> > >> > Any reason why the same isn't done for the $(BUILT_INS) target? >> > >> >> @@ -2514,7 +2522,6 @@ endif >> >> ifeq ($(GENERATE_COMPILATION_DATABASE),yes) >> >> all:: compile_commands.json >> >> compile_commands.json: >> >> - @$(RM) $@ >> >> $(QUIET_GEN)sed -e '1s/^/[/' -e '$$s/,$$/]/' $(compdb_dir)/*.o.json > $@+ >> >> @if test -s $@+; then mv $@+ $@; else $(RM) $@+; fi >> >> endif >> > >> > What about these? >> > >> > $(REMOTE_CURL_ALIASES): >> >> Uses a chain of ln/ln -s/cp, would need to add "-f" flags. > > Why? Isn't "x && a || b || c" the same as "a || b || c" if x is always true? It does: rm x && ln y x || ln -s y x || cp y x If you run that you'll get a hardlink the first time around, but the second time around you'll fall back to the "cp" if you remove the "rm". >> > $(LIB_FILE): >> >> Can we rely on $(AR) happily clobbering things everywhere? Not knowing >> is why I skipped it. > > We have c (create) in ARFLAGS, so presumably yes. Will change. >> > $(ETAGS_TARGET): >> > tags: >> > cscope: >> >> Addressed in the related: >> https://lore.kernel.org/git/YNH+zsXDnRsT3uvZ@nand.local/T/#t > > I think ideally this patch should remove the $(RM) and the other patch > should focus on the rest of the changes, but given the difficulty of > landing chained patch series in git I understand the decision to clump > them together. > > Cheers.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Even if you don't care about the end result or making git easier to hack > on for people who don't share your setup, I'd think that making those > rules consistent across the board makes things less complex, not more. I think that it is where we disagree. "Remove $@+ and $@, generate into $@+ and then move it to $@" idiom has a legitimate reason to be used, and I do not want to see it blindly used where there is no reason to do so. It misleads less experienced readers. If we write a custom script that does not promise atomicity, it is a quite natural pattern to use, and it is clear to readers what is going on, especially when "generate into $@+" step is done via redirection. I despise "$(CC) -o $@+ && mv $@+ $@" because there simply shouldn't be a need to do so---unlike our custom script that sends its output to its standard output, $(CC) ought to know better. But as you said elsewhere, the patch in question is *not* about adding more use of "mv $@+ $@" in inappropriate contexts, so let's stop discussing it now. We take advantage of .DELETE_ON_ERROR that allows us to lose the upfront "rm -f $@ $@+" we do in the idiom, which is a good thing.
On Tue, Jun 29, 2021 at 09:39:26AM +0200, Ævar Arnfjörð Bjarmason wrote: > > I do not see a point in complicating the build procedure to avoid > > using it. > > I'd really understand your and Jeff's concerns if I was proposing some > really complex workaround, but it's just extending & making consistent > the "mv" dance we already use for 1/2 our rules already. Just to clarify my position: I'm not all that upset about adding more uses of "mv", as I agree it's not a lot of code (and in general I like making things atomic). Mostly I was trying to understand your use case, and whether we could be encouraging a more robust workflow there. And that's what I was prodding at in the last email: what's the reason that doing it in a separate workdir doesn't work? I did find your answers compelling that it takes more disk space, and you have to compile everything twice (I do use ccache myself, but I agree that "oh, just install ccache" is not helpful advice in the general case). -Peff
Ævar Arnfjörð Bjarmason wrote: > On Wed, Jun 23 2021, Felipe Contreras wrote: > > Ævar Arnfjörð Bjarmason wrote: > >> On Wed, Jun 23 2021, Felipe Contreras wrote: > >> > Ævar Arnfjörð Bjarmason wrote: > >> >> As in db10fc6c09f this allows us to remove patterns of removing > >> >> leftover $@ files at the start of rules, since previous failing runs > >> >> of the Makefile won't have left those littered around anymore. > >> >> > >> >> I'm not as confident that we should be replacing the "mv $@+ $@" > >> >> pattern entirely, since that means that external programs or one of > >> >> our other Makefiles might race and get partial content. > >> > > >> > The reason I did it in db10fc6c09 is because both asciidoctor and > >> > asciidoc should deal with temporary files by themselves (like gcc). If > >> > you interrupt the build nothing gets generated. > >> > >> If you interrupt the build, default make behavior without > >> .DELETE_ON_ERROR kicks in. > > > > Generally yes, but it's possible the program traps the interrupt signal, > > in which case make never receives it. > > Okay, so by "should deal with [it]" you meant that would be ideal, not > that it's something they're doing now. I misunderstood you there. It is tricky. For example asciidoctor does trap interrupt signals, and deals with them correctly. On the other hand asciidoc does not, but they clearly did intend to, just did it wrong. I sent a pull request for asciidoc to fix that [1], but so far no response. So it's a mixture of both; ideally they should do it, and they kind of do, but not all of them. Certainly git scripts do not. But they could. > >> My gcc 8.3.0 just does an unlink()/openat(..., O_RDWR|O_CREAT|O_TRUNC) > >> dance followed by chmod() when I do e.g.: > >> > >> gcc -o main main.c > >> > >> So no in-place atomic renaming, does yours do something different? > > > > It doesn't rename the file, but if interrupted the file is unlinked. 
> > Right, and with .DELETE_ON_ERROR that "interrupted" is extended to > "interrupted, or errors", but bringing this discussion around, that's why > I was confident in replacing the "rm" pattern at the start (which really > is 100% replaced by .DELETE_ON_ERROR), but not the "mv" at the end > (which isn't, and is an orthogonal feature). Depends on what "the feature" is. If the feature is not having lingering partial files on error, then gcc already deals with that. If the feature is never having partial files at all, then yeah, you need the "mv" at the end, but as Jeff and Junio already pointed out: that feature is of doubtful value. I see value in .DELETE_ON_ERROR, not so much in never having partial files. I have tried to imagine why anybody would want this, and I just can't picture it, though that could be a failure of my imagination. > >> > However, other scripts like build-docdep.perl would indeed generate > >> > partial output. > >> > > >> > In my opinion it's the scripts themselves that should be fixed, and not > >> > the Makefile, *if* we care about this at all. > >> > >> I don't think default tool/make/*nix semantics are broken, I just think > >> it's neat to do that rename dance yourself, it's a cheap way to > >> guarantee that we always have working tools for use by other concurrent > >> scripts. > > > > It is cheap in the sense that it doesn't cost the computer much, but it > > makes the code less maintainable and harder to read. > > > > To me it's a layering violation. If the tool is already dealing with > > interrupted builds, and on top of that make is doing the same, not only > > for interrupted builds but also failures, then it makes little sense to > > add even more safeties on top of that in the Makefile. > > I agree for interrupted builds, but we're talking about > in-place-renaming, which is orthogonal. In-place-renaming is the means; the end goal (I presume) is to never have partial files. Yes, it's orthogonal, but also I don't see the point. 
> > If this was really an important feature, it should be part of make > > itself, or ninja, or whatever. > > > > IMO the whole point of DELETE_ON_ERROR is to avoid everyone doing the > > exact same dance in their Makefiles. > > I agree it would be an interesting make feature, but something pretty > far from what it's doing now. > > In general "make" has been intentionally sloppy about this sort of > thing. When you make a file "foo" it doesn't enforce that you fsync it > either, or that if it's being created the directory it's inserted into > is fsync'd. > > In a POSIXly-strict sense it can't assume that it can operate properly > without those things happening, but in practice modern OS's deal with it > just fine, so "make" leaves that to the rule itself. > > It would be nice to have a make feature to e.g. have individual rules > say "I emit on stdout, put it into $@ for me", then it could in-place > rename, fsync, display progress through "pv(1)" or whatever. Perhaps. I still don't see why this is important. Either way, a pattern I've seen lately in a lot of software is a reluctance to modernize itself, and that results in other software starting from scratch (GCC vs. LLVM, vim vs. neovim, and make vs. ninja). If we are reaching the limit of what make can offer us--and plenty of other projects are already using more modern alternatives--does it really make much sense to focus on a small thing make can't offer us natively and work around that? Maybe it would make more sense to stop relying on make so much and attempt to make other tools support this feature natively. Cheers. [1] https://github.com/asciidoc-py/asciidoc-py/pull/195
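For what it's worth, the wished-for "I emit on stdout, put it into $@ for me" rule helper could be approximated today with a small shell function that a recipe calls; "atomic_write" below is hypothetical, not an existing make or git facility, and it skips the fsync and pv(1) parts:

```shell
#!/bin/sh
# Hypothetical helper: run the given command with stdout captured
# into "$1+", renamed into place only on success.  On failure the
# temporary file is removed and the command's exit status returned.
atomic_write() {
	target=$1; shift
	if "$@" >"$target+"
	then
		mv "$target+" "$target"
	else
		rc=$?
		rm -f "$target+"
		return $rc
	fi
}

dir=$(mktemp -d) && cd "$dir" || exit 1

# Success: output lands in "out" atomically.
atomic_write out printf 'generated\n'
cat out

# Failure: no partial "bad" (or leftover "bad+") survives.
atomic_write bad sh -c 'printf partial; exit 3'
echo "exit: $?"
test -e bad || echo 'no partial file'
```

A make rule could then read something like `$(QUIET_GEN)atomic_write $@ ./build-docdep.perl`, which is roughly the ergonomics the paragraph above is asking make itself to provide.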
Ævar Arnfjörð Bjarmason wrote: > On Wed, Jun 23 2021, Felipe Contreras wrote: > >> > What about these? > >> > > >> > $(REMOTE_CURL_ALIASES): > >> > >> Uses a chain of ln/ln -s/cp, would need to add "-f" flags. > > > > Why? Isn't "x && a || b || c" the same as "a || b || c" if x is always true? > > It does: > > rm x && > ln y x || ln -s y x || cp y x > > If you run that you'll get a hardlink the first time around, but the > second time around you'll fall back to the "cp" if you remove the "rm". Right. We'll need to do ln -f, and god knows if that's portable.
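To illustrate why the chain needs either the upfront "rm" or "-f" flags: without them, the second run silently degrades from a hardlink to a copy. "y" and "x" below are hypothetical stand-ins for the built file and its alias:

```shell
#!/bin/sh
dir=$(mktemp -d) && cd "$dir" || exit 1
echo content >y

# First run: "x" doesn't exist yet, so the hardlink succeeds.
ln y x || ln -s y x || cp y x
test x -ef y && echo 'run 1: x is a hardlink'

# Simulate a rebuild: the compiler unlinks and recreates "y", so "x"
# still points at the old inode.  Without "rm x" first, both "ln"
# variants now fail on the existing "x" and we fall through to "cp".
rm y && echo new-content >y
ln y x 2>/dev/null || ln -s y x 2>/dev/null || cp y x
test x -ef y || echo 'run 2: x silently became a copy'

# "ln -f" restores the intended behavior without the upfront "rm".
ln -f y x
test x -ef y && echo 'run 3: x is a hardlink again'
```

Whether `-f` behaves identically across all the platforms git builds on is exactly the portability question raised above; this sketch only shows the GNU/POSIX behavior.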
Ævar Arnfjörð Bjarmason wrote: > On Mon, Jun 28 2021, Junio C Hamano wrote: > > I do not see a point in complicating the build procedure to avoid > > using it. > > I'd really understand your and Jeff's concerns if I was proposing some > really complex workaround, but it's just extending & making consistent > the "mv" dance we already use for 1/2 our rules already. I'm not entirely sure what's going on here. We have agreed that .DELETE_ON_ERROR and the "mv" dance are orthogonal. So the patch to use .DELETE_ON_ERROR can move forward, while the "mv" dance can be discussed later. Like Junio and Jeff, I don't see much value in the "mv" dance, but that doesn't mean I want it gone. On the contrary, I would like to try a scenario in which it's useful. But that is *orthogonal*. Leave that for another discussion. > Even if you don't care about the end result or making git easier to hack > on for people who don't share your setup, I don't know about Junio; I do want to make git easier to hack for people that don't share my setup, but I would like to know what that setup is. > I'd think that making those rules consistent across the board makes > things less complex, not more. I don't agree with that. Consistency is just one of the many factors we have to consider. Even if 90% of instances in the documentation said "fast forward", that doesn't necessarily mean we should convert the remaining 10% away from "fast-forward". First we need to decide what is the end-goal we want to reach, and then we can go for consistency. But again, this is orthogonal to this patch, isn't it? > Anyway, let's not discuss this forever. We're clearly getting > nowhere. Just for the record I'm quite miffed that the bar for "I don't > care about this area/platform/use-case, but this person actively sending > me patches in the area says it's helpful to send more patches" is so > low. I don't think it's quite like that. Skepticism doesn't mean disapproval. 
I for one am skeptical of the positive value of the "mv" dance, but I wouldn't be surprised in the least if you showed the value in 4 lines of code. I just haven't seen them yet. Once again... That's orthogonal to this patch. > Maybe that's all worth it, and I'd be willing to take the Windows devs > at their word that dealing with the make dependency was really *that* > painful. But compare that to carrying a few lines of "mv $@+ $@" to, I > daresay, make the same or larger relative improvement on AIX. Oh I don't trust them at all. I did maintain some Windows installers for years, and with a couple of tricks I had no problem building them with plain Makefiles, with much more complex dependencies. I'm fairly certain I could make git build for Windows with plain Makefiles... But one controversy at a time. Cheers.
On Wed, Jun 30 2021, Felipe Contreras wrote: > Ævar Arnfjörð Bjarmason wrote: >> On Mon, Jun 28 2021, Junio C Hamano wrote: > >> > I do not see a point in complicating the build procedure to avoid >> > using it. >> >> I'd really understand your and Jeff's concerns if I was proposing some >> really complex workaround, but it's just extending & making consistent >> the "mv" dance we already use for 1/2 our rules already. > > I'm not entirely sure what's going on here. We have agreed that > .DELETE_ON_ERROR and the "mv" dance are orthogonal. So the patch to use > .DELETE_ON_ERROR can move forward, while the "mv" dance can be discussed > later. > > Like Junio and Jeff, I don't see much value in the "mv" dance, but that > doesn't mean I want it gone. On the contrary, I would like to try a scenario > in which it's useful. > > But that is *orthogonal*. Leave that for another discussion. Yes, this whole sub-thread is just a side-discussion about a change-not-in-this-series, which started out as a reference to a larger series I carved this narrower change from. >> Even if you don't care about the end result or making git easier to hack >> on for people who don't share your setup, > > I don't know about Junio, I do want to make git easier to hack for > people that don't share my setup, but I would like to know what that > setup is. I think all of this is covered in detail upthread. >> I'd think that making those rules consistent across the board makes >> things less complex, not more. > > I don't agree with that. Consistency is just one of the many factors we > have to consider. Even if 90% of instances in the documentation said > "fast forward", that doesn't necessarily mean we should convert the > remaining 10% away from "fast-forward". > > First we need to decide what is the end-goal we want to reach, and then > we can go for consistency. > > But again, this is orthogonal to this patch, isn't it? *nod*. 
I think for build rules it's easier to reason about them if all of them e.g. do "$(RM) $@" at the start followed by "mv $@+ $@" at the end, than wondering if the differences are accidental or intentional (in most cases they're just a historical accident). >> Anyway, let's not discuss this forever. We're clearly getting >> nowhere. Just for the record I'm quite miffed that the bar for "I don't >> care about this area/platform/use-case, but this person actively sending >> me patches in the area says it's helpful to send more patches" is so >> low. > > I don't think it's quite like that. Skepticism doesn't mean disapproval. > > I for one am skeptical of the positive value of the "mv" dance, but I > wouldn't be surprised in the least if you showed the value in 4 lines of > code. I just haven't seen them yet. > > Once again... That's orthogonal to this patch. *Nod*, as noted this is covered upthread. >> Maybe that's all worth it, and I'd be willing to take the Windows devs >> at their word that dealing with the make dependency was really *that* >> painful. But compare that to carrying a few lines of "mv $@+ $@" to, I >> daresay, make the same or larger relative improvement on AIX. > > Oh I don't trust them at all. I did maintain some Windows installers for > years, and with a couple of tricks I had no problem building them with > plain Makefiles, with much more complex dependencies. > > I'm fairly certain I could make git build for Windows with plain > Makefiles... But one controversy at a time. Yeah, I think (from memory of reading the relevant threads) it's some combination of "the dependency is large & painful" and "it's a bit slower". I've found it hard in the past to get accurate estimates of what's "slow" from our resident Windows maintainer :) Per: https://lore.kernel.org/git/875z1lz6wl.fsf@evledraar.gmail.com/
Ævar Arnfjörð Bjarmason wrote: > On Wed, Jun 30 2021, Felipe Contreras wrote: > > Ævar Arnfjörð Bjarmason wrote: > > >> Even if you don't care about the end result or making git easier to hack > >> on for people who don't share your setup, > > > > I don't know about Junio, I do want to make git easier to hack for > > people that don't share my setup, but I would like to know what that > > setup is. > > I think all of this is covered in detail upthread. From [1] I understand some systems have a problem clobbering a binary that is being run. So if you are running a test that is using a binary that you are rebuilding at the same time, you get an error. OK. I still don't see why anyone would want to rebuild the binary in the middle of running tests. The result of the tests is only meaningful for a particular build. This is what I don't get. I get that you want to do this, what I don't get is *why*. In order to be able to rebuild _and_ keep running the tests for a certain build an out-of-source build is needed. Modern build systems like CMake and Meson do these kinds of builds by default. In fact, Meson doesn't support anything else. With an in-source build such as in git, I first stop the tests, and then rebuild, and that's what I would expect everyone else to do. I don't think it has been explained why anybody wouldn't. Cheers. [1] https://lore.kernel.org/git/871r8r1hwe.fsf@evledraar.gmail.com/
On Fri, Jul 02 2021, Felipe Contreras wrote: > Ævar Arnfjörð Bjarmason wrote: >> On Wed, Jun 30 2021, Felipe Contreras wrote: >> > Ævar Arnfjörð Bjarmason wrote: >> >> >> Even if you don't care about the end result or making git easier to hack >> >> on for people who don't share your setup, >> > >> > I don't know about Junio, I do want to make git easier to hack for >> > people that don't share my setup, but I would like to know what that >> > setup is. >> >> I think all of this is covered in detail upthread. > > From [1] I understand some systems have a problem clobbering a binary > that is being run. So if you are running a test that is using a binary > that you are rebuilding at the same time, you get an error. > > OK. > > I still don't see why anyone would want to rebuild the binary in the > middle of running tests. The result of the tests is only meaningful for > a particular build. This is what I don't get. I get that you want to do > this, what I don't get is *why*. This is mostly covered upthread & in the linked thread(s), but as summary / elaboration: 1. Running the tests on some of these machines takes 30-45 minutes. A common use-case is firing off a test run to see how bright the dumpster fire is that day, noticing an early failure, and inspecting it manually. Then while the rest of the full run is underway I'd like to re-compile and e.g. add some printf debugging guarded by a getenv() to some isolated code I'm poking, it's nice if the full test run isn't invalidated by that. Keep in mind that it takes 30-45 minutes because it's *slooooooow*, so "just use another workdir" isn't a viable option. I'm also going to wait 10-20 minutes for another full recompile in that workdir (and now the concurrent test run takes more than an hour). 2. We have bugs in the test suite that e.g. leave orphaned git-daemon background processes on these platforms. 
Yes that should be fixed, but fixing it is annoying when you can't even recompile and run other (even more broken) tests due to the bug you're trying to fix. 3. You're assuming that the only thing I might want to use the built git for is the tests. E.g. I might (and do) also clone some other repo to debug something else; if that one-off clone is running in one terminal I can't recompile git until it's finished. (Most of the boxes on the GCC farm have a /usr/bin/git, but some have e.g. version 1.8-something of git, so to do anything usefully modern like worktrees I'll need to bootstrap my own git anyway). 4. I think you/Junio/Jeff (although maybe less so in Jeff's case) are taking this axiom that thou shalt not recompile during tests as an absolute. I just don't agree with that in general. E.g. if I'm hacking on some object.c changes and fire off a test run then sure, that's a general enough component that I'd like a full test run. The failure might be (and often is) in some obscure edge case in a test I won't expect to fail. But while that run is taking the better part of an hour maybe I'll fix a small bug in git-send-email, recompile, and run t/t9001*.sh while the main test run is underway. I'd be completely confident in submitting such a fix to git-send-email, even though I've never run the full test suite with it. It's simply not the case that all the code we develop is so generally used that all of it requires a full test suite run. I usually do a full run anyway, but for something like a portability fix on an obscure platform on the GCC farm? No, often I don't run the full test suite, if I've assured myself that I've tested the code in question thoroughly by some other means (e.g. running the subset of tests I know stress it meaningfully). I think you've also said something to the effect that the 3rd party tool should be the thing doing the in-place-rename if needed, fair enough. 
But claiming that it's both an external implementation detail (so it could do an in-place rename, or not), and also maintaining that we can't do in-place rename ourselves because doing so would enable bad thing XYZ to happen (i.e. this concurrent test thing), seems like a case of wanting to have your cake and eat it too. I.e. surely it's one or the other. If it's so important that we use this behavior as a proxy in case someone is so mistaken as to re-build git during tests we should be replacing things like: cc -o $@ $< With: cc -o $@+ $< && cat $@+ >$@ Just in case the "cc" is doing my proposed version of: cc -o $@+ $< && mv $@+ $@ behind our backs.
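The "build into a '+' file, then mv into place" pattern being debated here can be sketched outside of make (my illustration, not code from the thread): because rename(2) within a single filesystem atomically replaces the destination, a concurrent reader of the target sees either the old complete file or the new complete file, never a half-written one.

```shell
# Simulate the "cc -o $@+ $< && mv $@+ $@" rule body by hand, using
# printf as a stand-in for the compiler.
set -e
tmpdir=$(mktemp -d)
target="$tmpdir/git"

printf 'old-binary\n' >"$target"   # a previous successful "build"

printf 'new-binary\n' >"$target+"  # "compile" into the temporary + file
mv "$target+" "$target"            # atomically replace the old target

result=$(cat "$target")
echo "$result"
rm -rf "$tmpdir"
```

Under .DELETE_ON_ERROR alone there is no "+" file: the rule writes $@ directly, and make only deletes a half-written $@ after the failing command exits, which is exactly the window the concurrent-observer concern above is about.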
Ævar Arnfjörð Bjarmason wrote:
>
> On Fri, Jul 02 2021, Felipe Contreras wrote:
>
> > Ævar Arnfjörð Bjarmason wrote:
> >> On Wed, Jun 30 2021, Felipe Contreras wrote:
> >> > Ævar Arnfjörð Bjarmason wrote:
> >>
> >> >> Even if you don't care about the end result or making git easier to hack
> >> >> on for people who don't share your setup,
> >> >
> >> > I don't know about Junio, I do want to make git easier to hack for
> >> > people that don't share my setup, but I would like to know what that
> >> > setup is.
> >>
> >> I think all of this is covered in detail upthread.
> >
> > From [1] I understand some systems have a problem clobbering a binary
> > that is being run. So if you are running a test that is using a binary
> > that you are rebuilding at the same time, you get an error.
> >
> > OK.
> >
> > I still don't see why anyone would want to rebuild the binary in the
> > middle of running tests. The result of the tests is only meaningful for
> > a particular build. This is what I don't get. I get that you want to do
> > this, what I don't get is *why*.
>
> This is mostly covered upthread & in the linked thread(s), but as
> summary / elaboration:
>
> 1. Running the tests on some of these machines takes 30-45 minutes. A
>    common use-case is firing off a test run to see how bright the
>    dumpster fire is that day, noticing an early failure, and inspecting
>    it manually.
>
>    Then while the rest of the full run is underway I'd like to
>    re-compile and e.g. add some printf debugging guarded by a getenv()
>    to some isolated code I'm poking, it's nice if the full test run
>    isn't invalidated by that.
>
>    Keep in mind that it takes 30-45 minutes because it's *slooooooow*,
>    so "just use another workdir" isn't a viable option. I'm also going
>    to wait 10-20 minutes for another full recompile in that workdir
>    (and now the concurrent test run takes more than an hour).

OK. If you are careful enough that makes sense.

> 2. We have bugs in the test suite that e.g. leave orphaned git-daemon
>    background processes on these platforms.
>
>    Yes that should be fixed, but fixing it is annoying when you can't
>    even recompile and run other (even more broken) tests due to the bug
>    you're trying to fix.

Yeah, that's a separate issue.

> 3. You're assuming that the only thing I might want to use the built
>    git for is the tests.

Not really.

> 4. I think you/Junio/Jeff (although maybe less so in Jeff's case) are
>    taking this axiom that thou shalt not recompile during tests as an
>    absolute.

Just like in language, I'm not a prescriptivist in workflows either. The
fact that I don't recompile during tests doesn't mean I would presume to
dictate to others what they should do. You know more about your setup
than me.

> I think you've also said something to the effect that the 3rd party
> tool should be the thing doing the in-place-rename if needed, fair
> enough.
>
> But claiming that it's both an external implementation detail (so it
> could do an in-place rename, or not), and also maintaining that we
> can't do in-place rename ourselves because doing so would enable bad
> thing XYZ to happen (i.e. this concurrent test thing), seems like a
> case of wanting to have your cake and eat it too.

I never claimed we can't do the in-place rename ourselves, I only said
that I did not see the point. And to be clear, the fact that I don't see
it doesn't mean it isn't there.

Now I see why you want this, and I suppose for this particular case only
it does make sense to do the renaming. But it still seems like a wart to
me. If the build system supported out-of-tree builds there would be no
need for us to do this manually in the Makefile, correct?

But yeah, for now I suppose there's no better alternative.
diff --git a/Makefile b/Makefile
index c3565fc0f8f..20a0fe6e88e 100644
--- a/Makefile
+++ b/Makefile
@@ -2160,6 +2160,16 @@ shell_compatibility_test: please_set_SHELL_PATH_to_a_more_modern_shell
 strip: $(PROGRAMS) git$X
 	$(STRIP) $(STRIP_OPTS) $^
 
+### Flags affecting all rules
+
+# A GNU make extension since gmake 3.72 (released in late 1994) to
+# remove the target of rules if commands in those rules fail. The
+# default is to only do that if make itself receives a signal. Affects
+# all targets, see:
+#
+#    info make --index-search=.DELETE_ON_ERROR
+.DELETE_ON_ERROR:
+
 ### Target-specific flags and dependencies
 
 # The generic compilation pattern rule and automatically
@@ -2243,7 +2253,6 @@ SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
 	$(gitwebdir_SQ):$(PERL_PATH_SQ):$(SANE_TEXT_GREP):$(PAGER_ENV):\
 	$(perllibdir_SQ)
 define cmd_munge_script
-$(RM) $@ $@+ && \
 sed -e '1s|#!.*/sh|#!$(SHELL_PATH_SQ)|' \
     -e 's|@SHELL_PATH@|$(SHELL_PATH_SQ)|' \
     -e 's|@@DIFF@@|$(DIFF_SQ)|' \
@@ -2313,7 +2322,7 @@ endif
 PERL_DEFINES += $(gitexecdir) $(perllibdir) $(localedir)
 
 $(SCRIPT_PERL_GEN): % : %.perl GIT-PERL-DEFINES GIT-PERL-HEADER GIT-VERSION-FILE
-	$(QUIET_GEN)$(RM) $@ $@+ && \
+	$(QUIET_GEN) \
 	sed -e '1{' \
 	    -e '	s|#!.*perl|#!$(PERL_PATH_SQ)|' \
 	    -e '	r GIT-PERL-HEADER' \
@@ -2333,7 +2342,7 @@ GIT-PERL-DEFINES: FORCE
 	    fi
 
 GIT-PERL-HEADER: $(PERL_HEADER_TEMPLATE) GIT-PERL-DEFINES Makefile
-	$(QUIET_GEN)$(RM) $@ && \
+	$(QUIET_GEN) \
 	INSTLIBDIR='$(perllibdir_SQ)' && \
 	INSTLIBDIR_EXTRA='$(PERLLIB_EXTRA_SQ)' && \
 	INSTLIBDIR="$$INSTLIBDIR$${INSTLIBDIR_EXTRA:+:$$INSTLIBDIR_EXTRA}" && \
@@ -2359,7 +2368,7 @@ git-instaweb: git-instaweb.sh GIT-SCRIPT-DEFINES
 	mv $@+ $@
 else # NO_PERL
 $(SCRIPT_PERL_GEN) git-instaweb: % : unimplemented.sh
-	$(QUIET_GEN)$(RM) $@ $@+ && \
+	$(QUIET_GEN) \
 	sed -e '1s|#!.*/sh|#!$(SHELL_PATH_SQ)|' \
 	    -e 's|@@REASON@@|NO_PERL=$(NO_PERL)|g' \
 	    unimplemented.sh >$@+ && \
@@ -2373,14 +2382,14 @@ $(SCRIPT_PYTHON_GEN): GIT-BUILD-OPTIONS
 ifndef NO_PYTHON
 $(SCRIPT_PYTHON_GEN): GIT-CFLAGS GIT-PREFIX GIT-PYTHON-VARS
 $(SCRIPT_PYTHON_GEN): % : %.py
-	$(QUIET_GEN)$(RM) $@ $@+ && \
+	$(QUIET_GEN) \
 	sed -e '1s|#!.*python|#!$(PYTHON_PATH_SQ)|' \
 	    $< >$@+ && \
 	chmod +x $@+ && \
 	mv $@+ $@
 else # NO_PYTHON
 $(SCRIPT_PYTHON_GEN): % : unimplemented.sh
-	$(QUIET_GEN)$(RM) $@ $@+ && \
+	$(QUIET_GEN) \
 	sed -e '1s|#!.*/sh|#!$(SHELL_PATH_SQ)|' \
 	    -e 's|@@REASON@@|NO_PYTHON=$(NO_PYTHON)|g' \
 	    unimplemented.sh >$@+ && \
@@ -2388,8 +2397,7 @@ $(SCRIPT_PYTHON_GEN): % : unimplemented.sh
 	mv $@+ $@
 endif # NO_PYTHON
 
-CONFIGURE_RECIPE = $(RM) configure configure.ac+ && \
-	sed -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \
+CONFIGURE_RECIPE = sed -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \
 	configure.ac >configure.ac+ && \
 	autoconf -o configure configure.ac+ && \
 	$(RM) configure.ac+
@@ -2514,7 +2522,6 @@ endif
 ifeq ($(GENERATE_COMPILATION_DATABASE),yes)
 all:: compile_commands.json
 compile_commands.json:
-	@$(RM) $@
 	$(QUIET_GEN)sed -e '1s/^/[/' -e '$$s/,$$/]/' $(compdb_dir)/*.o.json > $@+
 	@if test -s $@+; then mv $@+ $@; else $(RM) $@+; fi
 endif
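For reference, the .DELETE_ON_ERROR semantics the diff relies on can be reproduced with a throwaway Makefile; this is a sketch assuming GNU make is installed as "make", and the "out" target name is mine, not from the patch:

```shell
# A rule writes its target and then fails; with .DELETE_ON_ERROR
# declared, GNU make removes the partially written target for us.
set -e
tmpdir=$(mktemp -d)
cd "$tmpdir"

# Recipe lines need a leading tab, hence printf rather than a heredoc.
printf '.DELETE_ON_ERROR:\nout:\n\techo partial >$@\n\tfalse\n' >Makefile

if make out >/dev/null 2>&1; then status=0; else status=$?; fi
test "$status" -ne 0          # the recipe failed, as intended

if test -e out; then deleted=no; else deleted=yes; fi
echo "deleted=$deleted"

cd /
rm -rf "$tmpdir"
```

Without the ".DELETE_ON_ERROR:" line the same run leaves a file "out" containing "partial" behind, which is the pre-patch failure mode the commit message describes.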
Use the GNU make ".DELETE_ON_ERROR" flag in our main Makefile, as we
already do in the Documentation/Makefile since db10fc6c09f (doc:
simplify Makefile using .DELETE_ON_ERROR, 2021-05-21).

Now if a command to make X fails, X will be removed; the default
behavior of GNU make is to only do so if "make" itself is interrupted
with a signal. E.g. if we now intentionally break one of the rules
with:

    -	mv $@+ $@
    +	mv $@+ $@ && \
    +	false

We'll get output like:

    $ make git
        CC git.o
        LINK git
    make: *** [Makefile:2179: git] Error 1
    make: *** Deleting file 'git'
    $ file git
    git: cannot open `git' (No such file or directory)

Before this change we'd leave the file in place under this scenario.

As in db10fc6c09f this allows us to remove patterns of removing
leftover $@ files at the start of rules, since previous failing runs
of the Makefile won't have left those littered around anymore.

I'm not as confident that we should be replacing the "mv $@+ $@"
pattern entirely, since that means that external programs or one of
our other Makefiles might race and get partial content.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
Extracted from an earlier series of mine[1] which was tangled up with
wider use of the "stuff >$@+ && mv $@+ $@" pattern, which caused that
series not to go forward.

1. https://lore.kernel.org/git/cover-0.6-00000000000-20210329T161723Z-avarab@gmail.com/

 Makefile | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)