diff mbox

libxl: fix incremental parallel build

Message ID 59A6800202000078001755C4@prv-mh.provo.novell.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jan Beulich Aug. 30, 2017, 7:06 a.m. UTC
Short-circuit absolute paths of generated headers to their pathless
equivalents, to avoid two racing invocations of the same rule producing
them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
This wants to be backported to 4.9 and 4.8.

Comments

Wei Liu Sept. 1, 2017, 3:28 p.m. UTC | #1
On Wed, Aug 30, 2017 at 01:06:10AM -0600, Jan Beulich wrote:
> Short-circuit absolute paths of generated headers to their pathless
> equivalents, to avoid two racing invocations of the same rule producing
> them.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> This wants to be backported to 4.9 and 4.8.
> 
> --- a/tools/libxl/Makefile
> +++ b/tools/libxl/Makefile
> @@ -264,6 +264,13 @@ $(LIBXL_OBJS) $(LIBXL_TEST_OBJS) $(LIBXL
>  	$(TEST_PROG_OBJS) $(SAVE_HELPER_OBJS): libxl.h
>  $(LIBXL_OBJS) $(LIBXL_TEST_OBJS): libxl_internal.h
>  
> +# libacpi sources (living in another subdirectory) will have their inclusions
> +# of headers from this directory recorded as absolute paths in the .*.o.d
> +# files.  In order to not invoke the subsequent rule twice (and perhaps in a
> +# racing way when doing a parallel build), short-circuit the absolute paths to
> +# the local ones first.
> +$(CURDIR)/_libxl_type%.h: _libxl_type%.h ;

I don't quite get this. I normally build with -j8 and never saw a race.

Do you mean parallel build in which two makes enter libxl? Is that
possible?

Why does libacpi matter? All dependencies files (*.o.d) should be local
to libxl anyway.
Jan Beulich Sept. 1, 2017, 3:35 p.m. UTC | #2
>>> On 01.09.17 at 17:28, <wei.liu2@citrix.com> wrote:
> On Wed, Aug 30, 2017 at 01:06:10AM -0600, Jan Beulich wrote:
>> Short-circuit absolute paths of generated headers to their pathless
>> equivalents, to avoid two racing invocations of the same rule producing
>> them.
>> 
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> This wants to be backported to 4.9 and 4.8.
>> 
>> --- a/tools/libxl/Makefile
>> +++ b/tools/libxl/Makefile
>> @@ -264,6 +264,13 @@ $(LIBXL_OBJS) $(LIBXL_TEST_OBJS) $(LIBXL
>>  	$(TEST_PROG_OBJS) $(SAVE_HELPER_OBJS): libxl.h
>>  $(LIBXL_OBJS) $(LIBXL_TEST_OBJS): libxl_internal.h
>>  
>> +# libacpi sources (living in another subdirectory) will have their 
> inclusions
>> +# of headers from this directory recorded as absolute paths in the .*.o.d
>> +# files.  In order to not invoke the subsequent rule twice (and perhaps in 
> a
>> +# racing way when doing a parallel build), short-circuit the absolute paths 
> to
>> +# the local ones first.
>> +$(CURDIR)/_libxl_type%.h: _libxl_type%.h ;
> 
> I don't quite get this. I normally build with -j8 and never saw a race.

I rarely do incremental tools builds, but each time I do them
(on 4.8 or later) I run into this issue.

> Do you mean parallel build in which two makes enter libxl? Is that
> possible?

No, only a single entry into that subtree.

> Why does libacpi matter? All dependencies files (*.o.d) should be local
> to libxl anyway.

Did you check? My .build.o.d has:

build.o:  \
 /build/xen/unstable-hg/2017-08-10/tools/libxl/../../tools/libacpi/build.c \
  /build/xen/unstable-hg/2017-08-10/tools/libxl/../../tools/config.h \
  /build/xen/unstable-hg/2017-08-10/tools/libxl/libxl_x86_acpi.h \
[...]
  _libxl_list.h \
  /build/xen/unstable-hg/2017-08-10/tools/libxl/_libxl_types.h \
  libxl_event.h libxl.h \
[...]

and it is this non-local _libxl_types.h dependency which breaks
things. I've noted this with gcc 4.3.x, in case that matters (e.g.
if newer compilers are smarter in how they write out deps).

Jan
Ian Jackson Sept. 1, 2017, 5:04 p.m. UTC | #3
Jan Beulich writes ("Re: [PATCH] libxl: fix incremental parallel build"):
> On 01.09.17 at 17:28, <wei.liu2@citrix.com> wrote:
> > Do you mean parallel build in which two makes enter libxl? Is that
> > possible?
> 
> No, only a single entry into that subtree.

Are you sure ?

> > Why does libacpi matter? All dependencies files (*.o.d) should be local
> > to libxl anyway.
> 
> Did you check? My .build.o.d has:

AFAICT you must mean tools/firmware/hvmloader/.build.o.d ?

> build.o:  \
>  /build/xen/unstable-hg/2017-08-10/tools/libxl/../../tools/libacpi/build.c \
>   /build/xen/unstable-hg/2017-08-10/tools/libxl/../../tools/config.h \
>   /build/xen/unstable-hg/2017-08-10/tools/libxl/libxl_x86_acpi.h \
> [...]
>   _libxl_list.h \
>   /build/xen/unstable-hg/2017-08-10/tools/libxl/_libxl_types.h \
>   libxl_event.h libxl.h \
> [...]
> 
> and it is this non-local _libxl_types.h dependency which breaks
> things. I've noted this with gcc 4.3.x, in case that matters (e.g.
> if newer compilers are smarter in how they write out deps).

This suggests that perhaps the problem is that something is reentering
libxl.

With recursive make, it is necessary for the overall structure of the
makefiles to sequence things so that each directory is entered exactly
once, before its dependent directories are entered.  (It is possible
to violate this rule without creating races but it is tricky and
inadvisable.)

Can you provide a complete log of a failing build ?

Thanks,
Ian.
Jan Beulich Sept. 4, 2017, 8:38 a.m. UTC | #4
>>> On 01.09.17 at 19:04, <ian.jackson@eu.citrix.com> wrote:
> Jan Beulich writes ("Re: [PATCH] libxl: fix incremental parallel build"):
>> On 01.09.17 at 17:28, <wei.liu2@citrix.com> wrote:
>> > Do you mean parallel build in which two makes enter libxl? Is that
>> > possible?
>> 
>> No, only a single entry into that subtree.
> 
> Are you sure ?

Yes.

>> > Why does libacpi matter? All dependencies files (*.o.d) should be local
>> > to libxl anyway.
>> 
>> Did you check? My .build.o.d has:
> 
> AFAICT you must mean tools/firmware/hvmloader/.build.o.d ?

No, I mean the libxl instance of the file (remember that the same
libacpi source file is being compiled twice). The hvmloader instance
looks similar, but doesn't cause the same problem (because there
are no auto-generated headers).

>> build.o:  \
>>  /build/xen/unstable-hg/2017-08-10/tools/libxl/../../tools/libacpi/build.c \
>>   /build/xen/unstable-hg/2017-08-10/tools/libxl/../../tools/config.h \
>>   /build/xen/unstable-hg/2017-08-10/tools/libxl/libxl_x86_acpi.h \
>> [...]
>>   _libxl_list.h \
>>   /build/xen/unstable-hg/2017-08-10/tools/libxl/_libxl_types.h \
>>   libxl_event.h libxl.h \
>> [...]
>> 
>> and it is this non-local _libxl_types.h dependency which breaks
>> things. I've noted this with gcc 4.3.x, in case that matters (e.g.
>> if newer compilers are smarter in how they write out deps).
> 
> This suggests that perhaps the problem is that something is reentering
> libxl.

Again, I've verified (multiple times) that this is not the case, as I
too did initially suspect this to be what is happening.

> With recursive make, it is necessary for the overall structure of the
> makefiles to sequence things so that each directory is entered exactly
> once, before its dependent directories are entered.  (It is possible
> to violate this rule without creating races but it is tricky and
> inadvisable.)

I understand that.

> Can you provide a complete log of a failing build ?

With the above, is that really needed? And if so, do you mean
just normal make output, or output from additionally passing -p.

Jan
Ian Jackson Sept. 4, 2017, 10:59 a.m. UTC | #5
Jan Beulich writes ("Re: [PATCH] libxl: fix incremental parallel build"):
> On 01.09.17 at 19:04, <ian.jackson@eu.citrix.com> wrote:
> > AFAICT you must mean tools/firmware/hvmloader/.build.o.d ?
> 
> No, I mean the libxl instance of the file (remember that the same
> libacpi source file is being compiled twice). The hvmloader instance
> looks similar, but doesn't cause the same problem (because there
> are no auto-generated headers).

I had an out of date tree somehow, so I didn't find the build.o.

> > Can you provide a complete log of a failing build ?
> 
> With the above, is that really needed? And if so, do you mean
> just normal make output, or output from additionally passing -p.

I still think it would be helpful.  But in the meantime I'm looking
again at this log of a successful build.

Ian.
Ian Jackson Sept. 4, 2017, 11:33 a.m. UTC | #6
I have investigated this further and have made the following
discoveries:

1. make distinguishes targets purely textually.  It will canonicalise a
  target name by removing ./ before comparison (so `foo' and `./foo'
  are considered the same target) but it won't examine the filesystem
  AFAICT.  So `foo' and `../../this/subdir/foo' are different targets.
  (You seem to have already observed or known this.)

2. If you use gcc -MMD to generate a .d file, and say -I. to include
  things from the cwd, but invoke gcc to compile a file elsewhere, eg
     gcc -MMD -I -o build.o /other/path/build.c
  and build.c #includes `foo.h', then the -MMD output mentions
  `/absolute/path/to/this/subdir/foo.h', even though it could refer
  to `foo.h'.  (VPATH and vpath are obvious ways for this to happen.)
  This is presumably because gcc has noticed that `.' in this context
  must mean relative to the invocation cwd, not relative to build.c.

3. If foo.h is autogenerated using some kind of pattern rule,
  you end up with two `copies' of the rule, which can run
  simultaneously, once for build.c -> build.o and once for normal
  files in the same cwd.

I think (2) is a bug in gcc.  But we can't sensibly expect to fix
that.  Sadly I think the right fix is to seddery the pointless paths
out of the .d files.

I don't think going round all our Makefiles adding rules of the form
   $(ABS_PATH_TO_HERE)/auto-%-generated.: auto-%-generated.h
is sensible.  There are too many and this is too ad-hoc.

Do people agree ?

Ian.
Jan Beulich Sept. 4, 2017, 12:51 p.m. UTC | #7
>>> On 04.09.17 at 13:33, <ian.jackson@eu.citrix.com> wrote:
> I have investigated this further and have made the following
> discoveries:
> 
> 1. make distinguishes targets purely textually.  It will canonicalise a
>   target name by removing ./ before comparison (so `foo' and `./foo'
>   are considered the same target) but it won't examine the filesystem
>   AFAICT.  So `foo' and `../../this/subdir/foo' are different targets.
>   (You seem to have already observed or known this.)
> 
> 2. If you use gcc -MMD to generate a .d file, and say -I. to include
>   things from the cwd, but invoke gcc to compile a file elsewhere, eg
>      gcc -MMD -I -o build.o /other/path/build.c
>   and build.c #includes `foo.h', then the -MMD output mentions
>   `/absolute/path/to/this/subdir/foo.h', even though it could refer
>   to `foo.h'.  (VPATH and vpath are obvious ways for this to happen.)
>   This is presumably because gcc has noticed that `.' in this context
>   must mean relative to the invocation cwd, not relative to build.c.
> 
> 3. If foo.h is autogenerated using some kind of pattern rule,
>   you end up with two `copies' of the rule, which can run
>   simultaneously, once for build.c -> build.o and once for normal
>   files in the same cwd.
> 
> I think (2) is a bug in gcc.  But we can't sensibly expect to fix
> that.  Sadly I think the right fix is to seddery the pointless paths
> out of the .d files.
> 
> I don't think going round all our Makefiles adding rules of the form
>    $(ABS_PATH_TO_HERE)/auto-%-generated.: auto-%-generated.h
> is sensible.  There are too many and this is too ad-hoc.

Well, the absolute paths in there are a problem only when compiling
files from a different subdirectory. Is that something that happens
in many places? I would agree to the sed approach you suggest if
this was a widespread problem, but I don't think it is (and hence I
went this other route).

Jan
Ian Jackson Sept. 4, 2017, 1:33 p.m. UTC | #8
Jan Beulich writes ("Re: [PATCH] libxl: fix incremental parallel build"):
> Well, the absolute paths in there are a problem only when compiling
> files from a different subdirectory. Is that something that happens
> in many places?

It seems to be not wholly uncommon in xen.git, primarily because of a
desire to build the same code for execution in userland and in the
hypervisor (or guest firmware images or similar).

So far we have

 mariner:xen.git> git-grep '^vpath' | wc -l
 8

and that clearly leaves some out because it contains none of the
hypervisor instances.

And my key point is that the need for a workaround is not the fault of
the autogenerated .h files.  Your proposed workaround needs adjusting
every time the set of autogenerated .h files (or the set of Makefiles
which make .h files) changes.

I would like to make a single change which will always work.  After
all we have no way to automatically verify that we don't have bugs of
this form, and when we do write a bug of this form it's a race bug
(and race bugs are horrible to find and debug).

Ian.
Jan Beulich Sept. 4, 2017, 2:36 p.m. UTC | #9
>>> On 04.09.17 at 15:33, <ian.jackson@eu.citrix.com> wrote:
> I would like to make a single change which will always work.  After
> all we have no way to automatically verify that we don't have bugs of
> this form, and when we do write a bug of this form it's a race bug
> (and race bugs are horrible to find and debug).

Well, okay - I take that you're intending to put together such a
change then in the not too distant future.

Jan
Ian Jackson Sept. 4, 2017, 2:39 p.m. UTC | #10
Jan Beulich writes ("Re: [PATCH] libxl: fix incremental parallel build"):
> On 04.09.17 at 15:33, <ian.jackson@eu.citrix.com> wrote:
> > I would like to make a single change which will always work.  After
> > all we have no way to automatically verify that we don't have bugs of
> > this form, and when we do write a bug of this form it's a race bug
> > (and race bugs are horrible to find and debug).
> 
> Well, okay - I take that you're intending to put together such a
> change then in the not too distant future.

Sure, I was indeed volunteering :-).

Ian.
diff mbox

Patch

--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -264,6 +264,13 @@  $(LIBXL_OBJS) $(LIBXL_TEST_OBJS) $(LIBXL
 	$(TEST_PROG_OBJS) $(SAVE_HELPER_OBJS): libxl.h
 $(LIBXL_OBJS) $(LIBXL_TEST_OBJS): libxl_internal.h
 
+# libacpi sources (living in another subdirectory) will have their inclusions
+# of headers from this directory recorded as absolute paths in the .*.o.d
+# files.  In order to not invoke the subsequent rule twice (and perhaps in a
+# racing way when doing a parallel build), short-circuit the absolute paths to
+# the local ones first.
+$(CURDIR)/_libxl_type%.h: _libxl_type%.h ;
+
 _libxl_type%.h _libxl_type%_json.h _libxl_type%_private.h _libxl_type%.c: libxl_type%.idl gentypes.py idl.py
 	$(eval stem = $(notdir $*))
 	$(PYTHON) gentypes.py libxl_type$(stem).idl __libxl_type$(stem).h __libxl_type$(stem)_private.h \