Message ID | 89b7e17469e19c9dca8afa729ec1a70f4e06a2b7.1568309119.git.liu.denton@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | Makefile: run coccicheck on all non-upstream sources |
Denton Liu <liu.denton@gmail.com> writes:

> +FIND_C_SOURCES = $(filter %.c,$(shell $(FIND_SOURCE_FILES)))
> +COCCI_SOURCES = $(filter-out $(THIRD_PARTY_SOURCES),$(FIND_C_SOURCES))

The former is somewhat misnamed.  FIND_SOURCE_FILES is *not* a list
of source files---it is a procedure to list source files to its
standard output.  FIND_C_SOURCES sounds as if it is a similar
procedure, which would be implemented much like

	FIND_C_SOURCES = $(FIND_SOURCE_FILES) | sed -n -e '/\.c$/p'

but that is not what you did and that is not what you want to have.
Perhaps call it FOUND_C_SOURCES?

I wonder if we can get rid of FIND_SOURCE_FILES that is a mere
procedure and replace its use with a true list of source files.
Would it make the result more pleasant to work with?

Perhaps something like the attached patch, (which would come before
this entire thing as a clean-up, and removing the need for 2/3)?

I dunno.

Using a procedure whose output is fed to xargs has an advantage that
a platform with very short command line limit can still work with
many source files, but the way you create and use COCCI_SOURCES in
this patch would defeat that advantage anyway, so perhaps we can get
away with an approach like this.  Having a list of things in $(MAKE)
variable has a longer-term benefit that we could exploit more
parallelism if we wanted to, too.
 Makefile | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index f9255344ae..9dddd0e88c 100644
--- a/Makefile
+++ b/Makefile
@@ -2584,7 +2584,7 @@ perl/build/man/man3/Git.3pm: perl/Git.pm
 	$(QUIET_GEN)mkdir -p $(dir $@) && \
 	pod2man $< $@
 
-FIND_SOURCE_FILES = ( \
+SOURCE_FILES = $(patsubst ./%,%,$(shell \
 	git ls-files \
 		'*.[hcS]' \
 		'*.sh' \
@@ -2599,19 +2599,19 @@ FIND_SOURCE_FILES = ( \
 		-o \( -name 'trash*' -type d -prune \) \
 		-o \( -name '*.[hcS]' -type f -print \) \
 		-o \( -name '*.sh' -type f -print \) \
-	)
+	))
 
 $(ETAGS_TARGET): FORCE
 	$(RM) $(ETAGS_TARGET)
-	$(FIND_SOURCE_FILES) | xargs etags -a -o $(ETAGS_TARGET)
+	etags -a -o $(ETAGS_TARGET) $(SOURCE_FILES)
 
 tags: FORCE
 	$(RM) tags
-	$(FIND_SOURCE_FILES) | xargs ctags -a
+	ctags -a $(SOURCE_FILES)
 
 cscope:
 	$(RM) cscope*
-	$(FIND_SOURCE_FILES) | xargs cscope -b
+	cscope -b $(SOURCE_FILES)
 
 ### Detect prefix changes
 TRACK_PREFIX = $(bindir_SQ):$(gitexecdir_SQ):$(template_dir_SQ):$(prefix_SQ):\
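To make the "procedure versus list" distinction above concrete, here is a minimal shell sketch. It is only an illustration: `find_source_files`, the five file names, and the stand-in command `tool` are hypothetical, and `-n 2` forces tiny batches so the splitting done by xargs is visible.

```shell
#!/bin/sh
# Hypothetical stand-in for a "procedure" such as FIND_SOURCE_FILES:
# it prints one file name per line instead of holding a list.
find_source_files () {
	printf '%s\n' a.c b.c c.h d.sh e.c
}

# Procedure style: stream the names into xargs, which splits them over
# as many invocations as needed ("-n 2" keeps each batch to two names).
run_streamed () {
	find_source_files | xargs -n 2 echo tool
}

# List style: the whole list is expanded onto one command line, so the
# single invocation has to fit within the platform's argument limit.
run_as_list () {
	# Word splitting of the unquoted substitution is intentional.
	echo tool $(find_source_files)
}
```

With the sample names, `run_streamed` runs the stand-in command three times, while `run_as_list` runs it once with all five names.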
On Thu, Sep 12, 2019 at 11:40:36AM -0700, Junio C Hamano wrote:
> Denton Liu <liu.denton@gmail.com> writes:
>
> > +FIND_C_SOURCES = $(filter %.c,$(shell $(FIND_SOURCE_FILES)))
> > +COCCI_SOURCES = $(filter-out $(THIRD_PARTY_SOURCES),$(FIND_C_SOURCES))
>
> The former is somewhat misnamed.  FIND_SOURCE_FILES is *not* a list
> of source files---it is a procedure to list source files to its
> standard output.  FIND_C_SOURCES sounds as if it is a similar
> procedure, which would be implemented much like
>
> 	FIND_C_SOURCES = $(FIND_SOURCE_FILES) | sed -n -e '/\.c$/p'
>
> but that is not what you did and that is not what you want to have.
> Perhaps call it FOUND_C_SOURCES?
>
> I wonder if we can get rid of FIND_SOURCE_FILES that is a mere
> procedure and replace its use with a true list of source files.
> Would it make the result more pleasant to work with?
>
> Perhaps something like the attached patch, (which would come before
> this entire thing as a clean-up, and removing the need for 2/3)?
>
> I dunno.
>
> Using a procedure whose output is fed to xargs has an advantage that
> a platform with very short command line limit can still work with
> many source files, but the way you create and use COCCI_SOURCES in
> this patch would defeat that advantage anyway,

COCCI_SOURCES is only used as an input to 'xargs', so that advantage
is not defeated.

> so perhaps we can get
> away with an approach like this.  Having a list of things in $(MAKE)
> variable has a longer-term benefit that we could exploit more
> parallelism if we wanted to, too.
On Fri, Sep 13, 2019 at 01:49:52PM +0200, SZEDER Gábor wrote:
> On Thu, Sep 12, 2019 at 11:40:36AM -0700, Junio C Hamano wrote:
> > [...]
> > Using a procedure whose output is fed to xargs has an advantage that
> > a platform with very short command line limit can still work with
> > many source files, but the way you create and use COCCI_SOURCES in
> > this patch would defeat that advantage anyway,
>
> COCCI_SOURCES is only used as an input to 'xargs', so that advantage
> is not defeated.

I think it still does matter; the relevant snippet is as follows:

	if ! echo $(COCCI_SOURCES) | xargs $$limit \
		$(SPATCH) --sp-file $< $(SPATCH_FLAGS) \
		>$@+ 2>$@.log; \

which means that a really big COCCI_SOURCES could exceed the limit.

That being said, COCCI_SOURCES should be smaller than the future
SOURCE_FILES variable since we're only taking %.c files (and filtering
out some of them too!).

I dunno, either.
I'm mostly in favour of this change since it makes a lot of sense to
keep lists in make variables if possible, as opposed to command
invocations. In the worst case, if someone complains in the future, we
can always change it back.

> > so perhaps we can get
> > away with an approach like this.  Having a list of things in $(MAKE)
> > variable has a longer-term benefit that we could exploit more
> > parallelism if we wanted to, too.
SZEDER Gábor <szeder.dev@gmail.com> writes:

>> Using a procedure whose output is fed to xargs has an advantage that
>> a platform with very short command line limit can still work with
>> many source files, but the way you create and use COCCI_SOURCES in
>> this patch would defeat that advantage anyway,
>
> COCCI_SOURCES is only used as an input to 'xargs', so that advantage
> is not defeated.

It is passed as a command line argument to "echo", which pipes to
xargs; I would not say it is taking advantage of "xargs" to lift the
command line length limit, as it first needs to convince the shell
to feed all of them to the "echo" that is upstream of "xargs".

As you mentioned elsewhere, LIB_H already uses the same approach as
I outlined in the message you are responding to (i.e. "don't define
a procedure to produce lines to the standard output in a $(MAKE)
variable--instead make the variable hold the list itself"), so I
suspect that we are almost on the same page?
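The point about "echo" sitting upstream of "xargs" can be sketched in shell as follows. This is an illustration under assumptions: `sources` is a made-up four-file list standing in for $(COCCI_SOURCES), and `spatch` is merely echoed rather than run.

```shell
#!/bin/sh
# Assumed sample list standing in for the expanded $(COCCI_SOURCES).
sources='a.c b.c c.c d.c'

# The shell first has to materialize every word of the list as an
# argument; this counts how many arguments "echo" would receive.
count_echo_args () {
	set -- $sources
	echo "$#"
}

# Only after "echo" has received all of them does xargs get to re-split
# the list into smaller batches ("-n 3" forces the split for the demo).
batched_via_echo () {
	echo $sources | xargs -n 3 echo spatch
}
```

So xargs limits the size of each downstream invocation, but the full list still passes through the argument list of the "echo" in front of it.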
On Fri, Sep 13, 2019 at 10:14:01AM -0700, Denton Liu wrote:
> On Fri, Sep 13, 2019 at 01:49:52PM +0200, SZEDER Gábor wrote:
> > [...]
> > COCCI_SOURCES is only used as an input to 'xargs', so that advantage
> > is not defeated.
>
> I think it still does matter; the relevant snippet is as follows:
>
> 	if ! echo $(COCCI_SOURCES) | xargs $$limit \
> 		$(SPATCH) --sp-file $< $(SPATCH_FLAGS) \
> 		>$@+ 2>$@.log; \
>
> which means that a really big COCCI_SOURCES could exceed the limit.

Oh, you're both right.
> That being said, COCCI_SOURCES should be smaller than the future
> SOURCE_FILES variable since we're only taking %.c files (and filtering
> out some of them too!).

We could also argue that Coccinelle only runs on platforms that have a
reasonably large command line arg limit, and the number of our source
files is way below that, so it won't matter in the foreseeable future.

(Furthermore, 'echo' is often a shell builtin command, and I don't
think that the platform's argument size limit applies to builtins.  At
least the 'echo' of dash, Bash, ksh, ksh93, mksh, and BusyBox sh can
deal with at least 10 million arguments; the platform limit is
somewhere around 147k.)

> > > [...]
> > >  $(ETAGS_TARGET): FORCE
> > >  	$(RM) $(ETAGS_TARGET)
> > > -	$(FIND_SOURCE_FILES) | xargs etags -a -o $(ETAGS_TARGET)
> > > +	etags -a -o $(ETAGS_TARGET) $(SOURCE_FILES)
> > >
> > >  tags: FORCE
> > >  	$(RM) tags
> > > -	$(FIND_SOURCE_FILES) | xargs ctags -a
> > > +	ctags -a $(SOURCE_FILES)
> > >
> > >  cscope:
> > >  	$(RM) cscope*
> > > -	$(FIND_SOURCE_FILES) | xargs cscope -b
> > > +	cscope -b $(SOURCE_FILES)

Here, however, the list of source files is passed as arguments to
non-builtin commands, which might also be used on
cmdline-arg-limit-challenged platforms.
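For anyone who wants to check the two claims above on their own system, a rough shell sketch follows. The helper names are hypothetical, and the fallback to "unknown" is an assumption for systems where `getconf` is missing or does not report ARG_MAX.

```shell
#!/bin/sh
# Report whether "echo" resolves to a builtin in the shell running this.
echo_kind () {
	case "$(type echo)" in
	*builtin*) echo builtin ;;
	*) echo external ;;
	esac
}

# Print the platform's limit for exec() arguments, which applies to
# external commands such as etags, ctags and cscope.
arg_max_or_unknown () {
	getconf ARG_MAX 2>/dev/null || echo unknown
}
```

The number printed by `arg_max_or_unknown` varies between platforms, so it should not be expected to match the figure quoted above.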
On Fri, Sep 13, 2019 at 08:00:14PM +0200, SZEDER Gábor wrote:
> [...]
> We could also argue that Coccinelle only runs on platforms that have a
> reasonably large command line arg limit, and the number of our source
> files is way below that, so it won't matter in the foreseeable future.

Good point.

> (Furthermore, 'echo' is often a shell builtin command, and I don't
> think that the platform's argument size limit applies to builtins.  At
> least the 'echo' of dash, Bash, ksh, ksh93, mksh, and BusyBox sh can
> deal with at least 10 million arguments; the platform limit is
> somewhere around 147k.)
>
> > > > [...]
> > > >  cscope:
> > > >  	$(RM) cscope*
> > > > -	$(FIND_SOURCE_FILES) | xargs cscope -b
> > > > +	cscope -b $(SOURCE_FILES)
>
> Here, however, the list of source files is passed as arguments to
> non-builtin commands, which might also be used on
> cmdline-arg-limit-challenged platforms.

After doing a bit of research, I think that I agree with you. It seems
like the max command-line length for CMD on Windows is 8191 characters.
However, after running the following,

	$ git ls-files '*.[hcS]' '*.sh' ':!*[tp][0-9][0-9][0-9][0-9]*' ':!contrib' | wc -c
	12779

we can see that the command-line length would definitely exceed the max
length, so xargs would be required. As a result, we should probably just
keep the existing xargs invocations.
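The size check described above can be wrapped in a small shell helper. This is a sketch under assumptions: `cmd_limit` takes the 8191-character figure quoted in the message, and `needs_xargs` is a hypothetical name; it measures a newline-separated list the same way `wc -c` does in the command above.

```shell
#!/bin/sh
# Limit quoted in the message above for CMD on Windows (assumed here).
cmd_limit=8191

# Succeed if the newline-separated list in $1 is longer than the limit,
# i.e. if it could not be passed on a single command line.
needs_xargs () {
	len=$(printf '%s\n' "$1" | wc -c | tr -d ' ')
	[ "$len" -gt "$cmd_limit" ]
}
```

A possible use is `needs_xargs "$(git ls-files '*.c')" && echo "keep xargs"`; the 12779 bytes reported above would make the check succeed.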
diff --git a/Makefile b/Makefile
index e2c693440b..7c88e0606f 100644
--- a/Makefile
+++ b/Makefile
@@ -2803,12 +2803,8 @@ check: command-list.h
 		exit 1; \
 	fi
 
-C_SOURCES = $(patsubst %.o,%.c,$(C_OBJ))
-ifdef DC_SHA1_SUBMODULE
-COCCI_SOURCES = $(filter-out sha1collisiondetection/%,$(C_SOURCES))
-else
-COCCI_SOURCES = $(filter-out sha1dc/%,$(C_SOURCES))
-endif
+FIND_C_SOURCES = $(filter %.c,$(shell $(FIND_SOURCE_FILES)))
+COCCI_SOURCES = $(filter-out $(THIRD_PARTY_SOURCES),$(FIND_C_SOURCES))
 
 %.cocci.patch: %.cocci $(COCCI_SOURCES)
 	@echo ' ' SPATCH $<; \
Before, when running the "coccicheck" target, only the source files
which were being compiled would have been checked by Coccinelle.
However, just because we aren't compiling a source file doesn't mean we
have to exclude it from analysis. This will allow us to catch more
mistakes, in particular ones that affect Windows-only sources since
Coccinelle currently runs only on Linux.

Make the "coccicheck" target run on all C sources except for those that
are taken from some third-party source. We don't want to patch these
files since we want them to be as close to upstream as possible so that
it'll be easier to pull in upstream updates.

When running a build on Arch Linux with no additional flags provided,
after applying this patch, the following sources are now checked:

* block-sha1/sha1.c
* compat/access.c
* compat/basename.c
* compat/fileno.c
* compat/gmtime.c
* compat/hstrerror.c
* compat/memmem.c
* compat/mingw.c
* compat/mkdir.c
* compat/mkdtemp.c
* compat/mmap.c
* compat/msvc.c
* compat/pread.c
* compat/precompose_utf8.c
* compat/qsort.c
* compat/setenv.c
* compat/sha1-chunked.c
* compat/snprintf.c
* compat/stat.c
* compat/strcasestr.c
* compat/strdup.c
* compat/strtoimax.c
* compat/strtoumax.c
* compat/unsetenv.c
* compat/win32/dirent.c
* compat/win32/path-utils.c
* compat/win32/pthread.c
* compat/win32/syslog.c
* compat/win32/trace2_win32_process_info.c
* compat/win32mmap.c
* compat/winansi.c
* ppc/sha1.c

This also results in the following source now being excluded:

* compat/obstack.c

Signed-off-by: Denton Liu <liu.denton@gmail.com>
---
 Makefile | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)
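As a rough shell analogue of the selection this patch performs with make's `$(filter %.c,...)` and `$(filter-out ...)`, consider the sketch below. The file list is a made-up sample, and excluding only compat/obstack.c is an assumption for illustration; the actual contents of THIRD_PARTY_SOURCES are defined elsewhere in the series and are not shown here.

```shell
#!/bin/sh
# Made-up sample of what a source-listing procedure might print.
all_files='builtin/add.c cache.h compat/mingw.c compat/obstack.c t/test-lib.sh'

# First grep keeps only names ending in ".c", like $(filter %.c,...).
# Second grep drops the assumed third-party file, like $(filter-out ...).
cocci_sources () {
	printf '%s\n' $all_files |
	grep '\.c$' |
	grep -v -x 'compat/obstack\.c'
}
```

With the sample list, `cocci_sources` prints builtin/add.c and compat/mingw.c, one per line.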