Makefile: add support for generating JSON compilation database

Message ID pull.714.git.1598815707540.gitgitgadget@gmail.com (mailing list archive)
State Superseded

Commit Message

Philippe Blain via GitGitGadget Aug. 30, 2020, 7:28 p.m. UTC
From: Philippe Blain <levraiphilippeblain@gmail.com>

Tools based on LibClang [1] can make use of a 'JSON Compilation
Database' [2] that keeps track of the exact options used to compile a set
of source files.

The Clang compiler can generate JSON fragments when compiling [3],
using the `-MJ` flag. These JSON fragments (one per compiled source
file) can then be concatenated to create the compilation database,
commonly called 'compile_commands.json'.

Add support to the Makefile for generating these JSON fragments as well
as the compilation database itself, if the environment variable
'GENERATE_COMPILATION_DATABASE' is set.

If this variable is set, check that $(CC) indeed supports the `-MJ`
flag, following what is done for automatic dependencies.
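
For illustration, the probe amounts to compiling an empty C input with `-MJ` and looking at the exit status. A minimal shell sketch of the same idea follows; the function name `supports_mj` is made up here, `CC` falls back to `cc` as an assumption, and the Makefile version additionally passes $(ALL_CFLAGS):

```shell
#!/bin/sh
# Sketch only: report whether the compiler accepts the -MJ flag.
# Everything is pointed at /dev/null so no files are left behind.
supports_mj() {
	if ${CC:-cc} -c -MJ /dev/null -x c /dev/null -o /dev/null >/dev/null 2>&1
	then
		return 0
	fi
	# Any failure (unknown flag, missing compiler) counts as "unsupported".
	return 1
}

if supports_mj
then
	echo "compiler supports -MJ"
else
	echo "compiler does not support -MJ"
fi
```

With a compiler that rejects the flag, the sketch takes the second branch, which corresponds to the warning the Makefile emits.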

All JSON fragments are placed in the 'compile_commands/' directory, and
the compilation database 'compile_commands.json' is generated as a
dependency of the 'all' target using a `sed` invocation.

[1] https://clang.llvm.org/docs/Tooling.html
[2] https://clang.llvm.org/docs/JSONCompilationDatabase.html
[3] https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-mj-arg

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
---
    Add support for generating JSON compilation database
    
    I don't have a lot of knowledge of Make double-colon rules, or insight
    into why they are used for the 'all' target, but I think the approach I
    chose makes sense. In particular, I do not list any prerequisite for the
    'compile_commands.json' file, but from what I tested it is still rebuilt
    anytime the 'all' target is rebuilt, which is what we want.
    
    Note: CMakeLists.txt in contrib/buildsystems does not need to be updated
    to also support this feature because CMake supports it out-of-the-box
    [1].
    
    [1] https://cmake.org/cmake/help/latest/variable/CMAKE_EXPORT_COMPILE_COMMANDS.html

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-714%2Fphil-blain%2Fcompiledb-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-714/phil-blain/compiledb-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/714

 .gitignore |  2 ++
 Makefile   | 52 +++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 49 insertions(+), 5 deletions(-)


base-commit: d9cd4331470f4d9d78677f12dc79063dab832f53

Comments

brian m. carlson Aug. 30, 2020, 10:10 p.m. UTC | #1
On 2020-08-30 at 19:28:27, Philippe Blain via GitGitGadget wrote:
> From: Philippe Blain <levraiphilippeblain@gmail.com>
> 
> Tools based on LibClang [1] can make use of a 'JSON Compilation
> Database' [2] that keeps track of the exact options used to compile a set
> of source files.

For additional context why this is valuable, clangd, which is a C
language server protocol implementation, can use these files to
determine the flags needed to compile a file so it can provide proper
editor integration.  As a result, editors supporting the language server
protocol (such as VS Code, or Vim with a suitable plugin) can provide
better searching, integration, and refactoring tools.

So I'm very much in favor of a change like this.

> +ifeq ($(GENERATE_COMPILATION_DATABASE),yes)
> +all:: compile_commands.json
> +compile_commands.json:
> +	@$(RM) $@
> +	$(QUIET_GEN)sed -e '1s/^/[/' -e '$$s/,$$/]/' $(compdb_dir)*.o.json > $@+
> +	@if test -s $@+; then mv $@+ $@; else $(RM) $@+; fi
> +endif

How are those commas at the end of the line added?  Are they natively
part of the files?  If so, this seems reasonable.
Philippe Blain Aug. 30, 2020, 10:17 p.m. UTC | #2
> #
> +# Define GENERATE_COMPILATION_DATABASE to generate JSON compilation database
> +# entries during compilation if your compiler supports it, using the `-MJ` flag.
> +# The JSON entries will be placed in the `compile_commands/` directory,
> +# and the JSON compilation database can be created afterwards with
> +# `make compile_commands.json`.
> +#

I'm realizing that the way I'm describing how it works here is wrong: there is no
separate 'make compile_commands.json' step needed (it was needed in my first draft).

I'll fix that for v2.
Philippe Blain Aug. 31, 2020, 2:37 a.m. UTC | #3
Hi Brian, 

> On Aug 30, 2020, at 18:10, brian m. carlson <sandals@crustytoothpaste.net> wrote:
> 
> On 2020-08-30 at 19:28:27, Philippe Blain via GitGitGadget wrote:
>> From: Philippe Blain <levraiphilippeblain@gmail.com>
>> 
>> Tools based on LibClang [1] can make use of a 'JSON Compilation
>> Database' [2] that keeps track of the exact options used to compile a set
>> of source files.
> 
> For additional context why this is valuable, clangd, which is a C
> language server protocol implementation, can use these files to
> determine the flags needed to compile a file so it can provide proper
> editor integration.  As a result, editors supporting the language server
> protocol (such as VS Code, or Vim with a suitable plugin) can provide
> better searching, integration, and refactoring tools.
> 
> So I'm very much in favor of a change like this.

Thanks!

> 
>> +ifeq ($(GENERATE_COMPILATION_DATABASE),yes)
>> +all:: compile_commands.json
>> +compile_commands.json:
>> +	@$(RM) $@
>> +	$(QUIET_GEN)sed -e '1s/^/[/' -e '$$s/,$$/]/' $(compdb_dir)*.o.json > $@+
>> +	@if test -s $@+; then mv $@+ $@; else $(RM) $@+; fi
>> +endif
> 
> How are those commas at the end of the line added?  Are they natively
> part of the files?  If so, this seems reasonable.

Yes: the '*.o.json' files generated by the compiler contain one JSON object per file, 
with a trailing comma.
This 'sed' invocation turns these files into a proper JSON array by:
- adding a '[' at the beginning and a ']' at the end of the list of objects
- removing the comma after the last entry (before the closing ']')
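
To see the transformation in isolation, here is a small sketch that runs the same `sed` script outside of Make (so `$` is not doubled). The two fragments are hand-written stand-ins for what the compiler would emit, and the paths in them are hypothetical:

```shell
#!/bin/sh
# Sketch: build a compilation database from two stand-in fragments.
workdir=$(mktemp -d) || exit 1
mkdir "$workdir/compile_commands"

# Hypothetical fragments: one JSON object per file, each ending in a comma.
cat >"$workdir/compile_commands/abspath.o.json" <<'EOF'
{ "directory": "/src/git", "file": "abspath.c", "output": "abspath.o", "arguments": ["cc", "-c", "abspath.c"]},
EOF
cat >"$workdir/compile_commands/builtin-add.o.json" <<'EOF'
{ "directory": "/src/git", "file": "builtin/add.c", "output": "builtin/add.o", "arguments": ["cc", "-c", "builtin/add.c"]},
EOF

# sed reads all the files as one stream, so:
#   1s/^/[/   adds '[' in front of the first line of the first fragment
#   $s/,$/]/  turns the trailing comma of the last fragment into ']'
sed -e '1s/^/[/' -e '$s/,$/]/' "$workdir"/compile_commands/*.o.json \
	>"$workdir/compile_commands.json"

db=$(cat "$workdir/compile_commands.json")
printf '%s\n' "$db"
rm -rf "$workdir"
```

The commas between entries come from the fragments themselves; only the last one has to be replaced.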
Junio C Hamano Aug. 31, 2020, 4:24 a.m. UTC | #4
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2020-08-30 at 19:28:27, Philippe Blain via GitGitGadget wrote:
>> From: Philippe Blain <levraiphilippeblain@gmail.com>
>> 
>> Tools based on LibClang [1] can make use of a 'JSON Compilation
>> Database' [2] that keeps track of the exact options used to compile a set
>> of source files.
>
> For additional context why this is valuable, clangd, which is a C
> language server protocol implementation, can use these files to
> determine the flags needed to compile a file so it can provide proper
> editor integration.  As a result, editors supporting the language server
> protocol (such as VS Code, or Vim with a suitable plugin) can provide
> better searching, integration, and refactoring tools.

I found that the proposed commit log was very weak to sell the
change; some of what you gave above should definitely help strengthen
it.

Thanks.
Jeff King Sept. 1, 2020, 7:38 a.m. UTC | #5
On Sun, Aug 30, 2020 at 09:24:03PM -0700, Junio C Hamano wrote:

> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > On 2020-08-30 at 19:28:27, Philippe Blain via GitGitGadget wrote:
> >> From: Philippe Blain <levraiphilippeblain@gmail.com>
> >> 
> >> Tools based on LibClang [1] can make use of a 'JSON Compilation
> >> Database' [2] that keeps track of the exact options used to compile a set
> >> of source files.
> >
> > For additional context why this is valuable, clangd, which is a C
> > language server protocol implementation, can use these files to
> > determine the flags needed to compile a file so it can provide proper
> > editor integration.  As a result, editors supporting the language server
> > protocol (such as VS Code, or Vim with a suitable plugin) can provide
> > better searching, integration, and refactoring tools.
> 
> I found that the proposed commit log was very weak to sell the
> change; some of what you gave above should definitely help strengthen
> it.

Likewise. Looking at the output, I'm confused how it would help with
things like searching and refactoring. It might be nice to spell it out
for those of us exposed to it for the first time (I tried following the
links but remained unenlightened).

I'd also be curious to hear what advantages it gives to add a new
Makefile knob rather than just letting interested parties add -MJ to
their CFLAGS. Is it just a convenience to create the concatenated form?
It seems weird that projects would need to do so themselves with sed
hackery (i.e., I'd expect whatever consumes this json to be able to
handle multiple files).

-Peff
Philippe Blain Sept. 1, 2020, 1:18 p.m. UTC | #6
> On Sep 1, 2020, at 03:38, Jeff King <peff@peff.net> wrote:
> 
> On Sun, Aug 30, 2020 at 09:24:03PM -0700, Junio C Hamano wrote:
> 
>> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>> 
>>> On 2020-08-30 at 19:28:27, Philippe Blain via GitGitGadget wrote:
>>>> From: Philippe Blain <levraiphilippeblain@gmail.com>
>>>> 
>>>> Tools based on LibClang [1] can make use of a 'JSON Compilation
>>>> Database' [2] that keeps track of the exact options used to compile a set
>>>> of source files.
>>> 
>>> For additional context why this is valuable, clangd, which is a C
>>> language server protocol implementation, can use these files to
>>> determine the flags needed to compile a file so it can provide proper
>>> editor integration.  As a result, editors supporting the language server
>>> protocol (such as VS Code, or Vim with a suitable plugin) can provide
>>> better searching, integration, and refactoring tools.
>> 
>> I found that the proposed commit log was very weak to sell the
>> change; some of what you gave above should definitely help strengthen
>> it.
> 
> Likewise. Looking at the output, I'm confused how it would help with
> things like searching and refactoring. It might be nice to spell it out
> for those of us exposed to it for the first time (I tried following the
> links but remained unenlightened).

OK, I'll improve the commit message. I'm not at all an expert on this subject;
I just had to generate a compilation database myself to use the Sourcetrail source
explorer [1] with Git, so I figured I'd share what I had done. Further explorations of the
topic are in [2] and [3]. Note that I did try some of the tools listed in [2] before resorting
to modifying the Makefile, but these tools either did not work at all or produced wrong
output (e.g., strings in the JSON were not properly quoted).

> I'd also be curious to hear what advantages it gives to add a new
> Makefile knob rather than just letting interested parties add -MJ to
> their CFLAGS. Is it just a convenience to create the concatenated form?

Unfortunately this would not work because the '-MJ' flag needs a file name
to know where to put the JSON fragment.
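
To make the file-name point concrete: the patch computes one fragment path per object via `compdb_file`, flattening directory separators into dashes under `compile_commands/`. A rough shell equivalent of that Make expression, offered only as a sketch (the function name `compdb_file_for` is made up for illustration), could look like this:

```shell
#!/bin/sh
# Sketch: mimic $(compdb_dir)$(subst .-,,$(subst /,-,$(dir $@)))$(notdir $@).json
compdb_file_for() {
	obj=$1
	# Like Make's $(dir ...): "./" when the target has no directory part.
	case $obj in
	*/*) dir=${obj%/*}/ ;;
	*) dir=./ ;;
	esac
	base=${obj##*/}
	# Turn '/' into '-', then drop the ".-" left over from a "./" prefix.
	flat=$(printf '%s' "$dir" | sed -e 's|/|-|g' -e 's|\.-||g')
	printf 'compile_commands/%s%s.json\n' "$flat" "$base"
}

for obj in abspath.o builtin/add.o compat/win32/path-utils.o
do
	compdb_file_for "$obj"
done
```

Under these assumptions `builtin/add.o` maps to `compile_commands/builtin-add.o.json`, so objects with the same base name in different directories do not overwrite each other's fragments.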

Thanks,
Philippe.


[1] www.sourcetrail.com
[2] https://sarcasm.github.io/notes/dev/compilation-database.html
[3] https://eli.thegreenplace.net/2014/05/21/compilation-databases-for-clang-based-tools
brian m. carlson Sept. 2, 2020, 1:33 a.m. UTC | #7
On 2020-09-01 at 07:38:27, Jeff King wrote:
> Likewise. Looking at the output, I'm confused how it would help with
> things like searching and refactoring. It might be nice to spell it out
> for those of us exposed to it for the first time (I tried following the
> links but remained unenlightened).

Traditionally, editors had to learn about every language if they wanted
to add special functionality like refactoring (e.g., renaming "struct
foo" to "struct bar"), finding all the instances of a type, finding
where a type or function was declared, or similar IDE features.  When
Microsoft developed Visual Studio Code, they decided that they did not
want to implement this functionality for every language under the sun,
and instead developed the Language Server Protocol[0].

With LSP, each editor needs functionality to speak its portion (either
natively, as with VS Code, or with a plugin, such as Vim's ALE) and each
language implements a language server to implement its part of the
functionality.  The protocol is capability based, so implementations can
support those features which make sense for their editor or language and
omit those which don't.  This way, all editors can benefit and language
communities can implement one program to provide features, and the
problem becomes an O(M + N) problem instead of an O(M * N) problem.

In some languages, like Rust, it's pretty obvious how to compile your
project: you use cargo, the built-in build tool.  There is also a
standard layout to find and enumerate files within a project.  However,
C is not so standardized, so clangd, which is a clang-based C and C++
LSP implementation, needs help to find out which flags are needed to
compile, and therefore find the header files to make sense of parsing
the C code and implementing its side of the protocol.  That's what this
patch implements.

I use Vim and ALE extensively, and it pretty much just works for most
languages, including Go and Rust, once you install the LSP server.  Git
is one of the few projects I work on which is still C and therefore
needs help here.

Hopefully this is at least more enlightening about the functionality
that clangd provides, why it's interesting, how it works, and why it's
valuable.

> I'd also be curious to hear what advantages it gives to add a new
> Makefile knob rather than just letting interested parties add -MJ to
> their CFLAGS. Is it just a convenience to create the concatenated form?
> It seems weird that projects would need to do so themselves with sed
> hackery (i.e., I'd expect whatever consumes this json to be able to
> handle multiple files).

I believe clangd does need the concatenated form, and at least the ALE
plugin for Vim uses that specific file name to detect whether clangd
should be used.  The problem is that clangd doesn't know where your
source code is actually located and it's very expensive to traverse an
entire repository which might contain literally millions of files if
you're only really interested in a handful.

[0] https://microsoft.github.io/language-server-protocol/
Jeff King Sept. 2, 2020, 8:04 a.m. UTC | #8
On Wed, Sep 02, 2020 at 01:33:51AM +0000, brian m. carlson wrote:

> Traditionally, editors had to learn about every language if they wanted
> to add special functionality like refactoring (e.g., renaming "struct
> foo" to "struct bar"), finding all the instances of a type, finding
> where a type or function was declared, or similar IDE features.  When
> Microsoft developed Visual Studio Code, they decided that they did not
> want to implement this functionality for every language under the sun,
> and instead developed the Language Server Protocol[0].
> [...]

Thanks for the explanation. I understand what LSP does, but the missing
link for me was how "here are the command-line flags to the compiler"
turned into something useful like "here's a list of identifiers". And
clangd fills in that gap (presumably re-running the front-end bits of
clang on the fly to pull out that kind of information).

-Peff
Patch

diff --git a/.gitignore b/.gitignore
index ee509a2ad2..f4c51300e0 100644
--- a/.gitignore
+++ b/.gitignore
@@ -197,6 +197,7 @@ 
 /git.spec
 *.exe
 *.[aos]
+*.o.json
 *.py[co]
 .depend/
 *.gcda
@@ -218,6 +219,7 @@ 
 /tags
 /TAGS
 /cscope*
+/compile_commands.json
 *.hcc
 *.obj
 *.lib
diff --git a/Makefile b/Makefile
index 65f8cfb236..954bd2aa47 100644
--- a/Makefile
+++ b/Makefile
@@ -462,6 +462,12 @@  all::
 # the global variable _wpgmptr containing the absolute path of the current
 # executable (this is the case on Windows).
 #
+# Define GENERATE_COMPILATION_DATABASE to generate JSON compilation database
+# entries during compilation if your compiler supports it, using the `-MJ` flag.
+# The JSON entries will be placed in the `compile_commands/` directory,
+# and the JSON compilation database can be created afterwards with
+# `make compile_commands.json`.
+#
 # Define DEVELOPER to enable more compiler warnings. Compiler version
 # and family are auto detected, but could be overridden by defining
 # COMPILER_FEATURES (see config.mak.dev). You can still set
@@ -1258,6 +1264,20 @@  $(error please set COMPUTE_HEADER_DEPENDENCIES to yes, no, or auto \
 endif
 endif
 
+ifdef GENERATE_COMPILATION_DATABASE
+compdb_check = $(shell $(CC) $(ALL_CFLAGS) \
+	-c -MJ /dev/null \
+	-x c /dev/null -o /dev/null 2>&1; \
+	echo $$?)
+ifeq ($(compdb_check),0)
+override GENERATE_COMPILATION_DATABASE = yes
+else
+override GENERATE_COMPILATION_DATABASE = no
+$(warning GENERATE_COMPILATION_DATABASE is set, but your compiler does not \
+support generating compilation database entries)
+endif
+endif
+
 ifdef SANE_TOOL_PATH
 SANE_TOOL_PATH_SQ = $(subst ','\'',$(SANE_TOOL_PATH))
 BROKEN_PATH_FIX = 's|^\# @@BROKEN_PATH_FIX@@$$|git_broken_path_fix "$(SANE_TOOL_PATH_SQ)"|'
@@ -2381,16 +2401,30 @@  missing_dep_dirs =
 dep_args =
 endif
 
+compdb_dir = compile_commands/
+
+ifeq ($(GENERATE_COMPILATION_DATABASE),yes)
+missing_compdb_dir = $(compdb_dir)
+$(missing_compdb_dir):
+	@mkdir -p $@
+
+compdb_file = $(compdb_dir)$(subst .-,,$(subst /,-,$(dir $@)))$(notdir $@).json
+compdb_args = -MJ $(compdb_file)
+else
+missing_compdb_dir =
+compdb_args =
+endif
+
 ASM_SRC := $(wildcard $(OBJECTS:o=S))
 ASM_OBJ := $(ASM_SRC:S=o)
 C_OBJ := $(filter-out $(ASM_OBJ),$(OBJECTS))
 
 .SUFFIXES:
 
-$(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs)
-	$(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $<
-$(ASM_OBJ): %.o: %.S GIT-CFLAGS $(missing_dep_dirs)
-	$(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $<
+$(C_OBJ): %.o: %.c GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir)
+	$(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $<
+$(ASM_OBJ): %.o: %.S GIT-CFLAGS $(missing_dep_dirs) $(missing_compdb_dir)
+	$(QUIET_CC)$(CC) -o $*.o -c $(dep_args) $(compdb_args) $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $<
 
 %.s: %.c GIT-CFLAGS FORCE
 	$(QUIET_CC)$(CC) -o $@ -S $(ALL_CFLAGS) $(EXTRA_CPPFLAGS) $<
@@ -2413,6 +2447,14 @@  else
 $(OBJECTS): $(LIB_H) $(GENERATED_H)
 endif
 
+ifeq ($(GENERATE_COMPILATION_DATABASE),yes)
+all:: compile_commands.json
+compile_commands.json:
+	@$(RM) $@
+	$(QUIET_GEN)sed -e '1s/^/[/' -e '$$s/,$$/]/' $(compdb_dir)*.o.json > $@+
+	@if test -s $@+; then mv $@+ $@; else $(RM) $@+; fi
+endif
+
 exec-cmd.sp exec-cmd.s exec-cmd.o: GIT-PREFIX
 exec-cmd.sp exec-cmd.s exec-cmd.o: EXTRA_CPPFLAGS = \
 	'-DGIT_EXEC_PATH="$(gitexecdir_SQ)"' \
@@ -3117,7 +3159,7 @@  clean: profile-clean coverage-clean cocciclean
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
 	$(RM) $(HCC)
-	$(RM) -r bin-wrappers $(dep_dirs)
+	$(RM) -r bin-wrappers $(dep_dirs) $(compdb_dir) compile_commands.json
 	$(RM) -r po/build/
 	$(RM) *.pyc *.pyo */*.pyc */*.pyo $(GENERATED_H) $(ETAGS_TARGET) tags cscope*
 	$(RM) -r $(GIT_TARNAME) .doc-tmp-dir