[1/8] ls-files: add --json to dump the index
diff mbox series

Message ID 20190619095858.30124-2-pclouds@gmail.com
State New
Headers show
Series
  • Add 'ls-files --json' to dump the index in json
Related show

Commit Message

Duy Nguyen June 19, 2019, 9:58 a.m. UTC
So far we don't have a command to basically dump the index file out,
with all its glory details. Checking some info, for example, stat
time, usually involves either writing new code or firing up "xxd" and
decoding values by yourself.

This --json is supposed to help that. It dumps the index in a human
readable format but also easy to be processed with tools. And it will
print almost enough info to reconstruct the index later.

In this patch we only dump the main part, not extensions. But at the
end of the series, the entire index is dumped. The end result could be
very verbose even on a small repository such as git.git.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-ls-files.txt |  5 +++
 builtin/ls-files.c             | 30 +++++++++++---
 cache.h                        |  2 +
 json-writer.c                  | 16 ++++++++
 json-writer.h                  | 21 ++++++++++
 read-cache.c                   | 73 +++++++++++++++++++++++++++++++++-
 6 files changed, 140 insertions(+), 7 deletions(-)

Comments

Ævar Arnfjörð Bjarmason June 19, 2019, 10:30 a.m. UTC | #1
On Wed, Jun 19 2019, Nguyễn Thái Ngọc Duy wrote:

> +		die(_("--show-json cannot be used with other --show- options, or --with-tree"));

Should be --json, not --show-json, right? I assume --show-json is left
over from an earlier version.
Derrick Stolee June 19, 2019, 1:03 p.m. UTC | #2
On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
> So far we don't have a command to basically dump the index file out,
> with all its glory details. Checking some info, for example, stat
> time, usually involves either writing new code or firing up "xxd" and
> decoding values by yourself.
> 
> This --json is supposed to help that. It dumps the index in a human
> readable format but also easy to be processed with tools. And it will
> print almost enough info to reconstruct the index later.

In an earlier message, I stated that I like the idea of this feature.
I know that you wanted to get that feedback before working too hard on
the patch series, so now that interest is declared, please add some tests
that verify this output looks as you expect.

> In this patch we only dump the main part, not extensions. But at the
> end of the series, the entire index is dumped. The end result could be
> very verbose even on a small repository such as git.git.

I would expect this commit to include a complete "output matches expected"
test, and the later patches can update the index to include these
extensions then verify that their sections appear in the output using
a grep.

> +--json::
> +	Dump the entire index content in JSON format. This is for
> +	debugging purposes and the JSON structure may change from time
> +	to time.
> +

"...purposes and the JSON structure may change from time to time" may be better
as "...purposes. The JSON structure is subject to change."

> +		OPT_BOOL(0, "json", &show_json,
> +			N_("dump index content in json format")),

We should probably use "JSON" here in the help text.

> -	show_files(the_repository, &dir);
> -
> -	if (show_resolve_undo)
> -		show_ru_info(the_repository->index);
> +	if (!show_json) {
> +		show_files(the_repository, &dir);
> +
> +		if (show_resolve_undo)
> +			show_ru_info(the_repository->index);
> +	} else {
> +		struct json_writer jw = JSON_WRITER_INIT;
> +
> +		discard_index(the_repository->index);
> +		the_repository->index->jw = &jw;
> +		if (repo_read_index(the_repository) < 0)
> +			die("index file corrupt");
> +		puts(jw.json.buf);
> +		the_repository->index->jw = NULL;
> +		jw_release(&jw);
> +	}
>  
>  	if (ps_matched) {
>  		int bad;

I see this 'ps_matched' condition at the end, which is related to
the '--error-unmatch' option. I added "--error-unmatch foo" to my
command and got the appropriate error message:

  error: pathspec 'foo' did not match any file(s) known to git
  Did you forget to 'git add'?

This was sent to stderr while the JSON was in stdout, so this should
be appropriate to allow both options. Just pointing it out to make
sure this is intended.

> +void jw_object_stat_data(struct json_writer *jw, const char *name,
> +			 const struct stat_data *sd)
> +{
> +	jw_object_inline_begin_object(jw, name);
> +	jw_object_intmax(jw, "st_ctime.sec", sd->sd_ctime.sec);
> +	jw_object_intmax(jw, "st_ctime.nsec", sd->sd_ctime.nsec);
> +	jw_object_intmax(jw, "st_mtime.sec", sd->sd_mtime.sec);
> +	jw_object_intmax(jw, "st_mtime.nsec", sd->sd_mtime.nsec);
> +	jw_object_intmax(jw, "st_dev", sd->sd_dev);
> +	jw_object_intmax(jw, "st_ino", sd->sd_ino);
> +	jw_object_intmax(jw, "st_uid", sd->sd_uid);
> +	jw_object_intmax(jw, "st_gid", sd->sd_gid);
> +	jw_object_intmax(jw, "st_size", sd->sd_size);
> +	jw_end(jw);
> +}

If these are all part of the same object, are the "st_" prefixes
necessary for every member?

> +	/*
> +	 * again redundant info, just so you don't have to decode
> +	 * flags values manually
> +	 */
> +	if (ce->ce_flags & CE_VALID)
> +		jw_object_true(jw, "assume-unchanged");
> +	if (ce->ce_flags & CE_INTENT_TO_ADD)
> +		jw_object_true(jw, "intent-to-add");
> +	if (ce->ce_flags & CE_SKIP_WORKTREE)
> +		jw_object_true(jw, "skip-worktree");
> +	if (ce_stage(ce))
> +		jw_object_intmax(jw, "stage", ce_stage(ce));

I'm really glad these flags are getting expanded! Much easier to
understand what's going on this way.

> +	if (istate->jw) {
> +		jw_object_begin(istate->jw, jw_pretty);
> +		jw_object_intmax(istate->jw, "version", istate->version);
> +		jw_object_string(istate->jw, "oid", oid_to_hex(&istate->oid));
> +		jw_object_intmax(istate->jw, "st_mtime.sec", istate->timestamp.sec);
> +		jw_object_intmax(istate->jw, "st_mtime.nsec", istate->timestamp.nsec);

Here, the "st_" prefixes are not on every member, but would it
be confusing if they were not there? Also, including a "." in
a member name may be troublesome for JSON, as that typically
means we are accessing a member of an object. Perhaps use _sec
and _nsec here and in the earlier stat_data block.

Thanks,
-Stolee
Johannes Schindelin June 21, 2019, 1:04 p.m. UTC | #3
Hi,

On Wed, 19 Jun 2019, Derrick Stolee wrote:

> On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
>
> > +--json::
> > +	Dump the entire index content in JSON format. This is for
> > +	debugging purposes and the JSON structure may change from time
> > +	to time.
> > +
>
> "...purposes and the JSON structure may change from time to time" may be
> better as "...purposes. The JSON structure is subject to change."

It would probably make even more sense to mark this as an experimental
feature for now (i.e. prefix the description with "(EXPERIMENTAL) ", so
that users will have a harder time to miss that vague statement at the
end.

Once the feature stabilized enough, it would probably make sense to start
versioning the JSON format (`--json[=<version>]`, defaulting to the
latest).

That would make this a pretty useful feature not only for debugging, I
would imagine, but really would set a precedent for a better "API" for
3rd-party applications to use than the current one.

Thanks,
Dscho
Duy Nguyen June 24, 2019, 12:50 p.m. UTC | #4
On Wed, Jun 19, 2019 at 8:03 PM Derrick Stolee <stolee@gmail.com> wrote:
> > -     show_files(the_repository, &dir);
> > -
> > -     if (show_resolve_undo)
> > -             show_ru_info(the_repository->index);
> > +     if (!show_json) {
> > +             show_files(the_repository, &dir);
> > +
> > +             if (show_resolve_undo)
> > +                     show_ru_info(the_repository->index);
> > +     } else {
> > +             struct json_writer jw = JSON_WRITER_INIT;
> > +
> > +             discard_index(the_repository->index);
> > +             the_repository->index->jw = &jw;
> > +             if (repo_read_index(the_repository) < 0)
> > +                     die("index file corrupt");
> > +             puts(jw.json.buf);
> > +             the_repository->index->jw = NULL;
> > +             jw_release(&jw);
> > +     }
> >
> >       if (ps_matched) {
> >               int bad;
>
> I see this 'ps_matched' condition at the end, which is related to
> the '--error-unmatch' option. I added "--error-unmatch foo" to my
> command and got the appropriate error message:
>
>   error: pathspec 'foo' did not match any file(s) known to git
>   Did you forget to 'git add'?
>
> This was sent to stderr while the JSON was in stdout, so this should
> be appropriate to allow both options. Just pointing it out to make
> sure this is intended.

--error-unmatch only makes sense when you specify pathspec (like
"foo") but that does not work well with --json at all because we don't
do filtering (how do we even filter in extensions?). I'll just make
sure that "ls-files --json <pathspec>" is rejected. That'll cover
--error-unmatch.

Patch
diff mbox series

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 8461c0e83e..54011c8f65 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -60,6 +60,11 @@  OPTIONS
 --stage::
 	Show staged contents' mode bits, object name and stage number in the output.
 
+--json::
+	Dump the entire index content in JSON format. This is for
+	debugging purposes and the JSON structure may change from time
+	to time.
+
 --directory::
 	If a whole directory is classified as "other", show just its
 	name (with a trailing slash) and not its whole contents.
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 7f83c9a6f2..d00f6d3074 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -8,6 +8,7 @@ 
 #include "cache.h"
 #include "repository.h"
 #include "config.h"
+#include "json-writer.h"
 #include "quote.h"
 #include "dir.h"
 #include "builtin.h"
@@ -31,6 +32,7 @@  static int show_modified;
 static int show_killed;
 static int show_valid_bit;
 static int show_fsmonitor_bit;
+static int show_json;
 static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
@@ -543,6 +545,8 @@  int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("show staged contents' object name in the output")),
 		OPT_BOOL('k', "killed", &show_killed,
 			N_("show files on the filesystem that need to be removed")),
+		OPT_BOOL(0, "json", &show_json,
+			N_("dump index content in json format")),
 		OPT_BIT(0, "directory", &dir.flags,
 			N_("show 'other' directories' names only"),
 			DIR_SHOW_OTHER_DIRECTORIES),
@@ -660,8 +664,12 @@  int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 
 	/* With no flags, we default to showing the cached files */
 	if (!(show_stage || show_deleted || show_others || show_unmerged ||
-	      show_killed || show_modified || show_resolve_undo))
+	      show_killed || show_modified || show_resolve_undo || show_json))
 		show_cached = 1;
+	if (show_json && (show_stage || show_deleted || show_others ||
+			  show_unmerged || show_killed || show_modified ||
+			  show_cached || show_resolve_undo || with_tree))
+		die(_("--show-json cannot be used with other --show- options, or --with-tree"));
 
 	if (with_tree) {
 		/*
@@ -673,10 +681,22 @@  int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		overlay_tree_on_index(the_repository->index, with_tree, max_prefix);
 	}
 
-	show_files(the_repository, &dir);
-
-	if (show_resolve_undo)
-		show_ru_info(the_repository->index);
+	if (!show_json) {
+		show_files(the_repository, &dir);
+
+		if (show_resolve_undo)
+			show_ru_info(the_repository->index);
+	} else {
+		struct json_writer jw = JSON_WRITER_INIT;
+
+		discard_index(the_repository->index);
+		the_repository->index->jw = &jw;
+		if (repo_read_index(the_repository) < 0)
+			die("index file corrupt");
+		puts(jw.json.buf);
+		the_repository->index->jw = NULL;
+		jw_release(&jw);
+	}
 
 	if (ps_matched) {
 		int bad;
diff --git a/cache.h b/cache.h
index bf20337ef4..84d0aeed20 100644
--- a/cache.h
+++ b/cache.h
@@ -326,6 +326,7 @@  static inline unsigned int canon_mode(unsigned int mode)
 #define UNTRACKED_CHANGED	(1 << 7)
 #define FSMONITOR_CHANGED	(1 << 8)
 
+struct json_writer;
 struct split_index;
 struct untracked_cache;
 
@@ -350,6 +351,7 @@  struct index_state {
 	uint64_t fsmonitor_last_update;
 	struct ewah_bitmap *fsmonitor_dirty;
 	struct mem_pool *ce_mem_pool;
+	struct json_writer *jw;
 };
 
 /* Name hashing */
diff --git a/json-writer.c b/json-writer.c
index aadb9dbddc..281bc50b39 100644
--- a/json-writer.c
+++ b/json-writer.c
@@ -202,6 +202,22 @@  void jw_object_null(struct json_writer *jw, const char *key)
 	strbuf_addstr(&jw->json, "null");
 }
 
+void jw_object_stat_data(struct json_writer *jw, const char *name,
+			 const struct stat_data *sd)
+{
+	jw_object_inline_begin_object(jw, name);
+	jw_object_intmax(jw, "st_ctime.sec", sd->sd_ctime.sec);
+	jw_object_intmax(jw, "st_ctime.nsec", sd->sd_ctime.nsec);
+	jw_object_intmax(jw, "st_mtime.sec", sd->sd_mtime.sec);
+	jw_object_intmax(jw, "st_mtime.nsec", sd->sd_mtime.nsec);
+	jw_object_intmax(jw, "st_dev", sd->sd_dev);
+	jw_object_intmax(jw, "st_ino", sd->sd_ino);
+	jw_object_intmax(jw, "st_uid", sd->sd_uid);
+	jw_object_intmax(jw, "st_gid", sd->sd_gid);
+	jw_object_intmax(jw, "st_size", sd->sd_size);
+	jw_end(jw);
+}
+
 static void increase_indent(struct strbuf *sb,
 			    const struct json_writer *jw,
 			    int indent)
diff --git a/json-writer.h b/json-writer.h
index 83906b09c1..38f9c9bf68 100644
--- a/json-writer.h
+++ b/json-writer.h
@@ -44,6 +44,8 @@ 
 
 #include "strbuf.h"
 
+struct stat_data;
+
 struct json_writer
 {
 	/*
@@ -81,6 +83,8 @@  void jw_object_true(struct json_writer *jw, const char *key);
 void jw_object_false(struct json_writer *jw, const char *key);
 void jw_object_bool(struct json_writer *jw, const char *key, int value);
 void jw_object_null(struct json_writer *jw, const char *key);
+void jw_object_stat_data(struct json_writer *jw, const char *key,
+			 const struct stat_data *sd);
 void jw_object_sub_jw(struct json_writer *jw, const char *key,
 		      const struct json_writer *value);
 
@@ -104,4 +108,21 @@  void jw_array_inline_begin_array(struct json_writer *jw);
 int jw_is_terminated(const struct json_writer *jw);
 void jw_end(struct json_writer *jw);
 
+/*
+ * These _gently versions accept NULL json_writer to reduce too much
+ * branching at the call site.
+ */
+static inline void jw_object_inline_begin_array_gently(struct json_writer *jw,
+						       const char *name)
+{
+	if (jw)
+		jw_object_inline_begin_array(jw, name);
+}
+
+static inline void jw_end_gently(struct json_writer *jw)
+{
+	if (jw)
+		jw_end(jw);
+}
+
 #endif /* JSON_WRITER_H */
diff --git a/read-cache.c b/read-cache.c
index 4dd22f4f6e..eec030b3bb 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -25,6 +25,7 @@ 
 #include "fsmonitor.h"
 #include "thread-utils.h"
 #include "progress.h"
+#include "json-writer.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -1952,6 +1953,50 @@  static void *load_index_extensions(void *_data)
 	return NULL;
 }
 
+static void dump_cache_entry(struct index_state *istate,
+			     int index,
+			     unsigned long offset,
+			     const struct cache_entry *ce)
+{
+	struct strbuf sb = STRBUF_INIT;
+	struct json_writer *jw = istate->jw;
+
+	jw_array_inline_begin_object(jw);
+
+	/*
+	 * this is technically redundant, but it's for easier
+	 * navigation when there hundreds of entries
+	 */
+	jw_object_intmax(jw, "id", index);
+
+	jw_object_string(jw, "name", ce->name);
+
+	strbuf_addf(&sb, "%06o", ce->ce_mode);
+	jw_object_string(jw, "mode", sb.buf);
+	strbuf_release(&sb);
+
+	jw_object_intmax(jw, "flags", ce->ce_flags);
+	/*
+	 * again redundant info, just so you don't have to decode
+	 * flags values manually
+	 */
+	if (ce->ce_flags & CE_VALID)
+		jw_object_true(jw, "assume-unchanged");
+	if (ce->ce_flags & CE_INTENT_TO_ADD)
+		jw_object_true(jw, "intent-to-add");
+	if (ce->ce_flags & CE_SKIP_WORKTREE)
+		jw_object_true(jw, "skip-worktree");
+	if (ce_stage(ce))
+		jw_object_intmax(jw, "stage", ce_stage(ce));
+
+	jw_object_string(jw, "oid", oid_to_hex(&ce->oid));
+
+	jw_object_stat_data(jw, "stat", &ce->ce_stat_data);
+	jw_object_intmax(jw, "file-offset", offset);
+
+	jw_end(jw);
+}
+
 /*
  * A helper function that will load the specified range of cache entries
  * from the memory mapped file and add them to the given index.
@@ -1972,6 +2017,9 @@  static unsigned long load_cache_entry_block(struct index_state *istate,
 		ce = create_from_disk(ce_mem_pool, istate->version, disk_ce, &consumed, previous_ce);
 		set_index_entry(istate, i, ce);
 
+		if (istate->jw)
+			dump_cache_entry(istate, i, src_offset, ce);
+
 		src_offset += consumed;
 		previous_ce = ce;
 	}
@@ -1983,6 +2031,8 @@  static unsigned long load_all_cache_entries(struct index_state *istate,
 {
 	unsigned long consumed;
 
+	jw_object_inline_begin_array_gently(istate->jw, "entries");
+
 	if (istate->version == 4) {
 		mem_pool_init(&istate->ce_mem_pool,
 				estimate_cache_size_from_compressed(istate->cache_nr));
@@ -1993,6 +2043,8 @@  static unsigned long load_all_cache_entries(struct index_state *istate,
 
 	consumed = load_cache_entry_block(istate, istate->ce_mem_pool,
 					0, istate->cache_nr, mmap, src_offset, NULL);
+
+	jw_end_gently(istate->jw);
 	return consumed;
 }
 
@@ -2120,6 +2172,7 @@  int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	size_t extension_offset = 0;
 	int nr_threads, cpus;
 	struct index_entry_offset_table *ieot = NULL;
+	int jw_pretty = 1;
 
 	if (istate->initialized)
 		return istate->cache_nr;
@@ -2154,6 +2207,8 @@  int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	istate->cache_nr = ntohl(hdr->hdr_entries);
 	istate->cache_alloc = alloc_nr(istate->cache_nr);
 	istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
+	istate->timestamp.sec = st.st_mtime;
+	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 	istate->initialized = 1;
 
 	p.istate = istate;
@@ -2176,6 +2231,20 @@  int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	if (!HAVE_THREADS)
 		nr_threads = 1;
 
+	if (istate->jw) {
+		jw_object_begin(istate->jw, jw_pretty);
+		jw_object_intmax(istate->jw, "version", istate->version);
+		jw_object_string(istate->jw, "oid", oid_to_hex(&istate->oid));
+		jw_object_intmax(istate->jw, "st_mtime.sec", istate->timestamp.sec);
+		jw_object_intmax(istate->jw, "st_mtime.nsec", istate->timestamp.nsec);
+
+		/*
+		 * Threading may mess up json writing. This is for
+		 * debugging only, so performance is not a concern.
+		 */
+		nr_threads = 1;
+	}
+
 	if (nr_threads > 1) {
 		extension_offset = read_eoie_extension(mmap, mmap_size);
 		if (extension_offset) {
@@ -2204,8 +2273,6 @@  int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		src_offset += load_all_cache_entries(istate, mmap, mmap_size, src_offset);
 	}
 
-	istate->timestamp.sec = st.st_mtime;
-	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 
 	/* if we created a thread, join it otherwise load the extensions on the primary thread */
 	if (extension_offset) {
@@ -2216,6 +2283,8 @@  int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		p.src_offset = src_offset;
 		load_index_extensions(&p);
 	}
+	jw_end_gently(istate->jw);
+
 	munmap((void *)mmap, mmap_size);
 
 	/*