[0/8] Add 'ls-files --json' to dump the index in json

Message ID 20190619095858.30124-1-pclouds@gmail.com (mailing list archive)

Message

Duy Nguyen June 19, 2019, 9:58 a.m. UTC
This is probably just my itch. Every time I have to do something with
the index, I need to add a little bit of code here, a little bit there to
get a better "view" of the index.

This solves it for me. It allows me to see pretty much everything in the
index (except really low detail stuff like pathname compression). It's
readable by humans, but also easy to parse if you need to do statistics
and stuff. You could even do a "diff" between two indexes.
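
As a rough illustration of the "diff" idea, here is a minimal Python
sketch. The JSON layout produced by the series is not shown in this
thread, so the schema below is an assumption: a top-level "entries" array
whose items carry "name" and "id" keys. The function name
diff_index_dumps is also hypothetical.

```python
import json


def diff_index_dumps(old_text, new_text):
    """Compare two JSON index dumps and report per-path changes.

    Assumed (hypothetical) schema: {"entries": [{"name": ..., "id": ...}]}.
    """
    old = {e["name"]: e["id"] for e in json.loads(old_text)["entries"]}
    new = {e["name"]: e["id"] for e in json.loads(new_text)["entries"]}
    return {
        # Paths present only in the new dump.
        "added": sorted(new.keys() - old.keys()),
        # Paths present only in the old dump.
        "removed": sorted(old.keys() - new.keys()),
        # Paths present in both dumps whose object id differs.
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

The two inputs would be the saved output of two 'ls-files --json' runs;
the key names would need adjusting to whatever the real output uses.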

I'm not really sure if anybody else finds this useful. Because if not,
I guess there's not much point trying to merge it to git.git just for a
single user. Maintaining off tree is still a pain for me, but I think
I can manage it.

Nguyễn Thái Ngọc Duy (8):
  ls-files: add --json to dump the index
  split-index.c: dump "link" extension as json
  fsmonitor.c: dump "FSMN" extension as json
  resolve-undo.c: dump "REUC" extension as json
  read-cache.c: dump "EOIE" extension as json
  read-cache.c: dump "IEOT" extension as json
  cache-tree.c: dump "TREE" extension as json
  dir.c: dump "UNTR" extension as json

 Documentation/git-ls-files.txt |   5 ++
 builtin/ls-files.c             |  30 +++++--
 cache-tree.c                   |  41 ++++++++--
 cache-tree.h                   |   5 +-
 cache.h                        |   2 +
 dir.c                          |  56 ++++++++++++-
 dir.h                          |   4 +-
 fsmonitor.c                    |   9 +++
 json-writer.c                  |  30 +++++++
 json-writer.h                  |  29 +++++++
 read-cache.c                   | 139 ++++++++++++++++++++++++++++++---
 resolve-undo.c                 |  36 ++++++++-
 resolve-undo.h                 |   4 +-
 split-index.c                  |  13 ++-
 14 files changed, 376 insertions(+), 27 deletions(-)

Comments

Derrick Stolee June 19, 2019, 11:58 a.m. UTC | #1
On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
> This is probably just my itch. Every time I have to do something with
> the index, I need to add a little bit code here, a little bit there to
> get a better "view" of the index.
> 
> This solves it for me. It allows me to see pretty much everything in the
> index (except really low detail stuff like pathname compression). It's
> readable by human, but also easy to parse if you need to do statistics
> and stuff. You could even do a "diff" between two indexes.
> 
> I'm not really sure if anybody else finds this useful. Because if not,
> I guess there's not much point trying to merge it to git.git just for a
> single user. Maintaining off tree is still a pain for me, but I think
> I can manage it.

I think we (Microsoft/VFS for Git engineers) would use this tool, as we
frequently need to diagnose something that went wrong in a user's index.
Kevin Willford built a tool to search the index and figure out what's
going on, but I'm not sure it parses all of the new extensions or was
updated to parse the v5 index.

Having a translation from the internal index format to an easier-to-parse
format is valuable.

Thanks,
-Stolee

Duy Nguyen June 19, 2019, 12:42 p.m. UTC | #2
On Wed, Jun 19, 2019 at 6:58 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
> > This is probably just my itch. Every time I have to do something with
> > the index, I need to add a little bit code here, a little bit there to
> > get a better "view" of the index.
> >
> > This solves it for me. It allows me to see pretty much everything in the
> > index (except really low detail stuff like pathname compression). It's
> > readable by human, but also easy to parse if you need to do statistics
> > and stuff. You could even do a "diff" between two indexes.
> >
> > I'm not really sure if anybody else finds this useful. Because if not,
> > I guess there's not much point trying to merge it to git.git just for a
> > single user. Maintaining off tree is still a pain for me, but I think
> > I can manage it.
>
> I think we (Microsoft/VFS for Git engineers) would use this tool, as we
> frequently need to diagnose something that went wrong in a user's index.
> Kevin Willford built a tool to search the index and figure out what's
> going on, but I'm not sure it parses all of the new extensions or was
> updated to parse the v5 index.

OK I suggest you try it out and see if it really fits your internal
tools. I wanted to balance between manual inspection and automation so
the output may not be the best for tools. I also try not to freeze the
format for more wiggle room, which would be fine for one-time scripts,
but if you want to have real tools depend on it, we may have to look
harder at the output format and make sure it's good enough for some
time, and have some documentation.

Also, I don't suppose it matters, but just for the record I don't care
at all about --json performance. I suppose Jeff's json writer does not
cache the entire json output in memory, so dumping giant index files
is fine. But some other things, like reading the index with multiple
threads, are also disabled.

Derrick Stolee June 19, 2019, 12:48 p.m. UTC | #3
On 6/19/2019 8:42 AM, Duy Nguyen wrote:
> On Wed, Jun 19, 2019 at 6:58 PM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
>>> This is probably just my itch. Every time I have to do something with
>>> the index, I need to add a little bit code here, a little bit there to
>>> get a better "view" of the index.
>>>
>>> This solves it for me. It allows me to see pretty much everything in the
>>> index (except really low detail stuff like pathname compression). It's
>>> readable by human, but also easy to parse if you need to do statistics
>>> and stuff. You could even do a "diff" between two indexes.
>>>
>>> I'm not really sure if anybody else finds this useful. Because if not,
>>> I guess there's not much point trying to merge it to git.git just for a
>>> single user. Maintaining off tree is still a pain for me, but I think
>>> I can manage it.
>>
>> I think we (Microsoft/VFS for Git engineers) would use this tool, as we
>> frequently need to diagnose something that went wrong in a user's index.
>> Kevin Willford built a tool to search the index and figure out what's
>> going on, but I'm not sure it parses all of the new extensions or was
>> updated to parse the v5 index.
> 
> OK I suggest you try it out and see if it really fits your internal
> tools. I wanted to balance between manual inspection and automation so
> the output may not be the best for tools. I also try not to freeze the
> format for more wiggle room, which would be fine for one-time scripts,
> but if you want to have real tools depend on it, we may have to look
> harder at the output format and make sure it's good enough for some
> time, and have some documentation.
> 
> Also, I don't suppose it matters, but just for the record I don't care
> at all about --json performance. I suppose Jeff's json writer does not
> cache the entire json output in memory, so dumping giant index files
> is fine. But some other things, like reading the index with multiple
> threads, are also disabled.

Performance is not critical here, and in fact would become slower for
sure because of the extra parsing details. However, I think using JSON
as a translation layer will make any tools that consume the JSON be
more resilient to future index format updates. That stability is
valuable. Even though the JSON format is not guaranteed to stay the
same, it is easier to update an object model to the JSON format than
a new binary parser.
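
To make the "object model" point concrete, here is a small Python sketch
of a consumer that keeps all knowledge of the dump's field names in one
adapter function. The "name" and "id" keys and the IndexEntry type are
assumptions for illustration, not taken from the patches.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """The tool's own view of an index entry, independent of the dump."""
    path: str
    object_id: str


def entry_from_json(obj):
    """Adapter from one dumped entry to the tool's own model.

    The "name" and "id" keys are assumed. If the dump format changes,
    only this function needs updating; the rest of the tool keeps
    working with IndexEntry.
    """
    return IndexEntry(path=obj["name"], object_id=obj["id"])
```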

Thanks,
-Stolee

Jeff King June 19, 2019, 7:17 p.m. UTC | #4
On Wed, Jun 19, 2019 at 04:58:50PM +0700, Nguyễn Thái Ngọc Duy wrote:

> This is probably just my itch. Every time I have to do something with
> the index, I need to add a little bit code here, a little bit there to
> get a better "view" of the index.
> 
> This solves it for me. It allows me to see pretty much everything in the
> index (except really low detail stuff like pathname compression). It's
> readable by human, but also easy to parse if you need to do statistics
> and stuff. You could even do a "diff" between two indexes.
> 
> I'm not really sure if anybody else finds this useful. Because if not,
> I guess there's not much point trying to merge it to git.git just for a
> single user. Maintaining off tree is still a pain for me, but I think
> I can manage it.

I don't have any particular use for this, but I am all in favor of tools
that make it easier to access and analyze information kept in our
on-disk formats (some of this is available via --debug, I think, but
AFAIK most of the extension bits are not).

And I'd rather see something like JSON than inventing yet another ad-hoc
output format.

I think your warning in the manpage that this is for debugging is fine,
as it does not put us on the hook for maintaining the feature nor its
format forever. We might want to call it "--debug=json" or something,
though, in case we do want real stable json support later (though of
course we would be free to steal the option then, since we're making no
promises).

-Peff

Junio C Hamano June 20, 2019, 4 a.m. UTC | #5
Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> This is probably just my itch. Every time I have to do something with
> the index, I need to add a little bit code here, a little bit there to
> get a better "view" of the index.

;-)  

JSON is not particularly my cup-of-tea but it is better than many
other things exactly for one reason (everybody and their dog have
heard of it), and certainly is much superior to inventing our own
ad-hoc format.  

Thanks for working on this (I do not expect I would see an immediate
need for this myself, though).

Jeff Hostetler June 20, 2019, 7:12 p.m. UTC | #6
On 6/19/2019 5:58 AM, Nguyễn Thái Ngọc Duy wrote:
> This is probably just my itch. Every time I have to do something with
> the index, I need to add a little bit code here, a little bit there to
> get a better "view" of the index.
> 
> This solves it for me. It allows me to see pretty much everything in the
> index (except really low detail stuff like pathname compression). It's
> readable by human, but also easy to parse if you need to do statistics
> and stuff. You could even do a "diff" between two indexes.
> 
> I'm not really sure if anybody else finds this useful. Because if not,
> I guess there's not much point trying to merge it to git.git just for a
> single user. Maintaining off tree is still a pain for me, but I think
> I can manage it.
> 
> Nguyễn Thái Ngọc Duy (8):
>    ls-files: add --json to dump the index
>    split-index.c: dump "link" extension as json
>    fsmonitor.c: dump "FSMN" extension as json
>    resolve-undo.c: dump "REUC" extension as json
>    read-cache.c: dump "EOIE" extension as json
>    read-cache.c: dump "IEOT" extension as json
>    cache-tree.c: dump "TREE" extension as json
>    dir.c: dump "UNTR" extension as json
> 
>   Documentation/git-ls-files.txt |   5 ++
>   builtin/ls-files.c             |  30 +++++--
>   cache-tree.c                   |  41 ++++++++--
>   cache-tree.h                   |   5 +-
>   cache.h                        |   2 +
>   dir.c                          |  56 ++++++++++++-
>   dir.h                          |   4 +-
>   fsmonitor.c                    |   9 +++
>   json-writer.c                  |  30 +++++++
>   json-writer.h                  |  29 +++++++
>   read-cache.c                   | 139 ++++++++++++++++++++++++++++++---
>   resolve-undo.c                 |  36 ++++++++-
>   resolve-undo.h                 |   4 +-
>   split-index.c                  |  13 ++-
>   14 files changed, 376 insertions(+), 27 deletions(-)
> 

Thanks for working on this!  I've been wanting to do something
like this for a while.  I too am tired of digging thru hex dumps
or "od" output whenever I have an odd problem to investigate.
This will certainly help.

Jeff

Duy Nguyen June 21, 2019, 8:37 a.m. UTC | #7
On Thu, Jun 20, 2019 at 2:17 AM Jeff King <peff@peff.net> wrote:
> I think your warning in the manpage that this is for debugging is fine,
> as it does not put us on the hook for maintaining the feature nor its
> format forever. We might want to call it "--debug=json" or something,

Hmm.. does it mean we make --debug PARSE_OPT_OPTARG? In other words,
"--debug" still means "text", --debug=json is obvious, but "--debug
json" means "text" debug with pathspec "json". Which is really
horrible in my opinion.
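
The ambiguity can be sketched with a toy parser in Python. This is not
git's parse-options code; parse_ls_files_args is a hypothetical helper
that only mimics the optional-argument rule described above, where the
value is accepted solely in the attached "--debug=<value>" form.

```python
def parse_ls_files_args(argv):
    """Toy model of an option with an optional argument.

    Returns (debug_format, pathspecs). debug_format is None when
    --debug was not given at all.
    """
    debug_format = None
    pathspecs = []
    for arg in argv:
        if arg == "--debug":
            # Bare option: fall back to the default "text" format.
            debug_format = "text"
        elif arg.startswith("--debug="):
            # Attached value: the only way to pass an argument.
            debug_format = arg[len("--debug="):]
        else:
            # Anything else, including a detached "json", is a pathspec.
            pathspecs.append(arg)
    return debug_format, pathspecs
```

With this rule, ["--debug=json"] selects the json format, while
["--debug", "json"] selects text and treats "json" as a pathspec.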

Or is it ok to just make the argument mandatory? That would be a
behavior change, but I suppose --debug is a thing only we use and
could still be a safe thing to do...

> though, in case we do want real stable json support later (though of
> course we would be free to steal the option then, since we're making no
> promises).
>
> -Peff

Johannes Schindelin June 21, 2019, 1:16 p.m. UTC | #8
Hi Peff,

On Wed, 19 Jun 2019, Jeff King wrote:

> On Wed, Jun 19, 2019 at 04:58:50PM +0700, Nguyễn Thái Ngọc Duy wrote:
>
> > This is probably just my itch. Every time I have to do something with
> > the index, I need to add a little bit code here, a little bit there to
> > get a better "view" of the index.
> >
> > This solves it for me. It allows me to see pretty much everything in the
> > index (except really low detail stuff like pathname compression). It's
> > readable by human, but also easy to parse if you need to do statistics
> > and stuff. You could even do a "diff" between two indexes.
> >
> > I'm not really sure if anybody else finds this useful. Because if not,
> > I guess there's not much point trying to merge it to git.git just for a
> > single user. Maintaining off tree is still a pain for me, but I think
> > I can manage it.
>
> I don't have any particular use for this, but I am all in favor of tools
> that make it easier to access and analyze information kept in our
> on-disk formats (some of this is available via --debug, I think, but
> AFAIK most of the extension bits are not).
>
> And I'd rather see something like JSON than inventing yet another ad-hoc
> output format.
>
> I think your warning in the manpage that this is for debugging is fine,
> as it does not put us on the hook for maintaining the feature nor its
> format forever. We might want to call it "--debug=json" or something,
> though, in case we do want real stable json support later (though of
> course we would be free to steal the option then, since we're making no
> promises).

Traditionally, we have not catered well to 3rd-party applications in Git,
and this JSON format would provide a way out of that problem.

So I would like *not* to lock the door on letting this feature stabilize
organically.

I'd be much more in favor of `--json[=<version>]`, with an initial version
of 0 to indicate that it really is unstable for now.

Ciao,
Dscho

Duy Nguyen June 21, 2019, 1:49 p.m. UTC | #9
On Fri, Jun 21, 2019 at 8:16 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:

> > I think your warning in the manpage that this is for debugging is fine,
> > as it does not put us on the hook for maintaining the feature nor its
> > format forever. We might want to call it "--debug=json" or something,
> > though, in case we do want real stable json support later (though of
> > course we would be free to steal the option then, since we're making no
> > promises).
>
> Traditionally, we have not catered well to 3rd-party applications in Git,
> and this JSON format would provide a way out of that problem.
>
> So I would like *not* to lock the door on letting this feature stabilize
> organically.
>
> I'd be much more in favor of `--json[=<version>]`, with an initial version
> of 0 to indicate that it really is unstable for now.

Considering the amount of code to output these, supporting multiple
formats would be a nightmare. I may be ok with versioning the output
so the tools know what format they need to deal with, but I'd rather
support just one version. For third parties wanting to dig deep, I
think libgit2 would be a much better fit.

Junio C Hamano June 21, 2019, 3:10 p.m. UTC | #10
Duy Nguyen <pclouds@gmail.com> writes:

> Considering the amount of code to output these, supporting multiple
> formats would be a nightmare. I may be ok with versioning the output
> so the tool know what format they need to deal with, but I'd rather
> support just one version. For third parties wanting to dig deep, I
> think libgit2 would be a much better fit.

Yeah, I think starting with --debug=json (or --debug-json) until we
see some stability in the output and get comfortable with the idea of
"version X" to mean what we output at that point, and then renaming
it to "--json" with "version: 1" in the output stream so that third
party can use it (and interpret it according to version 1 rules) is
the way to go.  Third-party tools are welcome to read --debug-json
output as an early-adoption practice waiting for the real thing, but
we do not want to be locked into a schema too early before we are
ready.
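
On the consumer side, a version marker in the output stream could be
checked along these lines. This is a Python sketch only: the top-level
"version" key and its placement are assumptions, since no such field is
defined yet, and load_versioned_dump is a hypothetical name.

```python
import json

# The only schema version this hypothetical tool understands.
SUPPORTED_VERSION = 1


def load_versioned_dump(text):
    """Parse a dump and refuse it unless it declares the known version.

    Assumes a top-level "version" key; a dump without one (such as
    unversioned debug output) is rejected as well.
    """
    data = json.loads(text)
    version = data.get("version")
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported dump version: {version!r}")
    return data
```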

Thanks.

Jeff King June 21, 2019, 8:48 p.m. UTC | #11
On Fri, Jun 21, 2019 at 03:37:45PM +0700, Duy Nguyen wrote:

> On Thu, Jun 20, 2019 at 2:17 AM Jeff King <peff@peff.net> wrote:
> > I think your warning in the manpage that this is for debugging is fine,
> > as it does not put us on the hook for maintaining the feature nor its
> > format forever. We might want to call it "--debug=json" or something,
> 
> Hmm.. does it mean we make --debug PARSE_OPT_OPTARG? In other words,
> "--debug" still means "text", --debug=json is obvious, but "--debug
> json" means "text" debug with  pathspec "json". Which is really
> horrible in my opinion.

Yeah, that's the nature of OPTARG. ;)

> Or is it ok to just make the argument mandatory? That would be a
> behavior change, but I suppose --debug is a thing only we use and
> could still be a safe thing to do...

Yeah, I think that would be perfectly fine (or you could just call it
--debug-json as a new option, if you didn't want to make people do
--debug=text for the existing behavior).

-Peff

Jeff King June 21, 2019, 8:51 p.m. UTC | #12
On Fri, Jun 21, 2019 at 03:16:52PM +0200, Johannes Schindelin wrote:

> > I think your warning in the manpage that this is for debugging is fine,
> > as it does not put us on the hook for maintaining the feature nor its
> > format forever. We might want to call it "--debug=json" or something,
> > though, in case we do want real stable json support later (though of
> > course we would be free to steal the option then, since we're making no
> > promises).
> 
> Traditionally, we have not catered well to 3rd-party applications in Git,
> and this JSON format would provide a way out of that problem.
> 
> So I would like *not* to lock the door on letting this feature stabilize
> organically.

I'd like it to stabilize organically, too, but my thinking was that we'd
wait a while and then promote it to a stable name eventually.

> I'd be much more in favor of `--json[=<version>]`, with an initial version
> of 0 to indicate that it really is unstable for now.

That's OK with me, too, if you think "0" indicates that sufficiently
(we've used "v0" in a lot of other places to refer to stable protocols,
like the git:// one). Maybe it's OK with some documentation making it
clear.

I'm not sure whether we want to be locked into supporting this v0
forever or not (though maybe it would not be such a burden).

I think JSON-based output also has the potential to need fewer bumps.
It's syntactically stable, so it's really just about our schema. And
it's easy to say "newer versions of Git may produce new keys; you can
ignore them", as long as we do not change the meaning of existing keys.
That might be an easier promise to make.
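
A consumer written against that promise would read only the keys it
knows and skip everything else. A minimal Python sketch, with the key
names "name" and "id" assumed purely for illustration:

```python
# Keys this hypothetical consumer understands; anything else is ignored.
KNOWN_KEYS = ("name", "id")


def pick_known(entry):
    """Keep only the keys this consumer understands.

    Keys added by newer versions are dropped silently, so the consumer
    keeps working as long as existing keys retain their meaning.
    """
    return {key: entry[key] for key in KNOWN_KEYS if key in entry}
```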

-Peff

Jeff King June 21, 2019, 8:52 p.m. UTC | #13
On Fri, Jun 21, 2019 at 08:10:58AM -0700, Junio C Hamano wrote:

> Duy Nguyen <pclouds@gmail.com> writes:
> 
> > Considering the amount of code to output these, supporting multiple
> > formats would be a nightmare. I may be ok with versioning the output
> > so the tool know what format they need to deal with, but I'd rather
> > support just one version. For third parties wanting to dig deep, I
> > think libgit2 would be a much better fit.
> 
> Yeah, I think starting with --debug=json (or --debug-json) until we
> see some stability in the output and got comfortable to the idea of
> "version X" to mean what we output at that point, and then renaming
> it to "--json" with "version: 1" in the output stream so that third
> party can use it (and interpret it according to version 1 rules) is
> the way to go.  Third-party tools are welcome to read --debug-json
> output as an early-adoption practice waiting for the real thing, but
> we do not want to be locked into a schema too eary before we are
> ready.

I should have read the whole thread before responding. I made a similar
comment to Dscho, so I guess that is now two of us. :)

-Peff

brian m. carlson June 21, 2019, 11:30 p.m. UTC | #14
On 2019-06-19 at 09:58:50, Nguyễn Thái Ngọc Duy wrote:
> This is probably just my itch. Every time I have to do something with
> the index, I need to add a little bit code here, a little bit there to
> get a better "view" of the index.
> 
> This solves it for me. It allows me to see pretty much everything in the
> index (except really low detail stuff like pathname compression). It's
> readable by human, but also easy to parse if you need to do statistics
> and stuff. You could even do a "diff" between two indexes.
> 
> I'm not really sure if anybody else finds this useful. Because if not,
> I guess there's not much point trying to merge it to git.git just for a
> single user. Maintaining off tree is still a pain for me, but I think
> I can manage it.

I'm generally in favor of this, but we need to document what this does
when it encounters paths that are not valid UTF-8. (Ideally, the answer
is, "die()", but I suspect the answer will be "silently produce invalid
output".) Those can of course occur on Unix systems, but also on
Windows, where unpaired surrogates can occur.

Duy Nguyen June 22, 2019, 2:54 a.m. UTC | #15
On Sat, Jun 22, 2019 at 6:31 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2019-06-19 at 09:58:50, Nguyễn Thái Ngọc Duy wrote:
> > This is probably just my itch. Every time I have to do something with
> > the index, I need to add a little bit code here, a little bit there to
> > get a better "view" of the index.
> >
> > This solves it for me. It allows me to see pretty much everything in the
> > index (except really low detail stuff like pathname compression). It's
> > readable by human, but also easy to parse if you need to do statistics
> > and stuff. You could even do a "diff" between two indexes.
> >
> > I'm not really sure if anybody else finds this useful. Because if not,
> > I guess there's not much point trying to merge it to git.git just for a
> > single user. Maintaining off tree is still a pain for me, but I think
> > I can manage it.
>
> I'm generally in favor of this, but we need to document what this does
> when it encounters paths that are not valid UTF-8. (Ideally, the answer
> is, "die()", but I suspect the answer will be "silently produce invalid
> output".)

I think you're right, we don't assume anything when writing json
strings, so it's not going to be utf-8 (or die) if the path is also
not valid utf-8. The good thing is all this could be done in just one
place, append_quoted_string(), if someone needs to. I'll just go
document the fact that we may produce invalid UTF-8.
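
A script that has to read such output can still cope. The Python sketch
below assumes the writer passes non-UTF-8 path bytes through unchanged,
as described above, and that entries have a "name" key (hypothetical).
It decodes with the "surrogateescape" error handler so the original
bytes can be recovered afterwards.

```python
import json


def load_dump_bytes(raw):
    """Parse a dump that may contain path bytes that are not valid UTF-8.

    Undecodable bytes are mapped to lone surrogates instead of raising
    UnicodeDecodeError.
    """
    return json.loads(raw.decode("utf-8", errors="surrogateescape"))


def path_bytes(name):
    """Recover the original path bytes from a name decoded as above."""
    return name.encode("utf-8", errors="surrogateescape")
```

A strict JSON parser in another language may simply reject the same
input, which is why the behavior is worth documenting.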

Johannes Schindelin June 24, 2019, 9:33 a.m. UTC | #16
Hi Duy,

On Fri, 21 Jun 2019, Duy Nguyen wrote:

> On Fri, Jun 21, 2019 at 8:16 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > > I think your warning in the manpage that this is for debugging is fine,
> > > as it does not put us on the hook for maintaining the feature nor its
> > > format forever. We might want to call it "--debug=json" or something,
> > > though, in case we do want real stable json support later (though of
> > > course we would be free to steal the option then, since we're making no
> > > promises).
> >
> > Traditionally, we have not catered well to 3rd-party applications in Git,
> > and this JSON format would provide a way out of that problem.
> >
> > So I would like *not* to lock the door on letting this feature stabilize
> > organically.
> >
> > I'd be much more in favor of `--json[=<version>]`, with an initial version
> > of 0 to indicate that it really is unstable for now.
>
> Considering the amount of code to output these, supporting multiple
> formats would be a nightmare. I may be ok with versioning the output
> so the tool know what format they need to deal with, but I'd rather
> support just one version.

Once the format has stabilized, I don't think it would be a huge burden to
support multiple formats, if we ever had to update.

It would, however, be a huge burden on third-party applications. In
effect, we could be lazy, but we would put a lot more burden on others
than we saved ourselves, so that would be a bit... selfish.

> For third parties wanting to dig deep, I think libgit2 would be a much
> better fit.

If we (i.e. the core Git contributors) were contributing new features/bug
fixes to libgit2, that would be a good recommendation.

But we don't. We essentially ignore libgit2 (and all of their learnings)
all the time.

Even worse, for years, even decades, we recommended the command-line as
"the API". If you want to reverse that recommendation, I think it merits a
bigger discussion than a flimsical comment buried in a thread about an
experimental feature.

Ciao,
Dscho

Johannes Schindelin June 24, 2019, 9:35 a.m. UTC | #17
Hi Peff & Junio,

On Fri, 21 Jun 2019, Jeff King wrote:

> On Fri, Jun 21, 2019 at 08:10:58AM -0700, Junio C Hamano wrote:
>
> > Duy Nguyen <pclouds@gmail.com> writes:
> >
> > > Considering the amount of code to output these, supporting multiple
> > > formats would be a nightmare. I may be ok with versioning the output
> > > so the tool know what format they need to deal with, but I'd rather
> > > support just one version. For third parties wanting to dig deep, I
> > > think libgit2 would be a much better fit.
> >
> > Yeah, I think starting with --debug=json (or --debug-json) until we
> > see some stability in the output and got comfortable to the idea of
> > "version X" to mean what we output at that point, and then renaming
> > it to "--json" with "version: 1" in the output stream so that third
> > party can use it (and interpret it according to version 1 rules) is
> > the way to go.  Third-party tools are welcome to read --debug-json
> > output as an early-adoption practice waiting for the real thing, but
> > we do not want to be locked into a schema too eary before we are
> > ready.
>
> I should have read the whole thread before responding. I made a similar
> comment to Dscho, so I guess that is now two of us. :)

It is a bit of a chicken-and-egg problem. You want the format to
stabilize. But you also don't want to commit to one final format. And you
choose as option name a deliberately discouraging one, deterring the
(third-party application) developers who could most help you evolve the
format to a sensible and useful stable version.

Ciao,
Dscho

Duy Nguyen June 24, 2019, 9:35 a.m. UTC | #18
On Mon, Jun 24, 2019 at 4:32 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Duy,
>
> On Fri, 21 Jun 2019, Duy Nguyen wrote:
>
> > On Fri, Jun 21, 2019 at 8:16 PM Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> >
> > > > I think your warning in the manpage that this is for debugging is fine,
> > > > as it does not put us on the hook for maintaining the feature nor its
> > > > format forever. We might want to call it "--debug=json" or something,
> > > > though, in case we do want real stable json support later (though of
> > > > course we would be free to steal the option then, since we're making no
> > > > promises).
> > >
> > > Traditionally, we have not catered well to 3rd-party applications in Git,
> > > and this JSON format would provide a way out of that problem.
> > >
> > > So I would like *not* to lock the door on letting this feature stabilize
> > > organically.
> > >
> > > I'd be much more in favor of `--json[=<version>]`, with an initial version
> > > of 0 to indicate that it really is unstable for now.
> >
> > Considering the amount of code to output these, supporting multiple
> > formats would be a nightmare. I may be ok with versioning the output
> > so the tool know what format they need to deal with, but I'd rather
> > support just one version.
>
> Once the format stabilized, I don't think it would be a huge burden to
> support multiple formats, if we ever had to update.
>
> It would, however, be a huge burden on third-party applications. In
> effect, we could be lazy, but we would put a lot more burden on others
> than we saved ourselves, so that would be a bit... selfish.

JSON is the land of high-level languages. They can adapt to a new format
quite easily, compared to restructuring C to support multiple
different formats. Yes I'm quite OK with being selfish in this case.

Johannes Schindelin June 24, 2019, 9:52 a.m. UTC | #19
Hi Peff,

On Fri, 21 Jun 2019, Jeff King wrote:

> On Fri, Jun 21, 2019 at 03:16:52PM +0200, Johannes Schindelin wrote:
>
> > > I think your warning in the manpage that this is for debugging is fine,
> > > as it does not put us on the hook for maintaining the feature nor its
> > > format forever. We might want to call it "--debug=json" or something,
> > > though, in case we do want real stable json support later (though of
> > > course we would be free to steal the option then, since we're making no
> > > promises).
> >
> > Traditionally, we have not catered well to 3rd-party applications in Git,
> > and this JSON format would provide a way out of that problem.
> >
> > So I would like *not* to lock the door on letting this feature stabilize
> > organically.
>
> I'd like it to stabilize organically, too, but my thinking was that we'd
> wait a while and then promote it to a stable name eventually.

Git's command-line options have stabilized organically.

Example: to include untracked files in `git stash`, use `-u` or
`--include-untracked`, to include them in `git add`, use `-A` or `--all`,
to include them in `git grep`, use `--untracked` (no short option), to
include them in `git ls-files`, use `-o` or `--others`. The command `git
commit` does not even have an option to include untracked files.

You know of more examples of organically grown designs in Git, I am sure.
Given those examples, I am not sure that I want the JSON format to
stabilize organically.

> > I'd be much more in favor of `--json[=<version>]`, with an initial
> > version of 0 to indicate that it really is unstable for now.
>
> That's OK with me, too, if you think "0" indicates that sufficiently
> (we've used "v0" in a lot of other places to refer to stable protocols,
> like the git:// one). Maybe it's OK with some documentation making it
> clear.

I did think that the `0` would be clear, but you are probably right.

> I'm not sure whether we want to be locked into supporting this v0
> forever or not (though maybe it would not be such a burden).
>
> I think JSON-based output also has the potential to need fewer bumps.
> It's syntactically stable, so it's really just about our schema. And
> it's easy to say "newer versions of Git may produce new keys; you can
> ignore them", as long as we do not change the meaning of existing keys.
> That might be an easier promise to make.

Right.

Thanks,
Dscho