mbox series

[0/8] fetch: introduce machine-parseable output

Message ID cover.1681906948.git.ps@pks.im (mailing list archive)
Headers show
Series fetch: introduce machine-parseable output | expand

Message

Patrick Steinhardt April 19, 2023, 12:31 p.m. UTC
Hi,

this is the second part of my quest to introduce a machine-parseable
output for git-fetch(1) after the initial refactorings that have been
merged via e9dffbc7f1 (Merge branch 'ps/fetch-ref-update-reporting',
2023-04-06).

Parsing the output of fetches is mostly impossible. It prettifies
reference names that are about to be updated, doesn't print the old and
new object IDs the refs are being updated from and to, and prints all of
that information in nice columns. In short, it is designed to be read by
humans rather than machines.

This makes it hard to use in a script way though, e.g. to learn about
which references actually have been updated or which have not been
updated. This patch series intends to fix that by introducing a new
machine-parseable interface:

```
$ git fetch --output-format=porcelain --no-progress
  fff5a5e7f528b2ed2c335991399a766c2cf01103 af67688dca57999fd848f051eeea1d375ba546b2 refs/remotes/origin/master
* 0000000000000000000000000000000000000000 e046fe5a36a970bc14fbfbcb2074a48776f6b671 refs/remotes/origin/x86-rep-insns
* 0000000000000000000000000000000000000000 bb81ed6862b864c9eb99447f04d49a84ecb647e5 refs/tags/v6.3-rc4
* 0000000000000000000000000000000000000000 83af7b1468c0dca86b4dc9e43e73bfa4f38d9637 refs/tags/v6.3-rc5
* 0000000000000000000000000000000000000000 ab3affb8ed84f68638162fe7e6fd4055e15bff5b refs/tags/v6.3-rc6
* 0000000000000000000000000000000000000000 1c8c28415e8743368a2b800520a6dd0b22ee6ec2 refs/tags/v6.3-rc7
```

The series is structured as following:

    - Patches 1 and 2 improve test coverage for output formats.

    - Patch 3 fixes a bug with the current output format.

    - Patch 4 to 6 perform some preliminary refactorings.

    - Patch 7 introduces a new `--output-format=` option for
      git-fetch(1) that allows the user to configure the output more
      directly.

    - Patch 8 introduces the new "porcelain" output format.

Patrick

Patrick Steinhardt (8):
  fetch: split out tests for output format
  fetch: add a test to exercise invalid output formats
  fetch: fix missing from-reference when fetching HEAD:foo
  fetch: introduce `display_format` enum
  fetch: move display format parsing into main function
  fetch: move option related variables into main function
  fetch: introduce new `--output-format` option
  fetch: introduce machine-parseable "porcelain" output format

 Documentation/config/fetch.txt  |   4 +-
 Documentation/fetch-options.txt |   5 +
 Documentation/git-fetch.txt     |  17 +-
 builtin/fetch.c                 | 406 +++++++++++++++++++-------------
 t/t5510-fetch.sh                |  53 -----
 t/t5574-fetch-output.sh         | 209 ++++++++++++++++
 6 files changed, 475 insertions(+), 219 deletions(-)
 create mode 100755 t/t5574-fetch-output.sh

Comments

Felipe Contreras April 24, 2023, 8:17 p.m. UTC | #1
Patrick Steinhardt wrote:
> Parsing the output of fetches is mostly impossible. It prettifies
> reference names that are about to be updated, doesn't print the old and
> new object IDs the refs are being updated from and to, and prints all of
> that information in nice columns. In short, it is designed to be read by
> humans rather than machines.
> 
> This makes it hard to use in a script way though, e.g. to learn about
> which references actually have been updated or which have not been
> updated. This patch series intends to fix that by introducing a new
> machine-parseable interface:
> 
> ```
> $ git fetch --output-format=porcelain --no-progress
>   fff5a5e7f528b2ed2c335991399a766c2cf01103 af67688dca57999fd848f051eeea1d375ba546b2 refs/remotes/origin/master
> * 0000000000000000000000000000000000000000 e046fe5a36a970bc14fbfbcb2074a48776f6b671 refs/remotes/origin/x86-rep-insns
> * 0000000000000000000000000000000000000000 bb81ed6862b864c9eb99447f04d49a84ecb647e5 refs/tags/v6.3-rc4
> * 0000000000000000000000000000000000000000 83af7b1468c0dca86b4dc9e43e73bfa4f38d9637 refs/tags/v6.3-rc5
> * 0000000000000000000000000000000000000000 ab3affb8ed84f68638162fe7e6fd4055e15bff5b refs/tags/v6.3-rc6
> * 0000000000000000000000000000000000000000 1c8c28415e8743368a2b800520a6dd0b22ee6ec2 refs/tags/v6.3-rc7
> ```

Makes sense, my only question is what other format could `git fetch` have? I
think `--format=porcelain` is clear enough.
Patrick Steinhardt April 25, 2023, 9:58 a.m. UTC | #2
On Mon, Apr 24, 2023 at 02:17:31PM -0600, Felipe Contreras wrote:
> Patrick Steinhardt wrote:
> > Parsing the output of fetches is mostly impossible. It prettifies
> > reference names that are about to be updated, doesn't print the old and
> > new object IDs the refs are being updated from and to, and prints all of
> > that information in nice columns. In short, it is designed to be read by
> > humans rather than machines.
> > 
> > This makes it hard to use in a script way though, e.g. to learn about
> > which references actually have been updated or which have not been
> > updated. This patch series intends to fix that by introducing a new
> > machine-parseable interface:
> > 
> > ```
> > $ git fetch --output-format=porcelain --no-progress
> >   fff5a5e7f528b2ed2c335991399a766c2cf01103 af67688dca57999fd848f051eeea1d375ba546b2 refs/remotes/origin/master
> > * 0000000000000000000000000000000000000000 e046fe5a36a970bc14fbfbcb2074a48776f6b671 refs/remotes/origin/x86-rep-insns
> > * 0000000000000000000000000000000000000000 bb81ed6862b864c9eb99447f04d49a84ecb647e5 refs/tags/v6.3-rc4
> > * 0000000000000000000000000000000000000000 83af7b1468c0dca86b4dc9e43e73bfa4f38d9637 refs/tags/v6.3-rc5
> > * 0000000000000000000000000000000000000000 ab3affb8ed84f68638162fe7e6fd4055e15bff5b refs/tags/v6.3-rc6
> > * 0000000000000000000000000000000000000000 1c8c28415e8743368a2b800520a6dd0b22ee6ec2 refs/tags/v6.3-rc7
> > ```
> 
> Makes sense, my only question is what other format could `git fetch` have? I
> think `--format=porcelain` is clear enough.

Yeah, I'd be perfectly happy to rename this to `--format=porcelain`.
I'll wait for the Review Club that discusses this patch set tomorrow and
will send a new version with that change afterwards if nobody disagrees.

Patrick
Glen Choo April 26, 2023, 6:54 p.m. UTC | #3
Hi Patrick!

Thanks for the pleasant read! I thought this was a great topic for
Review Club. It's too bad that we missed you, but we post all relevant
feedback here anyway.

Nevertheless, if you'd like to see the meeting notes, you can find them
at:

https://docs.google.com/document/d/14L8BAumGTpsXpjDY8VzZ4rRtpAjuGrFSRqn3stCuS_w/edit

Patrick Steinhardt <ps@pks.im> writes:

> Parsing the output of fetches is mostly impossible. It prettifies
> reference names that are about to be updated, doesn't print the old and
> new object IDs the refs are being updated from and to, and prints all of
> that information in nice columns. In short, it is designed to be read by
> humans rather than machines.
>
> This makes it hard to use in a script way though, e.g. to learn about
> which references actually have been updated or which have not been
> updated. This patch series intends to fix that by introducing a new
> machine-parseable interface:
>
> ```
> $ git fetch --output-format=porcelain --no-progress
>   fff5a5e7f528b2ed2c335991399a766c2cf01103 af67688dca57999fd848f051eeea1d375ba546b2 refs/remotes/origin/master
> * 0000000000000000000000000000000000000000 e046fe5a36a970bc14fbfbcb2074a48776f6b671 refs/remotes/origin/x86-rep-insns
> * 0000000000000000000000000000000000000000 bb81ed6862b864c9eb99447f04d49a84ecb647e5 refs/tags/v6.3-rc4
> * 0000000000000000000000000000000000000000 83af7b1468c0dca86b4dc9e43e73bfa4f38d9637 refs/tags/v6.3-rc5
> * 0000000000000000000000000000000000000000 ab3affb8ed84f68638162fe7e6fd4055e15bff5b refs/tags/v6.3-rc6
> * 0000000000000000000000000000000000000000 1c8c28415e8743368a2b800520a6dd0b22ee6ec2 refs/tags/v6.3-rc7
> ```

Having machine-parseable output seems like an obviously good goal to me.
The finer points of the interface and output format are worth
discussing. I'll do so in the patches themselves.

Overall, I found the series very well-structured and easy to follow
along. Thanks.
Jacob Keller April 26, 2023, 7:14 p.m. UTC | #4
On 4/25/2023 2:58 AM, Patrick Steinhardt wrote:
> On Mon, Apr 24, 2023 at 02:17:31PM -0600, Felipe Contreras wrote:
>> Patrick Steinhardt wrote:
>>> Parsing the output of fetches is mostly impossible. It prettifies
>>> reference names that are about to be updated, doesn't print the old and
>>> new object IDs the refs are being updated from and to, and prints all of
>>> that information in nice columns. In short, it is designed to be read by
>>> humans rather than machines.
>>>
>>> This makes it hard to use in a script way though, e.g. to learn about
>>> which references actually have been updated or which have not been
>>> updated. This patch series intends to fix that by introducing a new
>>> machine-parseable interface:
>>>
>>> ```
>>> $ git fetch --output-format=porcelain --no-progress
>>>   fff5a5e7f528b2ed2c335991399a766c2cf01103 af67688dca57999fd848f051eeea1d375ba546b2 refs/remotes/origin/master
>>> * 0000000000000000000000000000000000000000 e046fe5a36a970bc14fbfbcb2074a48776f6b671 refs/remotes/origin/x86-rep-insns
>>> * 0000000000000000000000000000000000000000 bb81ed6862b864c9eb99447f04d49a84ecb647e5 refs/tags/v6.3-rc4
>>> * 0000000000000000000000000000000000000000 83af7b1468c0dca86b4dc9e43e73bfa4f38d9637 refs/tags/v6.3-rc5
>>> * 0000000000000000000000000000000000000000 ab3affb8ed84f68638162fe7e6fd4055e15bff5b refs/tags/v6.3-rc6
>>> * 0000000000000000000000000000000000000000 1c8c28415e8743368a2b800520a6dd0b22ee6ec2 refs/tags/v6.3-rc7
>>> ```
>>
>> Makes sense, my only question is what other format could `git fetch` have? I
>> think `--format=porcelain` is clear enough.
> 
> Yeah, I'd be perfectly happy to rename this to `--format=porcelain`.
> I'll wait for the Review Club that discusses this patch set tomorrow and
> will send a new version with that change afterwards if nobody disagrees.
> 
> Patrick

We had some discussion during review club about this, where the idea of
using "--porcelain" came up because many commands use that when
switching into a machine readable format.

In addition, this format not only changes the output but also moves it
from being on stderr to stdout, which is a hint that the intended usage
of the command is now a little different.

I don't have strong opinion here but want to note that "--format" is
often used by commands like log which changes how we structure the
output of git objects.

Obviously using "--porcelain" is a bit weird when dealing with the
pre-existing compact and full outputs, and perhaps --format wouldn't be
confusing.

I didn't find any command which used --output-format today, and all the
uses of --format I saw were for object formatting like git log.

I'm ok with --format, but wanted to note the potential confusion.

git status also uses --porcelain and has a -z option for NUL terminating
instead of newline terminating. We thought -z might be useful to handle
the potential of weird refnames that include characters. No one on the
review could remember what rules were enforced on refnames to confirm if
it was legal to have '\n' in a refname or not.
Jacob Keller April 26, 2023, 7:17 p.m. UTC | #5
On 4/19/2023 5:31 AM, Patrick Steinhardt wrote:
> 
> ```
> $ git fetch --output-format=porcelain --no-progress
>   fff5a5e7f528b2ed2c335991399a766c2cf01103 af67688dca57999fd848f051eeea1d375ba546b2 refs/remotes/origin/master
> * 0000000000000000000000000000000000000000 e046fe5a36a970bc14fbfbcb2074a48776f6b671 refs/remotes/origin/x86-rep-insns
> * 0000000000000000000000000000000000000000 bb81ed6862b864c9eb99447f04d49a84ecb647e5 refs/tags/v6.3-rc4
> * 0000000000000000000000000000000000000000 83af7b1468c0dca86b4dc9e43e73bfa4f38d9637 refs/tags/v6.3-rc5
> * 0000000000000000000000000000000000000000 ab3affb8ed84f68638162fe7e6fd4055e15bff5b refs/tags/v6.3-rc6
> * 0000000000000000000000000000000000000000 1c8c28415e8743368a2b800520a6dd0b22ee6ec2 refs/tags/v6.3-rc7
> ```
> 

One thing that the standard output (maybe with --progress?) shows to
stderr is which remote is being fetched and what URL was fetched.

This seems like useful data to include in a machine readable format. It
wasn't clear if such output would still be made and whether it would go
to stderr or stdout, nor whether that existing output was machine readable.


Obviously you can somewhat infer remotes based on the refs/remotes, but
that doesn't include other refs like tags.

When fetching from multiple reports (--all or explicitly asking), I
wonder if it makes sense to include some lines to differentiate where
each block of updates for a given remote starts and ends?

Thanks,
Jake
Junio C Hamano April 26, 2023, 8:23 p.m. UTC | #6
Jacob Keller <jacob.e.keller@intel.com> writes:

>> Yeah, I'd be perfectly happy to rename this to `--format=porcelain`.
>> I'll wait for the Review Club that discusses this patch set tomorrow and
>> will send a new version with that change afterwards if nobody disagrees.
>> 
>> Patrick
>
> We had some discussion during review club about this, where the idea of
> using "--porcelain" came up because many commands use that when
> switching into a machine readable format.
>
> In addition, this format not only changes the output but also moves it
> from being on stderr to stdout, which is a hint that the intended usage
> of the command is now a little different.

A little different from what?  I do not think the answer would be
"other program's --porcelain mode", as sending them to stdout would
be one of the things that make the output easier for programs to
parse, so it does sound like very much in the same spirit as "git
status --porcelain" where its output format gets tweaked to be more
machine friendly.

The output with "--porcelain" option enabled tend to be less human
friendly and the distinction between Porcelain (for humans) and
plumbing (for scripts) is reversed in the use of the word there---it
started as "this is the option for those who write Porcelain
commands to use", but still it is not a very good name for the
option.

I am perfectly OK if the plan is to uniformly use --output-format
(or something equally more descriptive) and migrate and deprecate
the "--porcelain" option away from existing commands.

Thanks.
Junio C Hamano April 26, 2023, 8:24 p.m. UTC | #7
Jacob Keller <jacob.e.keller@intel.com> writes:

>> Yeah, I'd be perfectly happy to rename this to `--format=porcelain`.
>> I'll wait for the Review Club that discusses this patch set tomorrow and
>> will send a new version with that change afterwards if nobody disagrees.
>> 
>> Patrick
>
> We had some discussion during review club about this, where the idea of
> using "--porcelain" came up because many commands use that when
> switching into a machine readable format.
>
> In addition, this format not only changes the output but also moves it
> from being on stderr to stdout, which is a hint that the intended usage
> of the command is now a little different.

A little different from what?  I do not think the answer would be
"other program's --porcelain mode", as sending them to stdout would
be one of the things that make the output easier for programs to
parse, so it does sound like very much in the same spirit as "git
status --porcelain" where its output format gets tweaked to be more
machine friendly.

The output with "--porcelain" option enabled tend to be less human
friendly and the distinction between Porcelain (for humans) and
plumbing (for scripts) is reversed in the use of the word there---it
started as "this is the option for those who write Porcelain
commands to use", but still it is not a very good name for the
option.

I am perfectly OK if the plan is to uniformly use --output-format
(or something equally more descriptive) and migrate and deprecate
the "--porcelain" option away from existing commands.

Thanks.
Jacob Keller April 26, 2023, 8:30 p.m. UTC | #8
On 4/26/2023 1:23 PM, Junio C Hamano wrote:
> Jacob Keller <jacob.e.keller@intel.com> writes:
> 
>>> Yeah, I'd be perfectly happy to rename this to `--format=porcelain`.
>>> I'll wait for the Review Club that discusses this patch set tomorrow and
>>> will send a new version with that change afterwards if nobody disagrees.
>>>
>>> Patrick
>>
>> We had some discussion during review club about this, where the idea of
>> using "--porcelain" came up because many commands use that when
>> switching into a machine readable format.
>>
>> In addition, this format not only changes the output but also moves it
>> from being on stderr to stdout, which is a hint that the intended usage
>> of the command is now a little different.
> 
> A little different from what?  I do not think the answer would be
> "other program's --porcelain mode", as sending them to stdout would
> be one of the things that make the output easier for programs to
> parse, so it does sound like very much in the same spirit as "git
> status --porcelain" where its output format gets tweaked to be more
> machine friendly.

A little different from using git fetch normally where all output is
stderr and is generally "this is what I did" but which was obviously not
intended to be parsed by scripts but instead by the human who ran the
command.

> 
> The output with "--porcelain" option enabled tend to be less human
> friendly and the distinction between Porcelain (for humans) and
> plumbing (for scripts) is reversed in the use of the word there---it
> started as "this is the option for those who write Porcelain
> commands to use", but still it is not a very good name for the
> option.

Yea the option sort of means "to be used by those implementing
porcelain" and its definitely a bit confusing.

> 
> I am perfectly OK if the plan is to uniformly use --output-format
> (or something equally more descriptive) and migrate and deprecate
> the "--porcelain" option away from existing commands.
> 
> Thanks.

That makes sense to me as well.
Glen Choo April 26, 2023, 9:14 p.m. UTC | #9
Glen Choo <chooglen@google.com> writes:

> Hi Patrick!
>
> Thanks for the pleasant read! I thought this was a great topic for
> Review Club. It's too bad that we missed you, but we post all relevant
> feedback here anyway.
>
> Nevertheless, if you'd like to see the meeting notes, you can find them
> at:
>
> https://docs.google.com/document/d/14L8BAumGTpsXpjDY8VzZ4rRtpAjuGrFSRqn3stCuS_w/edit

It seems like Jacob Keller and I have mostly the same comments (since we
did both go to the same Review Club, after all :)), so feel free to
respond to only Jacob's messages and ignore mine (where the messages
overlap) - I'll catch up on those conversations.
Patrick Steinhardt April 27, 2023, 10:58 a.m. UTC | #10
On Wed, Apr 26, 2023 at 01:23:12PM -0700, Junio C Hamano wrote:
> Jacob Keller <jacob.e.keller@intel.com> writes:
> 
> >> Yeah, I'd be perfectly happy to rename this to `--format=porcelain`.
> >> I'll wait for the Review Club that discusses this patch set tomorrow and
> >> will send a new version with that change afterwards if nobody disagrees.
> >> 
> >> Patrick
> >
> > We had some discussion during review club about this, where the idea of
> > using "--porcelain" came up because many commands use that when
> > switching into a machine readable format.
> >
> > In addition, this format not only changes the output but also moves it
> > from being on stderr to stdout, which is a hint that the intended usage
> > of the command is now a little different.
> 
> A little different from what?  I do not think the answer would be
> "other program's --porcelain mode", as sending them to stdout would
> be one of the things that make the output easier for programs to
> parse, so it does sound like very much in the same spirit as "git
> status --porcelain" where its output format gets tweaked to be more
> machine friendly.
> 
> The output with "--porcelain" option enabled tend to be less human
> friendly and the distinction between Porcelain (for humans) and
> plumbing (for scripts) is reversed in the use of the word there---it
> started as "this is the option for those who write Porcelain
> commands to use", but still it is not a very good name for the
> option.

Ah, thanks for explaining where this reversed meaning actually comes
from. I really wondered why we use "porcelain" in preexisting code to
reflect machine-parseable interface, but that explanation does make
sense.

> I am perfectly OK if the plan is to uniformly use --output-format
> (or something equally more descriptive) and migrate and deprecate
> the "--porcelain" option away from existing commands.

I'm still not quite clear on where the consensus lies now. Personally, I
think that both `--format` and `--output-format` work well and are a bit
more descriptive than simply `--porcelain`, and wouldn't mind also
migrating other binaries to use either of them.

Furthermore, I think that `--[output-]format` has the advantage that you
don't need to handle priorities or mutual exclusion of different options
that all apply to the reference format. To a user, it is not immediately
obvious what `git fetch --format=compact --porcelain` would do, and
which of both options ultimately get respected. But that's likely only
true for future commands, because any migration would create the same
kind of ambiguity for preexisting commands.

If we were to also migrate preexisting code to use `--[output-]format`
then I'd argue that `--output-format` is likely the better name, mostly
because it is less likely to be ambiguous compared to `--format`. The
latter could e.g. easily confused with `--object-format`.

So I think I'll stick with `--output-format` for the time being.

Patrick
Jacob Keller April 27, 2023, 7:46 p.m. UTC | #11
On 4/27/2023 3:58 AM, Patrick Steinhardt wrote:
> On Wed, Apr 26, 2023 at 01:23:12PM -0700, Junio C Hamano wrote:
> Furthermore, I think that `--[output-]format` has the advantage that you
> don't need to handle priorities or mutual exclusion of different options
> that all apply to the reference format. To a user, it is not immediately
> obvious what `git fetch --format=compact --porcelain` would do, and
> which of both options ultimately get respected. But that's likely only
> true for future commands, because any migration would create the same
> kind of ambiguity for preexisting commands.
> 
> If we were to also migrate preexisting code to use `--[output-]format`
> then I'd argue that `--output-format` is likely the better name, mostly
> because it is less likely to be ambiguous compared to `--format`. The
> latter could e.g. easily confused with `--object-format`.
> 
> So I think I'll stick with `--output-format` for the time being.
> 

I agree. I think using --output-format and migrating existing commands
that have --porcelain to use --output-format is good. I'm not sure
whether to keep using --output-format=porcelain or whether to use a
different term there.

I definitely think avoiding confusion with --format is good, and I don't
think its too much of a burden to have the longer --output-format as the
option name.

Thanks,
Jake

> Patrick
Glen Choo April 27, 2023, 10:49 p.m. UTC | #12
Junio C Hamano <gitster@pobox.com> writes:

>> We had some discussion during review club about this, where the idea of
>> using "--porcelain" came up because many commands use that when
>> switching into a machine readable format.
>>
>> In addition, this format not only changes the output but also moves it
>> from being on stderr to stdout, which is a hint that the intended usage
>> of the command is now a little different.
>
> A little different from what?  I do not think the answer would be
> "other program's --porcelain mode", as sending them to stdout would
> be one of the things that make the output easier for programs to
> parse, so it does sound like very much in the same spirit as "git
> status --porcelain" where its output format gets tweaked to be more
> machine friendly.
>
> The output with "--porcelain" option enabled tend to be less human
> friendly and the distinction between Porcelain (for humans) and
> plumbing (for scripts) is reversed in the use of the word there---it
> started as "this is the option for those who write Porcelain
> commands to use", but still it is not a very good name for the
> option.
>
> I am perfectly OK if the plan is to uniformly use --output-format
> (or something equally more descriptive) and migrate and deprecate
> the "--porcelain" option away from existing commands.

I agree that --porcelain is a confusing name that would be nice to
deprecate, but I don't think --output-format captures all of the intent
of "operate in a machine-friendly mode instead of a human-friendly one".
Unfortunately, if we had picked --plumbing from the beginning, I doubt
we would be having this discussion today :/

E.g. machines (Unix ones at least?) like to have output on stdout and to
be able to request NUL-terminated output. It's unfortunate that if we
don't piggyback onto --output-format, we run into option precedence
problems (like Patrick mentioned), but I'd find it more confusing that
--output-format=[porcelain|full|compact] don't behave the same way.

I don't think this puts us in a better spot with regards to option
precedence either. Consider:

  git fetch --output-format=full -z <...>

The only way to respect both options is to have -z affect the
human-readable output, which isn't the end of the world, but it seems
unnecessary.

Perhaps something like --machine instead?