mbox series

[0/5] pretty format %(trailers): improve machine readability

Message ID 20201205013918.18981-1-avarab@gmail.com (mailing list archive)
Headers show
Series pretty format %(trailers): improve machine readability | expand

Message

Ævar Arnfjörð Bjarmason Dec. 5, 2020, 1:39 a.m. UTC
I started writing this on top of "master", but then saw the
outstanding series of other miscellaneous fixes to this
facility[1]. This is on top of that topic & rebased on master.

Anders, any plans to re-roll yours? Otherwise the conflicts I'd have
on mine are easy to fix, so I can also submit it as a stand-alone.

This series comes out of a discussion at work today (well, yesterday
at this point) where someone wanted to parse %(trailers) output. As
noted in 3/5 doing this is rather tedious now if you're trying to
unambiguously grap trailers as a stream of key-value pairs.

So this series adds a "key_value_separator" and "keyonly" parameters,
and fixes a few bugs I saw along the way.

1. https://lore.kernel.org/git/20201025212652.3003036-1-anders@0x63.nu/

Ævar Arnfjörð Bjarmason (5):
  pretty format %(trailers) test: split a long line
  pretty format %(trailers): avoid needless repetition
  pretty format %(trailers): add a "keyonly"
  pretty-format %(trailers): fix broken standalone "valueonly"
  pretty format %(trailers): add a "key_value_separator"

 Documentation/pretty-formats.txt | 33 ++++++++-------
 pretty.c                         | 12 ++++++
 t/t4205-log-pretty-formats.sh    | 71 +++++++++++++++++++++++++++++++-
 trailer.c                        | 14 +++++--
 trailer.h                        |  3 ++
 5 files changed, 113 insertions(+), 20 deletions(-)

Comments

Anders Waldenborg Dec. 5, 2020, 6:18 p.m. UTC | #1
Ævar Arnfjörð Bjarmason writes:

> I started writing this on top of "master", but then saw the
> outstanding series of other miscellaneous fixes to this
> facility[1]. This is on top of that topic & rebased on master.
>
> Anders, any plans to re-roll yours? Otherwise the conflicts I'd have
> on mine are easy to fix, so I can also submit it as a stand-alone.

Yes, I have plans to do that. But have yet to carve out the required
time from my copious spare time to actually do it.

So please don't hold your breath waiting for me to do that.

> This series comes out of a discussion at work today (well, yesterday
> at this point) where someone wanted to parse %(trailers) output. As
> noted in 3/5 doing this is rather tedious now if you're trying to
> unambiguously grap trailers as a stream of key-value pairs.
>
> So this series adds a "key_value_separator" and "keyonly" parameters,
> and fixes a few bugs I saw along the way.

Interesting. When adding "valueonly" I never consider it being used
without "key". The trick you are doing with separate keyonly and
valueonly is quite clever.

I've only been doing machine parsing for explicit keys, things like:
"%cn%x00%x00%an%x00%x00%(trailers:key=Reviewed-By,valueonly,unfold,separator=%x00)%x00%x00%(trailers:key=Backport-Reviewed-By,valueonly,unfold,separator=%x00)"
(double-NUL to separate field, single-NUL to separate values within field).

But I can't help wonder that if the goal just is to have a nice machine
parsable format maybe it would be easier (both for user and
implementation) to have a separate placeholder for "machine readable
trailers" which by default emits in a format suitable for machine
parsing. Something like a new "%(ztrailers)" (but with a better name)
which simply emits a sequence of "<KEY> NUL <VAL> NUL" for each trailer.
Ævar Arnfjörð Bjarmason Dec. 7, 2020, 8:53 a.m. UTC | #2
On Sat, Dec 05 2020, Anders Waldenborg wrote:

> Ævar Arnfjörð Bjarmason writes:
>
>> I started writing this on top of "master", but then saw the
>> outstanding series of other miscellaneous fixes to this
>> facility[1]. This is on top of that topic & rebased on master.
>>
>> Anders, any plans to re-roll yours? Otherwise the conflicts I'd have
>> on mine are easy to fix, so I can also submit it as a stand-alone.
>
> Yes, I have plans to do that. But have yet to carve out the required
> time from my copious spare time to actually do it.
>
> So please don't hold your breath waiting for me to do that.

Thanks. I sent a v2 of mine yesterday as
https://lore.kernel.org/git/20201206002449.31452-1-avarab@gmail.com/

As noted there the merge conflict with yours is trivial, so hopefully it
won't cause you much hassle if you re-roll while it's outstanding.

>> This series comes out of a discussion at work today (well, yesterday
>> at this point) where someone wanted to parse %(trailers) output. As
>> noted in 3/5 doing this is rather tedious now if you're trying to
>> unambiguously grap trailers as a stream of key-value pairs.
>>
>> So this series adds a "key_value_separator" and "keyonly" parameters,
>> and fixes a few bugs I saw along the way.
>
> Interesting. When adding "valueonly" I never consider it being used
> without "key". The trick you are doing with separate keyonly and
> valueonly is quite clever.
>
> I've only been doing machine parsing for explicit keys, things like:
> "%cn%x00%x00%an%x00%x00%(trailers:key=Reviewed-By,valueonly,unfold,separator=%x00)%x00%x00%(trailers:key=Backport-Reviewed-By,valueonly,unfold,separator=%x00)"
> (double-NUL to separate field, single-NUL to separate values within field).
>
> But I can't help wonder that if the goal just is to have a nice machine
> parsable format maybe it would be easier (both for user and
> implementation) to have a separate placeholder for "machine readable
> trailers" which by default emits in a format suitable for machine
> parsing. Something like a new "%(ztrailers)" (but with a better name)
> which simply emits a sequence of "<KEY> NUL <VAL> NUL" for each trailer

I think it's a bit tricky to make something general in the middle of all
the custom format printf-likes in the pretty format. E.g. some users
might want to use \0 as a delimiter for key-values, others \0\0
etc. because they used \0, or the other way around.

Maybe if there's a reason to extend the optimization it could be smarter
about detecting that you only wanted some fixed-string separator and
nothing else custom?

B.t.w. I tried just deleting the optimization for testing and it slowed
down by around 8% on linux.git according to an extended
p4205-log-pretty-formats.sh.

Looking at the code I wonder if there aren't other lower hanging
optimizations, e.g. it seems we call find_separator() on multiple passes
instead of saving it away, e.g. in the format_trailers_from_commit()
entry point if there's any custom options such as "unfold".

I also wonder if memory allocation is a bottleneck in the "git log"
path, but didn't have time to refactor & test it. For each commit the
walking machinery eventually calls the trailer.c code, which allocates &
free()'s internal structures that could be re-used for parsing the next
commit.