mbox series

[00/10] Enrich Trailer API

Message ID pull.1632.git.1704869487.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Enrich Trailer API | expand

Message

Jean-Noël Avila via GitGitGadget Jan. 10, 2024, 6:51 a.m. UTC
This patch series is the first 10 patches of a much larger series I've been
working. The main goal of this series is to enrich the API in trailer.h. The
larger series brings a number of additional code simplifications and
cleanups (exposing and fixing some bugs along the way), and builds on top of
this series. The goal of the larger series is to make the trailer interface
ready for unit testing. By "trailer API" I mean those functions exposed in
trailer.h.

Currently, we have "process_trailers()" in trailer.h which does many
different things (parse command-line arguments, create temporary files, etc)
that are independent of the concept of "trailers", whose interface trailer.h
defines. This makes unit-testing this function difficult. While there is no
technical reason why we couldn't write unit tests for the smaller functions
that are called within process_trailers(), doing so would involve testing
private ("static" in trailer.c) functions instead of the public functions
exposed in trailer.h which authoritatively define the API.

As an alternative to this patch series, we could keep trailer.h intact and
decide to unit-test the existing "trailer_info_get()" function which does
most of the trailer parsing work. However this function wouldn't be easy to
test either, because the resulting data structure merely contains the
unparsed "trailers" lines. So the unit test (if it wants to inspect the
result of parsing these lines) would have to invoke additional parsing
functions itself. And at that point it would not longer be a unit test in
the traditional sense, because it would be invoking multiple functions at
once.

This series breaks up "process_trailers()" into smaller pieces, exposing
many of the parts relevant to trailer-related processing in trailer.h. This
forces us to start writing unit tests for these now public functions, but
that is a good thing because those same unit tests should be easy to write
(due to their small(er) sizes), but also, because those unit tests will now
ensure some degree of stability across new versions of trailer.h (we will
start noticing when the behavior of any of these API functions change).

Another benefit is that more clients (perhaps those outside of Git in the
future) will be able to use the same trailer processing algorithms used by
the interpret-trailers builtin. For example, a web server may want to parse
trailers the same way that interpret-trailers would parse them, without
having to shell out to interpret-trailers. Importing the new (richer)
trailer.h file that this series promotes would help them achieve that goal.
This use case was the original motivation behind my work in this area of
Git.

Thanks to the aggressive refactoring in this series, I've been able to
identify and fix a couple bugs in our existing implementation. Those fixes
build on top of this series but were not included here, in order to keep
this series small.

Linus Arver (10):
  trailer: move process_trailers() to interpret-trailers.c
  trailer: include "trailer" term in API functions
  trailer: unify trailer formatting machinery
  trailer: delete obsolete formatting functions
  sequencer: use the trailer iterator
  trailer: make trailer_info struct private
  trailer: spread usage of "trailer_block" language
  trailer: prepare to move parse_trailers_from_command_line_args() to
    builtin
  trailer: move arg handling to interpret-trailers.c
  trailer: delete obsolete argument handling code from API

 builtin/interpret-trailers.c | 169 ++++++++--
 builtin/shortlog.c           |   7 +-
 pretty.c                     |   2 +-
 ref-filter.c                 |   2 +-
 sequencer.c                  |  35 +--
 trailer.c                    | 579 ++++++++++++++++-------------------
 trailer.h                    | 106 ++++---
 7 files changed, 482 insertions(+), 418 deletions(-)


base-commit: a54a84b333adbecf7bc4483c0e36ed5878cac17b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1632%2Flistx%2Ftrailer-api-refactor-part-1-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1632/listx/trailer-api-refactor-part-1-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1632

Comments

Junio C Hamano Jan. 10, 2024, 7:45 p.m. UTC | #1
"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This patch series is the first 10 patches of a much larger series I've been
> working. The main goal of this series is to enrich the API in trailer.h. The
> larger series brings a number of additional code simplifications and
> cleanups (exposing and fixing some bugs along the way), and builds on top of
> this series. The goal of the larger series is to make the trailer interface
> ready for unit testing. By "trailer API" I mean those functions exposed in
> trailer.h.

Are there places in the current code that deals with trailers but
does not use the trailer API (e.g., manually parse and/or insert the
trailer in an in-core buffer)?  Is it part of the larger goal to
update these places so that we will always use the trailer API to
touch trailers, and if so, have these places been identified?

Obviously the reason why I ask is that testing cannot be the goal by
itself.  The "alternative" ...

> As an alternative to this patch series, we could keep trailer.h intact and
> decide to unit-test the existing "trailer_info_get()" function which does
> most of the trailer parsing work.

... may allow you to "test", but it would make it more difficult in
the future to revamp the trailer API, if it is needed, in order to
cover code paths that ought to be using but currently bypassing the
trailer API.

> This series breaks up "process_trailers()" into smaller pieces, exposing
> many of the parts relevant to trailer-related processing in trailer.h. This
> forces us to start writing unit tests for these now public functions, but
> that is a good thing because those same unit tests should be easy to write
> (due to their small(er) sizes), but also, because those unit tests will now
> ensure some degree of stability across new versions of trailer.h (we will
> start noticing when the behavior of any of these API functions change).

And helper functions, each of which does one small thing well, may
be more applicable to other code paths that are currently bypassing
the API.

> Thanks to the aggressive refactoring in this series, I've been able to
> identify and fix a couple bugs in our existing implementation. Those fixes
> build on top of this series but were not included here, in order to keep
> this series small.

It would be nicer to have a concise list of these fixes (in the form
of "git shortlog") as a teaser here ;-).  That would hopefully
entice others into reviewing this part that forms the foundation.

Thanks.
Linus Arver Jan. 13, 2024, 1:35 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> This patch series is the first 10 patches of a much larger series I've been
>> working. The main goal of this series is to enrich the API in trailer.h. The
>> larger series brings a number of additional code simplifications and
>> cleanups (exposing and fixing some bugs along the way), and builds on top of
>> this series. The goal of the larger series is to make the trailer interface
>> ready for unit testing. By "trailer API" I mean those functions exposed in
>> trailer.h.
>
> Are there places in the current code that deals with trailers but
> does not use the trailer API (e.g., manually parse and/or insert the
> trailer in an in-core buffer)?

While working on this series I unfortunately did not search for such use
cases (I limited the scope of my work to mainly the interpret-trailers
builtin). But, a quick search just now on master branch turned up
append_signoff() in sequencer.c which constructs a Signed-off-by trailer
manually with a raw strbuf [1].

This is somewhat understandable though, because AFAICS the current
trailer API does not support creating and formatting single trailer
objects.

> Is it part of the larger goal to
> update these places so that we will always use the trailer API to
> touch trailers

That sounds like The Right Thing to do.

> , and if so, have these places been identified?

Not yet, but, it should be easy to check any remaining cases other than
the one I identified up above.

> Obviously the reason why I ask is that testing cannot be the goal by
> itself.

I now seem to recall an offline discussion where I said I wanted to
enrich the API to make other parts of Git also use this new API. Somehow
I've left that motivation out of the cover letter (will include in the
next reroll).

> The "alternative" ...
>
>> As an alternative to this patch series, we could keep trailer.h intact and
>> decide to unit-test the existing "trailer_info_get()" function which does
>> most of the trailer parsing work.
>
> ... may allow you to "test", but it would make it more difficult in
> the future to revamp the trailer API, if it is needed, in order to
> cover code paths that ought to be using but currently bypassing the
> trailer API.

Agreed. I should include this rationale as well in the next cover
letter, thanks.

>> This series breaks up "process_trailers()" into smaller pieces, exposing
>> many of the parts relevant to trailer-related processing in trailer.h. This
>> forces us to start writing unit tests for these now public functions, but
>> that is a good thing because those same unit tests should be easy to write
>> (due to their small(er) sizes), but also, because those unit tests will now
>> ensure some degree of stability across new versions of trailer.h (we will
>> start noticing when the behavior of any of these API functions change).
>
> And helper functions, each of which does one small thing well, may
> be more applicable to other code paths that are currently bypassing
> the API.

Yep.

>> Thanks to the aggressive refactoring in this series, I've been able to
>> identify and fix a couple bugs in our existing implementation. Those fixes
>> build on top of this series but were not included here, in order to keep
>> this series small.
>
> It would be nicer to have a concise list of these fixes (in the form
> of "git shortlog") as a teaser here ;-).  That would hopefully
> entice others into reviewing this part that forms the foundation.

Ah, good idea. Here's a shortlog (with a short summary of each one) of
bug fixes that are forthcoming:

    trailer: trailer replacement should not change its position
      (Summary: If we found a trailer we'd like to replace, preserve its
      position relative to the other trailers found in the trailer
      block, instead of always moving it to the beginning or end of the
      entire trailer block.)

    interpret-trailers: preserve trailers coming from the input
      (Summary: Sometimes, the parsed trailers from the input will be
      formatted differently depending on whether we provide
      --only-trailers or not.  Make the trailers that were not modified
      and are coming directly from the input get formatted the same way,
      regardless of this flag.)

    interpret-trailers: fail if given unrecognized arguments
      (Summary: E.g., for "--where", only accept recognized WHERE_* enum
      values. If we get something unrecognized, fail with an error
      instead of silently doing nothing. Ditto for "--if-exists" and
      "--if-missing".)

The last one is a different class of bug than the first two, and perhaps
less interesting.

Thanks.

[1] https://github.com/git/git/blob/d4dbce1db5cd227a57074bcfc7ec9f0655961bba/sequencer.c#L5299-L5301
Linus Arver Jan. 14, 2024, 8:05 p.m. UTC | #3
Linus Arver <linusa@google.com> writes:

>     interpret-trailers: fail if given unrecognized arguments
>       (Summary: E.g., for "--where", only accept recognized WHERE_* enum
>       values. If we get something unrecognized, fail with an error
>       instead of silently doing nothing. Ditto for "--if-exists" and
>       "--if-missing".)
>
> The last one is a different class of bug than the first two, and perhaps
> less interesting.

Actually, upon closer inspection I realize that we already fail if given
unrecognized arguments to --where, --if-exists, and --if-missing. But we
don't explain why to the user because no error message is printed. This
commit has been retitled to "interpret-trailers: print error if given
unrecognized arguments".

Thanks.
Josh Steadmon Jan. 25, 2024, 11:54 p.m. UTC | #4
On 2024.01.10 06:51, Linus Arver via GitGitGadget wrote:
> This patch series is the first 10 patches of a much larger series I've been
> working. The main goal of this series is to enrich the API in trailer.h. The
> larger series brings a number of additional code simplifications and
> cleanups (exposing and fixing some bugs along the way), and builds on top of
> this series. The goal of the larger series is to make the trailer interface
> ready for unit testing. By "trailer API" I mean those functions exposed in
> trailer.h.

I agree with Junio's points, and I added a small nitpick about patch 8's
commit message. But apart from that, this all looks good to me and I'm
interested in seeing the followup series as well. Thanks!

Reviewed-by: Josh Steadmon <steadmon@google.com>