diff mbox series

[v2,4/6] trailer: teach find_patch_start about --no-divider

Message ID f5f507c4c6c4514af7dca35e307ca68e72435afb.1694240177.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Commit ee8c5ee08cee9aee6732ab315e94bc88631c7431
Headers show
Series Trailer readability cleanups | expand

Commit Message

Linus Arver Sept. 9, 2023, 6:16 a.m. UTC
From: Linus Arver <linusa@google.com>

Currently, find_patch_start only finds the start of the patch part of
the input (by looking at the "---" divider) for cases where the
"--no-divider" flag has not been provided. If the user provides this
flag, we do not rely on find_patch_start at all and just call strlen()
directly on the input.

Instead, make find_patch_start aware of "--no-divider" and make it
handle that case as well. This means we no longer need to call strlen at
all and can just rely on the existing code in find_patch_start. By
forcing callers to consider this important option, we avoid the kind of
mistake described in be3d654343 (commit: pass --no-divider to
interpret-trailers, 2023-06-17).

This patch will make unit testing a bit more pleasant in this area in
the future when we adopt a unit testing framework, because we would not
have to test multiple functions to check how finding the start of a
patch part works (we would only need to test find_patch_start).

Signed-off-by: Linus Arver <linusa@google.com>
---
 trailer.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

Comments

Junio C Hamano Sept. 11, 2023, 5:25 p.m. UTC | #1
"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Linus Arver <linusa@google.com>
>
> Currently, find_patch_start only finds the start of the patch part of
> the input (by looking at the "---" divider) for cases where the
> "--no-divider" flag has not been provided. If the user provides this
> flag, we do not rely on find_patch_start at all and just call strlen()
> directly on the input.
>
> Instead, make find_patch_start aware of "--no-divider" and make it
> handle that case as well. This means we no longer need to call strlen at
> all and can just rely on the existing code in find_patch_start. By
> forcing callers to consider this important option, we avoid the kind of
> mistake described in be3d654343 (commit: pass --no-divider to
> interpret-trailers, 2023-06-17).

OK.  The code pays attention to "---" so making it stop doing so
when the "--no-*" option is given will make the function responsible
for finding the beginning of the patch.

I wonder if we should rename this function while we are at it,
though.  When "--no-divider" is given, the expected use case is
*not* to have a patch at all, and it is dubious that a function
whose name is find_patch_start() can possibly do anything useful.

The real purpose of this function is to find the end of the log
message, isn't it?  And the caller uses the end of the log message
it found and gives it to find_trailer_start() and find_trailer_end()
as the upper boundary of the search for the trailer block.
Linus Arver Sept. 14, 2023, 2:19 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Linus Arver <linusa@google.com>
>>
>> Currently, find_patch_start only finds the start of the patch part of
>> the input (by looking at the "---" divider) for cases where the
>> "--no-divider" flag has not been provided. If the user provides this
>> flag, we do not rely on find_patch_start at all and just call strlen()
>> directly on the input.
>>
>> Instead, make find_patch_start aware of "--no-divider" and make it
>> handle that case as well. This means we no longer need to call strlen at
>> all and can just rely on the existing code in find_patch_start. By
>> forcing callers to consider this important option, we avoid the kind of
>> mistake described in be3d654343 (commit: pass --no-divider to
>> interpret-trailers, 2023-06-17).
>
> OK.  The code pays attention to "---" so making it stop doing so
> when the "--no-*" option is given will make the function responsible
> for finding the beginning of the patch.
>
> I wonder if we should rename this function while we are at it,
> though.  When "--no-divider" is given, the expected use case is
> *not* to have a patch at all, and it is dubious that a function
> whose name is find_patch_start() can possibly do anything useful.

IOW, saying as the caller of this function, "find the patch start in
this input I'm giving you, but also FYI the input has no patch in it"
sounds wrong. Agreed.

> The real purpose of this function is to find the end of the log
> message, isn't it?

Indeed.

> And the caller uses the end of the log message
> it found and gives it to find_trailer_start() and find_trailer_end()
> as the upper boundary of the search for the trailer block.

Right! So a better name might be something like
"find_trailer_search_boundary" with a comment like

    Find the point at which we should stop searching for trailers (to
    parse them). This is either the end of the input string (obviously),
    or the point when we see "---" indicating the start of the "patch
    part".

Will update.
Junio C Hamano Sept. 14, 2023, 3:12 a.m. UTC | #3
Linus Arver <linusa@google.com> writes:

>> The real purpose of this function is to find the end of the log
>> message, isn't it?
>
> Indeed.
> ...
> Right! So a better name might be something like
> "find_trailer_search_boundary" with a comment like

Or "find_end_of_log_message()", which we agreed to be the real
purpose of this function ;-)
Linus Arver Sept. 14, 2023, 5:31 a.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Linus Arver <linusa@google.com> writes:
>
>>> The real purpose of this function is to find the end of the log
>>> message, isn't it?
>>
>> Indeed.
>> ...
>> Right! So a better name might be something like
>> "find_trailer_search_boundary" with a comment like
>
> Or "find_end_of_log_message()", which we agreed to be the real
> purpose of this function ;-)

I did this locally, but in doing so realized that we have in trailer.c

    /* Return the position of the end of the trailers. */
    static size_t find_trailer_end(const char *buf, size_t len)
    {
        return len - ignore_non_trailer(buf, len);
    }

and the ignore_non_trailer() comes from commit.c, which says

    /*
    * Inspect the given string and determine the true "end" of the log message, in
    * order to find where to put a new Signed-off-by trailer.  Ignored are
    * trailing comment lines and blank lines.  To support "git commit -s
    * --amend" on an existing commit, we also ignore "Conflicts:".  To
    * support "git commit -v", we truncate at cut lines.
    *
    * Returns the number of bytes from the tail to ignore, to be fed as
    * the second parameter to append_signoff().
    */
    size_t ignore_non_trailer(const char *buf, size_t len)
    {

...and I am not so sure the new "find_end_of_log_message" name for
find_patch_start() is desirable because of the overlap in meaning with
the comment for ignore_non_trailer(). To recap, we have in
trailer_info_get() in trailer.c which (without this patch) has

    if (opts->no_divider)
        patch_start = strlen(str);
    else
        patch_start = find_patch_start(str);

    trailer_end = find_trailer_end(str, patch_start);
    trailer_start = find_trailer_start(str, trailer_end);
                
to find the trailer_end and trailer_start positions in the input. These
positions are the boundaries for parsing for actual trailers (if any).
More precisely, the "patch_start" variable helps us _skip_ the "patch
part" of the input (if any, denoted by "---"). The call to
find_trailer_end() helps us (again) _skip_ any parts that are not part
of the actual log message (per the comment in ignore_non_trailer()). So
as far as trailer_info_get() is concerned, we are just trying to skip
over areas where we shouldn't search/parse for trailers.

The above analysis leads me to some new ideas:

(1) For "find_end_of_log_message()", I think this name should really
    belong to ignore_non_trailer() in commit.c instead of
    find_patch_start() in trailer.c. The existing comment for
    ignore_non_trailer() already says that its job is to determine the
    true end of the log message (and does a lot of the necessary work to
    do this job).

(2) find_patch_start() should be named something like
    "shrink_trailer_search_space" (although this meaning belongs equally
    well to find_trailer_end()), because the point is to reduce the
    search space for parsing trailers.

(3) "find_trailer_end" is not a great function name because on first
    glance it implies that it found the end of trailers, but this is
    only true for the "happy path" of actually finding trailers.

I need to consider all of the above ideas and reroll this patch. Because
the ideas here closely relate to the "trailer_end" and "trailer_start"
variables, I will probably reorder the series so that this patch and the
last patch are closer together.
diff mbox series

Patch

diff --git a/trailer.c b/trailer.c
index b6de5d9cb2d..f646e484a23 100644
--- a/trailer.c
+++ b/trailer.c
@@ -812,14 +812,14 @@  static ssize_t last_line(const char *buf, size_t len)
  * Return the position of the start of the patch or the length of str if there
  * is no patch in the message.
  */
-static size_t find_patch_start(const char *str)
+static size_t find_patch_start(const char *str, int no_divider)
 {
 	const char *s;
 
 	for (s = str; *s; s = next_line(s)) {
 		const char *v;
 
-		if (skip_prefix(s, "---", &v) && isspace(*v))
+		if (!no_divider && skip_prefix(s, "---", &v) && isspace(*v))
 			return s - str;
 	}
 
@@ -1109,11 +1109,7 @@  void trailer_info_get(struct trailer_info *info, const char *str,
 
 	ensure_configured();
 
-	if (opts->no_divider)
-		patch_start = strlen(str);
-	else
-		patch_start = find_patch_start(str);
-
+	patch_start = find_patch_start(str, opts->no_divider);
 	trailer_end = find_trailer_end(str, patch_start);
 	trailer_start = find_trailer_start(str, trailer_end);