Message ID | ce25420db29c9953095db652584dbed4e35d67ad.1697828495.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Trailer readability cleanups | expand |
"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Linus Arver <linusa@google.com> > > Previously, trailer_info_get() computed the trailer block end position > by > > (1) checking for the opts->no_divider flag and optionally calling > find_patch_start() to find the "patch start" location (patch_start), and > (2) calling find_trailer_end() to find the end of the trailer block > using patch_start as a guide, saving the return value into > "trailer_end". > > The logic in (1) was awkward because the variable "patch_start" is > misleading if there is no patch in the input. The logic in (2) was > misleading because it could be the case that no trailers are in the > input (yet we are setting a "trailer_end" variable before even searching > for trailers, which happens later in find_trailer_start()). The name > "find_trailer_end" was misleading because that function did not look for > any trailer block itself --- instead it just computed the end position > of the log message in the input where the end of the trailer block (if > it exists) would be (because trailer blocks must always come after the > end of the log message). > > Combine the logic in (1) and (2) together into find_patch_start() by > renaming it to find_end_of_log_message(). The end of the log message is > the starting point which find_trailer_start() needs to start searching > backward to parse individual trailers (if any). > > Helped-by: Jonathan Tan <jonathantanmy@google.com> > Helped-by: Junio C Hamano <gitster@pobox.com> > Signed-off-by: Linus Arver <linusa@google.com> > --- > trailer.c | 64 +++++++++++++++++++++++++++++++++++-------------------- > 1 file changed, 41 insertions(+), 23 deletions(-) > > diff --git a/trailer.c b/trailer.c > index 3c54b38a85a..70c81fda710 100644 > --- a/trailer.c > +++ b/trailer.c > @@ -809,21 +809,50 @@ static ssize_t last_line(const char *buf, size_t len) > } > > /* > - * Return the position of the start of the patch or the length of str if there > - * is no patch in the message. > + * Find the end of the log message as an offset from the start of the input > + * (where callers of this function are interested in looking for a trailers > + * block in the same input). We have to consider two categories of content that > + * can come at the end of the input which we want to ignore (because they don't > + * belong in the log message): > + * > + * (1) the "patch part" which begins with a "---" divider and has patch > + * information (like the output of git-format-patch), and > + * > + * (2) any trailing comment lines, blank lines like in the output of "git > + * commit -v", or stuff below the "cut" (scissor) line. > + * > + * As a formula, the situation looks like this: > + * > + * INPUT = LOG MESSAGE + IGNORED > + * > + * where IGNORED can be either of the two categories described above. It may be > + * that there is nothing to ignore. Now it may be the case that the LOG MESSAGE > + * contains a trailer block, but that's not the concern of this function. > */ > -static size_t find_patch_start(const char *str) > +static size_t find_end_of_log_message(const char *input, int no_divider) > { > + size_t end; > const char *s; > > - for (s = str; *s; s = next_line(s)) { > + /* Assume the naive end of the input is already what we want. */ > + end = strlen(input); > + > + if (no_divider) { > + return end; > + } OK. The early return may make sense, as we are essentially declaring that everything is the "INPUT (= message + ignored)". You do not need {braces} around a single-statement block, though. Other than that, I didn't find anything quesionable in any of the patches in this round. Looking good. Thanks.
TL;DR: I'm working on a new approach. Junio C Hamano <gitster@pobox.com> writes: > Other than that, I didn't find anything quesionable in any of the > patches in this round. Looking good. So actually, I'm now taking a much more aggressive approach to libifying the trailer subsystem. Instead of incrementally simplifying/improving things as in this series, I think I need to get to the root problem, which is that the trailer.h API isn't rich enough to make it pleasant for clients to use, including our own builtin/interpret-trailers.c client. That is, the problem we have today is that the trailer subsystem is not very ergonomic for internal use, much less external use (outside of Git itself). As an example, the current API exposes process_trailers() which does a whole bunch of things that only builtin/interpret-trailers.c cares about. Multiple other clients of trailer.h exist in our codebase (e.g., sequencer.c, pretty.c, ref-filter.c) but none of them use process_trailers(). One really useful data structure is the trailer_iterator that was introduced in f0939a0eb1 (trailer: add interface for iterating over commit trailers, 2020-09-27). The only problem is that it is not generic enough such that interpret-trailers.c can use it. My new goal is to introduce a new API in trailer.h so that interpret-trailers.c and everyone else can start using these new data structures and associated functions (while preserving the trailer_iterator interface). So the order of operations should be: (1) enrich the trailer API (make trailer.h have simpler data structures and practical functions that clients can readily use), and (2) make builtin/interpret-trailers.c, and other clients in the Git codebase use this new API. This way when the unit test framework selection process is finalized we can (3) write unit tests for the functions in the (enriched) trailer API, which is one of the major goals for my efforts around this area. The work I've started locally for (1) does not depend on this series, and I think it'll be cleaner (less churn) that way. So, feel free to drop this series in favor of the forthcoming work described in this message. Thanks.
> (1) enrich the trailer API (make trailer.h have simpler data structures > and practical functions that clients can readily use), and > (2) make builtin/interpret-trailers.c, and other clients in the Git > codebase use this new API. I've done some more thinking/hacking and I'm realizing that changing the data structures significantly as a first step will be difficult to get right without being able to unit-test things as we go. As we don't have unit tests (sorry, I keep forgetting...), I think changing the shape of data structures right now is a showstopper. Step (1) will have to be done without changing any of the existing data structures.
diff --git a/trailer.c b/trailer.c index 3c54b38a85a..70c81fda710 100644 --- a/trailer.c +++ b/trailer.c @@ -809,21 +809,50 @@ static ssize_t last_line(const char *buf, size_t len) } /* - * Return the position of the start of the patch or the length of str if there - * is no patch in the message. + * Find the end of the log message as an offset from the start of the input + * (where callers of this function are interested in looking for a trailers + * block in the same input). We have to consider two categories of content that + * can come at the end of the input which we want to ignore (because they don't + * belong in the log message): + * + * (1) the "patch part" which begins with a "---" divider and has patch + * information (like the output of git-format-patch), and + * + * (2) any trailing comment lines, blank lines like in the output of "git + * commit -v", or stuff below the "cut" (scissor) line. + * + * As a formula, the situation looks like this: + * + * INPUT = LOG MESSAGE + IGNORED + * + * where IGNORED can be either of the two categories described above. It may be + * that there is nothing to ignore. Now it may be the case that the LOG MESSAGE + * contains a trailer block, but that's not the concern of this function. */ -static size_t find_patch_start(const char *str) +static size_t find_end_of_log_message(const char *input, int no_divider) { + size_t end; const char *s; - for (s = str; *s; s = next_line(s)) { + /* Assume the naive end of the input is already what we want. */ + end = strlen(input); + + if (no_divider) { + return end; + } + + /* Optionally skip over any patch part ("---" line and below). */ + for (s = input; *s; s = next_line(s)) { const char *v; - if (skip_prefix(s, "---", &v) && isspace(*v)) - return s - str; + if (skip_prefix(s, "---", &v) && isspace(*v)) { + end = s - input; + break; + } } - return s - str; + /* Skip over other ignorable bits. */ + return end - ignored_log_message_bytes(input, end); } /* @@ -925,12 +954,6 @@ continue_outer_loop: return len; } -/* Return the position of the end of the trailers. */ -static size_t find_trailer_end(const char *buf, size_t len) -{ - return len - ignored_log_message_bytes(buf, len); -} - static int ends_with_blank_line(const char *buf, size_t len) { ssize_t ll = last_line(buf, len); @@ -1101,7 +1124,7 @@ void process_trailers(const char *file, void trailer_info_get(struct trailer_info *info, const char *str, const struct process_trailer_options *opts) { - int patch_start, trailer_end, trailer_start; + int end_of_log_message, trailer_start; struct strbuf **trailer_lines, **ptr; char **trailer_strings = NULL; size_t nr = 0, alloc = 0; @@ -1109,16 +1132,11 @@ void trailer_info_get(struct trailer_info *info, const char *str, ensure_configured(); - if (opts->no_divider) - patch_start = strlen(str); - else - patch_start = find_patch_start(str); - - trailer_end = find_trailer_end(str, patch_start); - trailer_start = find_trailer_start(str, trailer_end); + end_of_log_message = find_end_of_log_message(str, opts->no_divider); + trailer_start = find_trailer_start(str, end_of_log_message); trailer_lines = strbuf_split_buf(str + trailer_start, - trailer_end - trailer_start, + end_of_log_message - trailer_start, '\n', 0); for (ptr = trailer_lines; *ptr; ptr++) { @@ -1141,7 +1159,7 @@ void trailer_info_get(struct trailer_info *info, const char *str, info->blank_line_before_trailer = ends_with_blank_line(str, trailer_start); info->trailer_start = str + trailer_start; - info->trailer_end = str + trailer_end; + info->trailer_end = str + end_of_log_message; info->trailers = trailer_strings; info->trailer_nr = nr; }