Message ID | 20220712160634.213956-2-siddharthasthana31@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Add support for mailmap in cat-file | expand |
On Tue, Jul 12 2022, Siddharth Asthana wrote: > -static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap) > +/* > + * Returns the difference between the new and old length of the ident line. > + */ > +static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap) All tests pass with this as size_t, instead of size_t. Let's use that here instead?
On Wed, Jul 13, 2022 at 3:25 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > > On Tue, Jul 12 2022, Siddharth Asthana wrote: > > > -static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap) > > +/* > > + * Returns the difference between the new and old length of the ident line. > > + */ > > +static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap) > > All tests pass with this as size_t, instead of size_t. Let's use that > here instead? Do you mean you would like to use size_t instead of ssize_t for the type of the value returned by the function? I think it can return a negative value if the new length of the ident line is shorter than the old one though.
Siddharth Asthana <siddharthasthana31@gmail.com> writes: > +/* > + * Returns the difference between the new and old length of the ident line. > + */ > +static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap) "const char *person". Asterisk sticks to the variable, not to the type. Wrap such an overly long line. > +static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap) Likewise. The second parameter should be called "header", not "headers", because the way this array is used is primarily to access its individual members, i.e. header[i] and it is more grammatical to say that header[0] is the zero-th header, not headers[0]. > +{ > + size_t buf_offset = 0; > + > + if (!mailmap) > + return; > + > + for (;;) { > + const char *person, *line; > + size_t i, linelen; > + > + line = buf->buf + buf_offset; buf_offset is initialized to 0. The idea is to keep track of the position of the current line in buf->buf as the byte offset from the beginning. This makes it safe to let rewrite_ident_line() reallocate buf->buf in the loop. So "line" points at the beginning of the current line. > + linelen = strchrnul(line, '\n') - line + 1; This "linelen" counts the LF (or NUL) termination. Hence ... > + if (linelen <= 1) > + /* End of header */ > + return; ... linelen==0 means we are at an empty line. > + buf_offset += linelen; And by adding linelen to buf_offset, we prepare buf_offset to point at the beginning of the next line. BUT. What happens when the buffer has only headers, no body, without an empty line after the header? buf_offset will be pointing at the byte past the final NUL at the end of the buffer. The next round of the loop will point line at an invalid piece of memory. I _think_ that the current generation of high-level tools like "git commit" and "git tag" may leave an blank line at the end even when they are told to create a message with no body, but this code should not depend on that. An object with no body and without a blank line after the header is a valid object (cf. fsck.c::verify_headers()). > + for (i = 0; headers[i]; i++) > + if (skip_prefix(line, headers[i], &person)) > + buf_offset += rewrite_ident_line(person, buf, mailmap); > + } As the commit_headers[] array the caller gives us does not have duplicates, as soon as we find a match, we should break out of the loop, i.e. for (i = 0; header[i]; i++) { if (skip_prefix(line, header[i], &person)) buf_offset += rewrite_ident_line(person, buf, maimap); break; } > +} > + > static int commit_match(struct commit *commit, struct rev_info *opt) > { > int retval; > @@ -3832,11 +3859,12 @@ static int commit_match(struct commit *commit, struct rev_info *opt) > strbuf_addstr(&buf, message); > > if (opt->grep_filter.header_list && opt->mailmap) { > + const char *commit_headers[] = { "author ", "committer ", NULL }; It is OK to call the array "commit_headers", because the way we use the identifier is only to refer to the collection of headers as a whole. > if (!buf.len) > strbuf_addstr(&buf, message); > > - commit_rewrite_person(&buf, "\nauthor ", opt->mailmap); > - commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap); > + commit_rewrite_person(&buf, commit_headers, opt->mailmap); > } > > /* Append "fake" message parts as needed */ Thanks.
diff --git a/revision.c b/revision.c index 211352795c..1939c56c67 100644 --- a/revision.c +++ b/revision.c @@ -3755,19 +3755,17 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit, return 0; } -static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap) +/* + * Returns the difference between the new and old length of the ident line. + */ +static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap) { - char *person, *endp; + char *endp; size_t len, namelen, maillen; const char *name; const char *mail; struct ident_split ident; - person = strstr(buf->buf, what); - if (!person) - return 0; - - person += strlen(what); endp = strchr(person, '\n'); if (!endp) return 0; @@ -3784,6 +3782,7 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st if (map_user(mailmap, &mail, &maillen, &name, &namelen)) { struct strbuf namemail = STRBUF_INIT; + size_t newlen; strbuf_addf(&namemail, "%.*s <%.*s>", (int)namelen, name, (int)maillen, mail); @@ -3792,14 +3791,42 @@ static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st ident.mail_end - ident.name_begin + 1, namemail.buf, namemail.len); + newlen = namemail.len; + strbuf_release(&namemail); - return 1; + return newlen - (ident.mail_end - ident.name_begin + 1); } return 0; } +static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap) +{ + size_t buf_offset = 0; + + if (!mailmap) + return; + + for (;;) { + const char *person, *line; + size_t i, linelen; + + line = buf->buf + buf_offset; + linelen = strchrnul(line, '\n') - line + 1; + + if (linelen <= 1) + /* End of header */ + return; + + buf_offset += linelen; + + for (i = 0; headers[i]; i++) + if (skip_prefix(line, headers[i], &person)) + buf_offset += rewrite_ident_line(person, buf, mailmap); + } +} + static int commit_match(struct commit *commit, struct rev_info *opt) { int retval; @@ -3832,11 +3859,12 @@ static int commit_match(struct commit *commit, struct rev_info *opt) strbuf_addstr(&buf, message); if (opt->grep_filter.header_list && opt->mailmap) { + const char *commit_headers[] = { "author ", "committer ", NULL }; + if (!buf.len) strbuf_addstr(&buf, message); - commit_rewrite_person(&buf, "\nauthor ", opt->mailmap); - commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap); + commit_rewrite_person(&buf, commit_headers, opt->mailmap); } /* Append "fake" message parts as needed */
The function, commit_rewrite_person(), is designed to find and replace an ident string in the header part, and the way it avoids a random occurrence of "author A U Thor <author@example.com" in the text is by insisting "author" to appear at the beginning of line by passing "\nauthor " as "what". The implementation also doesn't make any effort to limit itself to the commit header by locating the blank line that appears after the header part and stopping the search there. Also, the interface forces the caller to make multiple calls if it wants to rewrite idents on multiple headers. It shouldn't be the case. To support the existing caller better, update commit_rewrite_person() to: - Make a single pass in the input buffer to locate headers named "author" and "committer" and replace idents on them. - Stop at the end of the header, ensuring that nothing in the body of the commit object is modified. The return type of the function commit_rewrite_person() has also been changed from int to void. This has been done because the caller of the function doesn't do anything with the return value of the function. By simplifying the interface of the commit_rewrite_person(), we also intend to expose it as a public function. We will also be renaming the function in a future commit to a different name which clearly tells that the function replaces idents in the header of the commit buffer. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: John Cai <johncai86@gmail.com> Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> --- revision.c | 48 ++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 38 insertions(+), 10 deletions(-)