diff mbox series

[v4,1/4] revision: improve commit_rewrite_person()

Message ID 20220712160634.213956-2-siddharthasthana31@gmail.com (mailing list archive)
State Superseded
Headers show
Series Add support for mailmap in cat-file | expand

Commit Message

Siddharth Asthana July 12, 2022, 4:06 p.m. UTC
The function, commit_rewrite_person(), is designed to find and replace
an ident string in the header part, and the way it avoids a random
occurrence of "author A U Thor <author@example.com" in the text is by
insisting "author" to appear at the beginning of line by passing
"\nauthor " as "what".

The implementation also doesn't make any effort to limit itself to the
commit header by locating the blank line that appears after the header
part and stopping the search there. Also, the interface forces the
caller to make multiple calls if it wants to rewrite idents on multiple
headers. It shouldn't be the case.

To support the existing caller better, update commit_rewrite_person()
to:
- Make a single pass in the input buffer to locate headers named
  "author" and "committer" and replace idents on them.
- Stop at the end of the header, ensuring that nothing in the body of
  the commit object is modified.

The return type of the function commit_rewrite_person() has also been
changed from int to void. This has been done because the caller of the
function doesn't do anything with the return value of the function.

By simplifying the interface of the commit_rewrite_person(), we also
intend to expose it as a public function. We will also be renaming the
function in a future commit to a different name which clearly tells that
the function replaces idents in the header of the commit buffer.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
---
 revision.c | 48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)

Comments

Ævar Arnfjörð Bjarmason July 13, 2022, 1:25 a.m. UTC | #1
On Tue, Jul 12 2022, Siddharth Asthana wrote:

> -static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
> +/*
> + * Returns the difference between the new and old length of the ident line.
> + */
> +static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)

All tests pass with this as size_t, instead of size_t. Let's use that
here instead?
Christian Couder July 13, 2022, 12:18 p.m. UTC | #2
On Wed, Jul 13, 2022 at 3:25 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Tue, Jul 12 2022, Siddharth Asthana wrote:
>
> > -static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
> > +/*
> > + * Returns the difference between the new and old length of the ident line.
> > + */
> > +static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
>
> All tests pass with this as size_t, instead of size_t. Let's use that
> here instead?

Do you mean you would like to use size_t instead of ssize_t for the
type of the value returned by the function?

I think it can return a negative value if the new length of the ident
line is shorter than the old one though.
Junio C Hamano July 14, 2022, 9:02 p.m. UTC | #3
Siddharth Asthana <siddharthasthana31@gmail.com> writes:

> +/*
> + * Returns the difference between the new and old length of the ident line.
> + */
> +static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)

"const char *person".  Asterisk sticks to the variable, not to the
type.

Wrap such an overly long line.

> +static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)

Likewise.

The second parameter should be called "header", not "headers",
because the way this array is used is primarily to access its
individual members, i.e. header[i] and it is more grammatical
to say that header[0] is the zero-th header, not headers[0].

> +{
> +	size_t buf_offset = 0;
> +
> +	if (!mailmap)
> +		return;
> +
> +	for (;;) {
> +		const char *person, *line;
> +		size_t i, linelen;
> +
> +		line = buf->buf + buf_offset;

buf_offset is initialized to 0.  The idea is to keep track of the
position of the current line in buf->buf as the byte offset from the
beginning.  This makes it safe to let rewrite_ident_line()
reallocate buf->buf in the loop.  So "line" points at the beginning
of the current line.

> +		linelen = strchrnul(line, '\n') - line + 1;

This "linelen" counts the LF (or NUL) termination.  Hence ...

> +		if (linelen <= 1)
> +			/* End of header */
> +			return;

... linelen==0 means we are at an empty line.

> +		buf_offset += linelen;

And by adding linelen to buf_offset, we prepare buf_offset to point
at the beginning of the next line.

BUT.

What happens when the buffer has only headers, no body, without an
empty line after the header?  buf_offset will be pointing at the
byte past the final NUL at the end of the buffer.  The next round of
the loop will point line at an invalid piece of memory.

I _think_ that the current generation of high-level tools like "git
commit" and "git tag" may leave an blank line at the end even when
they are told to create a message with no body, but this code should
not depend on that.  An object with no body and without a blank line
after the header is a valid object (cf. fsck.c::verify_headers()).

> +		for (i = 0; headers[i]; i++)
> +			if (skip_prefix(line, headers[i], &person))
> +				buf_offset += rewrite_ident_line(person, buf, mailmap);
> +	}

As the commit_headers[] array the caller gives us does not have
duplicates, as soon as we find a match, we should break out of the
loop, i.e.

		for (i = 0; header[i]; i++) {
			if (skip_prefix(line, header[i], &person))
				buf_offset += rewrite_ident_line(person, buf, maimap);
			break;
		}

> +}
> +
>  static int commit_match(struct commit *commit, struct rev_info *opt)
>  {
>  	int retval;
> @@ -3832,11 +3859,12 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
>  		strbuf_addstr(&buf, message);
>  
>  	if (opt->grep_filter.header_list && opt->mailmap) {
> +		const char *commit_headers[] = { "author ", "committer ", NULL };

It is OK to call the array "commit_headers", because the way we use
the identifier is only to refer to the collection of headers as a
whole.

>  		if (!buf.len)
>  			strbuf_addstr(&buf, message);
>  
> -		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
> -		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
> +		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
>  	}
>  
>  	/* Append "fake" message parts as needed */

Thanks.
diff mbox series

Patch

diff --git a/revision.c b/revision.c
index 211352795c..1939c56c67 100644
--- a/revision.c
+++ b/revision.c
@@ -3755,19 +3755,17 @@  int rewrite_parents(struct rev_info *revs, struct commit *commit,
 	return 0;
 }
 
-static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
+/*
+ * Returns the difference between the new and old length of the ident line.
+ */
+static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
 {
-	char *person, *endp;
+	char *endp;
 	size_t len, namelen, maillen;
 	const char *name;
 	const char *mail;
 	struct ident_split ident;
 
-	person = strstr(buf->buf, what);
-	if (!person)
-		return 0;
-
-	person += strlen(what);
 	endp = strchr(person, '\n');
 	if (!endp)
 		return 0;
@@ -3784,6 +3782,7 @@  static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
 
 	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
 		struct strbuf namemail = STRBUF_INIT;
+		size_t newlen;
 
 		strbuf_addf(&namemail, "%.*s <%.*s>",
 			    (int)namelen, name, (int)maillen, mail);
@@ -3792,14 +3791,42 @@  static int commit_rewrite_person(struct strbuf *buf, const char *what, struct st
 			      ident.mail_end - ident.name_begin + 1,
 			      namemail.buf, namemail.len);
 
+		newlen = namemail.len;
+
 		strbuf_release(&namemail);
 
-		return 1;
+		return newlen - (ident.mail_end - ident.name_begin + 1);
 	}
 
 	return 0;
 }
 
+static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
+{
+	size_t buf_offset = 0;
+
+	if (!mailmap)
+		return;
+
+	for (;;) {
+		const char *person, *line;
+		size_t i, linelen;
+
+		line = buf->buf + buf_offset;
+		linelen = strchrnul(line, '\n') - line + 1;
+
+		if (linelen <= 1)
+			/* End of header */
+			return;
+
+		buf_offset += linelen;
+
+		for (i = 0; headers[i]; i++)
+			if (skip_prefix(line, headers[i], &person))
+				buf_offset += rewrite_ident_line(person, buf, mailmap);
+	}
+}
+
 static int commit_match(struct commit *commit, struct rev_info *opt)
 {
 	int retval;
@@ -3832,11 +3859,12 @@  static int commit_match(struct commit *commit, struct rev_info *opt)
 		strbuf_addstr(&buf, message);
 
 	if (opt->grep_filter.header_list && opt->mailmap) {
+		const char *commit_headers[] = { "author ", "committer ", NULL };
+
 		if (!buf.len)
 			strbuf_addstr(&buf, message);
 
-		commit_rewrite_person(&buf, "\nauthor ", opt->mailmap);
-		commit_rewrite_person(&buf, "\ncommitter ", opt->mailmap);
+		commit_rewrite_person(&buf, commit_headers, opt->mailmap);
 	}
 
 	/* Append "fake" message parts as needed */