mbox series

[v2,0/4] Add support for mailmap in cat-file

Message ID 20220707161554.6900-1-siddharthasthana31@gmail.com (mailing list archive)
Headers show
Series Add support for mailmap in cat-file | expand

Message

Siddharth Asthana July 7, 2022, 4:15 p.m. UTC
Thanks a lot for the review and suggestions Phillip, Danh, Ævar and Junio.
Really grateful for that :)

= Description

This patch series adds mailmap support to the git-cat-file command. It
adds the mailmap support only for the commit and tag objects by
replacing the idents for "author", "committer" and "tagger" headers. The
mailmap only takes effect when --[no-]-use-mailmap or --[no-]-mailmap
option is passed to the git cat-file command. The changes will work with
the batch mode as well.

So, if one wants to enable mailmap they can use either of the following
commands:
$ git cat-file --use-mailmap -p <object>
$ git cat-file --use-mailmap <type> <object>

To use it in the batch mode, one can use the following command:
$ git cat-file --use-mailmap --batch

= Patch Organization

- The first patch improves the commit_rewrite_person() by restricting it 
  to traverse only through the header part of the commit object buffer.
  It also adds an argument called headers which the callers can pass. 
  The function will replace idents only on these  passed headers. 
  Thus, the caller won't have to make repeated calls to the function.
- The second patch moves commit_rewrite_person() to ident.c to expose it
  as a public function so that it can be used to replace idents in the
  headers of desired objects.
- The third patch renames commit_rewrite_person() to a name which
  describes its functionality clearly. It is renamed to
  apply_mailmap_to_header().
- The last patch adds mailmap support to the git cat-file command. It
  adds the required documentation and tests as well.

Changes in v2:
- The commit_rewrite_person() has been improved by restricting it to
  traverse only the header part of the object buffers.
- The callers of commit_rewrite_person() now don't require to call it
  multiple times for different headers. They can pass an array of
  headers and commit_rewrite_person() replaces idents only on those
  headers.
- commit_rewrite_person() has been renamed to a suitable name which
  expresses its functionality clearly.
- More tests have been added to test the --[no-]-use-mailmap option for
  the tag objects.
- Redundant operations from the tests have been removed.

Siddharth Asthana (4):
  revision: improve commit_rewrite_person()
  ident: move commit_rewrite_person() to ident.c
  ident: rename commit_rewrite_person() to apply_mailmap_to_header()
  cat-file: add mailmap support

 Documentation/git-cat-file.txt |  6 +++
 builtin/cat-file.c             | 31 ++++++++++++++-
 cache.h                        |  6 +++
 ident.c                        | 69 ++++++++++++++++++++++++++++++++++
 revision.c                     | 49 +-----------------------
 t/t4203-mailmap.sh             | 54 ++++++++++++++++++++++++++
 6 files changed, 167 insertions(+), 48 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  64e1f750e1 revision: improve commit_rewrite_person()
1:  8749b6024f ! 2:  b18ced0ece ident: move commit_rewrite_person() to ident.c
    @@ Metadata
      ## Commit message ##
         ident: move commit_rewrite_person() to ident.c
     
    -    commit_rewrite_person() is a static function defined in revision.c.
    -    As the name suggests, this function can be used to replace author's,
    -    committer's or tagger's name in the commit/tag object buffer.
    +    commit_rewrite_person() and rewrite_ident_line() are static functions
    +    defined in revision.c.
     
    -    This patch moves this function from revision.c to ident.c which contains
    -    many other functions related to identification like split_ident_line. By
    -    moving this function to ident.c, we intend to use it in git-cat-file to
    -    replace committer's, author's and tagger's names and emails with their
    -    canonical name and email using the mailmap mechanism. The function
    -    is moved as is for now to make it clear that there are no other changes,
    -    but will be renamed in a following commit.
    +    Their usages are as follows:
    +    - commit_rewrite_person() takes a commit buffer and replaces the author
    +      and committer idents with their canonical versions using the mailmap
    +      mechanism
    +    - rewrite_ident_line() takes author/committer header lines from the
    +      commit buffer and replaces the idents with their canonical versions
    +      using the mailmap mechanism.
     
    +    This patch moves commit_rewrite_person() and rewrite_ident_line() to
    +    ident.c which contains many other functions related to idents like
    +    split_ident_line(). By moving commit_rewrite_person() to ident.c, we
    +    also intend to use it in git-cat-file to replace committer and author
    +    idents from the headers to their canonical versions using the mailmap
    +    mechanism. The function is moved as is for now to make it clear that
    +    there are no other changes, but it will be renamed in a following
    +    commit.
    +
    +    Mentored-by: Christian Couder <christian.couder@gmail.com>
    +    Mentored-by: John Cai <johncai86@gmail.com>
         Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
     
      ## cache.h ##
    @@ cache.h: struct ident_split {
      int split_ident_line(struct ident_split *, const char *, int);
      
     +/*
    -+ * Given a commit or tag object buffer, replaces the person's
    -+ * (author/committer/tagger) name and email with their canonical
    -+ * name and email using mailmap mechanism. Signals a success with
    -+ * 1 and failure with a 0.
    ++ * Given a commit object buffer and the commit headers, replaces the idents
    ++ * in the headers with their canonical versions using the mailmap mechanism.
     + */
    -+int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap);
    ++void commit_rewrite_person(struct strbuf *buf, const char **commit_headers, struct string_list *mailmap);
     +
      /*
       * Compare split idents for equality or strict ordering. Note that we
    @@ ident.c: int split_ident_line(struct ident_split *split, const char *line, int l
      	return 0;
      }
      
    -+int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
    ++/*
    ++ * Returns the difference between the new and old length of the ident line.
    ++ */
    ++static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
     +{
    -+	char *person, *endp;
    ++	char *endp;
     +	size_t len, namelen, maillen;
     +	const char *name;
     +	const char *mail;
     +	struct ident_split ident;
     +
    -+	person = strstr(buf->buf, what);
    -+	if (!person)
    -+		return 0;
    -+
    -+	person += strlen(what);
     +	endp = strchr(person, '\n');
     +	if (!endp)
     +		return 0;
    @@ ident.c: int split_ident_line(struct ident_split *split, const char *line, int l
     +
     +	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
     +		struct strbuf namemail = STRBUF_INIT;
    ++		size_t newlen;
     +
     +		strbuf_addf(&namemail, "%.*s <%.*s>",
     +			    (int)namelen, name, (int)maillen, mail);
    @@ ident.c: int split_ident_line(struct ident_split *split, const char *line, int l
     +			      ident.mail_end - ident.name_begin + 1,
     +			      namemail.buf, namemail.len);
     +
    ++		newlen = namemail.len;
    ++
     +		strbuf_release(&namemail);
     +
    -+		return 1;
    ++		return newlen - (ident.mail_end - ident.name_begin + 1);
     +	}
     +
     +	return 0;
    ++}
    ++
    ++void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
    ++{
    ++	size_t buf_offset = 0;
    ++
    ++	if (!mailmap)
    ++		return;
    ++
    ++	for (;;) {
    ++		const char *person, *line = buf->buf + buf_offset;
    ++		int i, linelen = strchrnul(line, '\n') - line + 1;
    ++
    ++		if (!linelen || linelen == 1)
    ++			/* End of header */
    ++			return;
    ++
    ++		buf_offset += linelen;
    ++
    ++		for (i = 0; headers[i]; i++)
    ++			if (skip_prefix(line, headers[i], &person))
    ++				buf_offset += rewrite_ident_line(person, buf, mailmap);
    ++	}
     +}
      
      static void ident_env_hint(enum want_ident whose_ident)
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
      	return 0;
      }
      
    --static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap)
    +-/*
    +- * Returns the difference between the new and old length of the ident line.
    +- */
    +-static ssize_t rewrite_ident_line(const char* person, struct strbuf *buf, struct string_list *mailmap)
     -{
    --	char *person, *endp;
    +-	char *endp;
     -	size_t len, namelen, maillen;
     -	const char *name;
     -	const char *mail;
     -	struct ident_split ident;
     -
    --	person = strstr(buf->buf, what);
    --	if (!person)
    --		return 0;
    --
    --	person += strlen(what);
     -	endp = strchr(person, '\n');
     -	if (!endp)
     -		return 0;
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
     -
     -	if (map_user(mailmap, &mail, &maillen, &name, &namelen)) {
     -		struct strbuf namemail = STRBUF_INIT;
    +-		size_t newlen;
     -
     -		strbuf_addf(&namemail, "%.*s <%.*s>",
     -			    (int)namelen, name, (int)maillen, mail);
    @@ revision.c: int rewrite_parents(struct rev_info *revs, struct commit *commit,
     -			      ident.mail_end - ident.name_begin + 1,
     -			      namemail.buf, namemail.len);
     -
    +-		newlen = namemail.len;
    +-
     -		strbuf_release(&namemail);
     -
    --		return 1;
    +-		return newlen - (ident.mail_end - ident.name_begin + 1);
     -	}
     -
     -	return 0;
     -}
    +-
    +-static void commit_rewrite_person(struct strbuf *buf, const char **headers, struct string_list *mailmap)
    +-{
    +-	size_t buf_offset = 0;
    +-
    +-	if (!mailmap)
    +-		return;
    +-
    +-	for (;;) {
    +-		const char *person, *line = buf->buf + buf_offset;
    +-		int i, linelen = strchrnul(line, '\n') - line + 1;
    +-
    +-		if (!linelen || linelen == 1)
    +-			/* End of header */
    +-			return;
    +-
    +-		buf_offset += linelen;
    +-
    +-		for (i = 0; headers[i]; i++)
    +-			if (skip_prefix(line, headers[i], &person))
    +-				buf_offset += rewrite_ident_line(person, buf, mailmap);
    +-	}
    +-}
     -
      static int commit_match(struct commit *commit, struct rev_info *opt)
      {
2:  aff60f541b < -:  ---------- ident: rename commit_rewrite_person() to rewrite_ident_line()
-:  ---------- > 3:  2494ce1ed2 ident: rename commit_rewrite_person() to apply_mailmap_to_header()
3:  2a697167db ! 4:  94838a2566 cat-file: add mailmap support
    @@ Metadata
      ## Commit message ##
         cat-file: add mailmap support
     
    -    git cat-file is not a plumbing command anymore, especially as it gained
    -    more and more high level features like its `--batch-command` mode. So
    -    tools do use it to get commit and tag contents that are then displayed
    -    to users. This content which has author, committer or tagger
    -    information, could benefit from passing through the mailmap mechanism,
    -    before being sent or displayed.
    +    git-cat-file is used by tools like GitLab to get commit tag contents
    +    that are then displayed to users. This content which has author,
    +    committer or tagger information, could benefit from passing through the
    +    mailmap mechanism before being sent or displayed.
     
         This patch adds --[no-]use-mailmap command line option to the git
         cat-file command. It also adds --[no-]mailmap option as an alias to
         --[no-]use-mailmap.
     
    -    At this time, this patch only adds a command line
    -    option, but perhaps a `cat-file.mailmap` config option could be added as
    -    well in the same way as for `git log`.
    -
    +    Mentored-by: Christian Couder <christian.couder@gmail.com>
    +    Mentored-by: John Cai <johncai86@gmail.com>
    +    Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
         Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
     
      ## Documentation/git-cat-file.txt ##
    @@ Documentation/git-cat-file.txt: OPTIONS
      
     +--[no-]mailmap::
     +--[no-]use-mailmap::
    -+	Use mailmap file to map author, committer and tagger names
    -+	and email addresses to canonical real names and email addresses.
    -+	See linkgit:git-shortlog[1].
    ++       Use mailmap file to map author, committer and tagger names
    ++       and email addresses to canonical real names and email addresses.
    ++       See linkgit:git-shortlog[1].
     +
      --textconv::
      	Show the content as transformed by a textconv filter. In this case,
    @@ builtin/cat-file.c: struct batch_options {
     +{
     +	struct strbuf sb = STRBUF_INIT;
     +	strbuf_attach(&sb, object_buf, *size, *size + 1);
    -+	rewrite_ident_line(&sb, "\nauthor ", &mailmap);
    -+	rewrite_ident_line(&sb, "\ncommitter ", &mailmap);
    -+	rewrite_ident_line(&sb, "\ntagger ", &mailmap);
    ++	const char *headers[] = { "author ", "committer ", "tagger ", NULL };
    ++	apply_mailmap_to_header(&sb, headers, &mailmap);
     +	*size = sb.len;
     +	return strbuf_detach(&sb, NULL);
     +}
    @@ t/t4203-mailmap.sh: test_expect_success SYMLINKS 'symlinks not respected in-tree
     +	author Orig <orig@example.com>
     +	EOF
     +	git cat-file --no-use-mailmap commit HEAD >log &&
    -+	grep author log >actual &&
    -+	sed -e "/^author/q" actual >log &&
    -+	sed -e "s/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//" log >actual &&
    ++	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual &&
     +	test_cmp expect actual
     +'
     +
    @@ t/t4203-mailmap.sh: test_expect_success SYMLINKS 'symlinks not respected in-tree
     +	author A U Thor <author@example.com>
     +	EOF
     +	git cat-file --use-mailmap commit HEAD >log &&
    -+	grep author log >actual &&
    -+	sed -e "/^author/q" actual >log &&
    -+	sed -e "s/ [0-9][0-9]* [-+][0-9][0-9][0-9][0-9]$//" log >actual &&
    ++	sed -n "/^author /s/\([^>]*>\).*/\1/p" log >actual &&
    ++	test_cmp expect actual
    ++'
    ++
    ++test_expect_success '--no-mailmap disables mailmap in cat-file for annotated tag objects' '
    ++	test_when_finished "rm .mailmap" &&
    ++	cat >.mailmap <<-EOF &&
    ++	Orig <orig@example.com> C O Mitter <committer@example.com>
    ++	EOF
    ++	cat >expect <<-EOF &&
    ++	tagger C O Mitter <committer@example.com>
    ++	EOF
    ++	git tag -a -m "annotated tag" v1 &&
    ++	git cat-file --no-mailmap -p v1 >log &&
    ++	sed -n "/^tagger /s/\([^>]*>\).*/\1/p" log >actual &&
    ++	test_cmp expect actual
    ++'
    ++
    ++test_expect_success '--mailmap enables mailmap in cat-file for annotated tag objects' '
    ++	test_when_finished "rm .mailmap" &&
    ++	cat >.mailmap <<-EOF &&
    ++	Orig <orig@example.com> C O Mitter <committer@example.com>
    ++	EOF
    ++	cat >expect <<-EOF &&
    ++	tagger Orig <orig@example.com>
    ++	EOF
    ++	git tag -a -m "annotated tag" v2 &&
    ++	git cat-file --mailmap -p v2 >log &&
    ++	sed -n "/^tagger /s/\([^>]*>\).*/\1/p" log >actual &&
     +	test_cmp expect actual
     +'
     +

Comments

Junio C Hamano July 7, 2022, 10:06 p.m. UTC | #1
Siddharth Asthana <siddharthasthana31@gmail.com> writes:

> Changes in v2:
> - The commit_rewrite_person() has been improved by restricting it to
>   traverse only the header part of the object buffers.
> - The callers of commit_rewrite_person() now don't require to call it
>   multiple times for different headers. They can pass an array of
>   headers and commit_rewrite_person() replaces idents only on those
>   headers.
> - commit_rewrite_person() has been renamed to a suitable name which
>   expresses its functionality clearly.
> - More tests have been added to test the --[no-]-use-mailmap option for
>   the tag objects.
> - Redundant operations from the tests have been removed.

I agree with the general direction and the implementation strategy.
I've noticed a few decl-after-statement and also at least one public
helper function that does not need to be public.  Are you building
with "make DEVELOPER=YesPlease"?  It enables -pedantic and -Werror,
-Wdeclaration-after-statement, among other options (see the
config.mak.dev file for the complete list) to help you catch these
locally before sendign your patches to the list.

Thanks.
Junio C Hamano July 7, 2022, 10:58 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Siddharth Asthana <siddharthasthana31@gmail.com> writes:
>
>> Changes in v2:
>> - The commit_rewrite_person() has been improved by restricting it to
>>   traverse only the header part of the object buffers.
>> - The callers of commit_rewrite_person() now don't require to call it
>>   multiple times for different headers. They can pass an array of
>>   headers and commit_rewrite_person() replaces idents only on those
>>   headers.
>> - commit_rewrite_person() has been renamed to a suitable name which
>>   expresses its functionality clearly.
>> - More tests have been added to test the --[no-]-use-mailmap option for
>>   the tag objects.
>> - Redundant operations from the tests have been removed.
>
> I agree with the general direction and the implementation strategy.
> I've noticed a few decl-after-statement and also at least one public
> helper function that does not need to be public.  Are you building
> with "make DEVELOPER=YesPlease"?  It enables -pedantic and -Werror,
> -Wdeclaration-after-statement, among other options (see the
> config.mak.dev file for the complete list) to help you catch these
> locally before sendign your patches to the list.

Here is what I prepared on top of your series to make them compile
while queuing them on a topic branch.

 builtin/cat-file.c | 7 +++++--
 revision.c         | 4 ++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 6dc750a367..4ca024a018 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -40,11 +40,14 @@ static const char *force_path;
 static struct string_list mailmap = STRING_LIST_INIT_NODUP;
 static int use_mailmap;
 
-char *replace_idents_using_mailmap(char *object_buf, size_t *size)
+static char *replace_idents_using_mailmap(char *, size_t *);
+
+static char *replace_idents_using_mailmap(char *object_buf, size_t *size)
 {
 	struct strbuf sb = STRBUF_INIT;
-	strbuf_attach(&sb, object_buf, *size, *size + 1);
 	const char *headers[] = { "author ", "committer ", "tagger ", NULL };
+
+	strbuf_attach(&sb, object_buf, *size, *size + 1);
 	apply_mailmap_to_header(&sb, headers, &mailmap);
 	*size = sb.len;
 	return strbuf_detach(&sb, NULL);
diff --git a/revision.c b/revision.c
index b561d6b5b5..767c6225df 100644
--- a/revision.c
+++ b/revision.c
@@ -3787,10 +3787,10 @@ static int commit_match(struct commit *commit, struct rev_info *opt)
 		strbuf_addstr(&buf, message);
 
 	if (opt->grep_filter.header_list && opt->mailmap) {
+		const char *commit_headers[] = { "author ", "committer ", NULL };
+
 		if (!buf.len)
 			strbuf_addstr(&buf, message);
-
-		const char *commit_headers[] = { "author ", "committer ", NULL };
 		apply_mailmap_to_header(&buf, commit_headers, opt->mailmap);
 	}