diff mbox series

format-patch: add support for mailmap file

Message ID 20240813-jk-support-mailmap-git-format-patch-v1-1-1aea690ea5dd@gmail.com (mailing list archive)
State New
Headers show
Series format-patch: add support for mailmap file | expand

Commit Message

Keller, Jacob E Aug. 13, 2024, 9:45 p.m. UTC
From: Jacob Keller <jacob.keller@gmail.com>

Git has support for a mailmap file which translates author and committer
names and email addresses to canonical values.

Git log has log.mailmap, and the associated --mailmap and --use-mailmap
options.

Teach git format-patch the format.mailmap and --mailmap options so that
formatting a patch can also reflect the canonical values from the
mailmap file.

Reported-by: Anthony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 builtin/log.c                      | 13 ++++++++++
 Documentation/git-format-patch.txt |  7 ++++++
 t/t4014-format-patch.sh            | 49 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 68 insertions(+), 1 deletion(-)


---
base-commit: 406f326d271e0bacecdb00425422c5fa3f314930
change-id: 20240813-jk-support-mailmap-git-format-patch-439073f25010

Best regards,

Comments

Josh Steadmon Aug. 13, 2024, 10:45 p.m. UTC | #1
On 2024.08.13 14:45, Jacob Keller wrote:
> From: Jacob Keller <jacob.keller@gmail.com>
> 
> Git has support for a mailmap file which translates author and committer
> names and email addresses to canonical values.
> 
> Git log has log.mailmap, and the associated --mailmap and --use-mailmap
> options.
> 
> Teach git format-patch the format.mailmap and --mailmap options so that
> formatting a patch can also reflect the canonical values from the
> mailmap file.
> 
> Reported-by: Anthony Nguyen <anthony.l.nguyen@intel.com>
> Signed-off-by: Jacob Keller <jacob.keller@gmail.com>

I am not sure I understand the utility here; using mailmap at log time
makes sense because these are old, established commits that may have
outdated contact information. But when writing patches with
format-patch, presumably these are still somewhat WIP patches. Is it not
better to just reset the author information before running
git-format-patch in this case?

If I've misunderstood the use case, please let me know.
Keller, Jacob E Aug. 14, 2024, 12:20 a.m. UTC | #2
On 8/13/2024 3:45 PM, Josh Steadmon wrote:
> On 2024.08.13 14:45, Jacob Keller wrote:
>> From: Jacob Keller <jacob.keller@gmail.com>
>>
>> Git has support for a mailmap file which translates author and committer
>> names and email addresses to canonical values.
>>
>> Git log has log.mailmap, and the associated --mailmap and --use-mailmap
>> options.
>>
>> Teach git format-patch the format.mailmap and --mailmap options so that
>> formatting a patch can also reflect the canonical values from the
>> mailmap file.
>>
>> Reported-by: Anthony Nguyen <anthony.l.nguyen@intel.com>
>> Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
> 
> I am not sure I understand the utility here; using mailmap at log time
> makes sense because these are old, established commits that may have
> outdated contact information. But when writing patches with
> format-patch, presumably these are still somewhat WIP patches. Is it not
> better to just reset the author information before running
> git-format-patch in this case?
> 

I suppose that this is rather an uncommon circumstance.

> If I've misunderstood the use case, please let me know.
> 

I've had a few cases where I was formatting an old commit. The example
in this case was a change made to an internal tree by one author quite
some time ago. In the meantime, that person has left the company, and
his company address is no longer valid. We still typically put the
original author on such a patch in order to give them credit even if
they're no longer on our team when sending the change, as patches are
made by people, not companies :)

If we left the address alone, it would cause a bounce on the mailing
list if it gets included in the cc. In some cases, the upstream project
mailmap already includes a mapping from their old company address to
their current public address.

The internal tree commits are already baked and can't be changed. We can
of course fix the generated patches from these commits manually. It
seemed convenient to get mailmap to do this for us.

I guess this is not very common, and may not be worth the trouble to
maintain in format-patch, I can understand that.

Another alternative we considered was something like a "--no-cc" option
or configuration to prevent including CC for a dead email address when
sending such patches. Given that the original author no longer works for
us, it is not like we expect them to actually respond or maintain the
code, and in all cases the patches have a sign-off from someone else on
the team.

Thanks,
Jake
Jeff King Aug. 14, 2024, 7:26 a.m. UTC | #3
On Tue, Aug 13, 2024 at 05:20:41PM -0700, Jacob Keller wrote:

> I've had a few cases where I was formatting an old commit. The example
> in this case was a change made to an internal tree by one author quite
> some time ago. In the meantime, that person has left the company, and
> his company address is no longer valid. We still typically put the
> original author on such a patch in order to give them credit even if
> they're no longer on our team when sending the change, as patches are
> made by people, not companies :)
> 
> If we left the address alone, it would cause a bounce on the mailing
> list if it gets included in the cc. In some cases, the upstream project
> mailmap already includes a mapping from their old company address to
> their current public address.
> 
> The internal tree commits are already baked and can't be changed. We can
> of course fix the generated patches from these commits manually. It
> seemed convenient to get mailmap to do this for us.

I think that makes sense, especially if the caller is specifically
asking to enable address mapping. I do wonder if the new format.mailmap
might be surprising for some callers, though. For example, would a
rebase using the "apply" backend quietly rewrite commit authors using
the mailmap?

-Peff
Jacob Keller Aug. 14, 2024, 5:43 p.m. UTC | #4
On Wed, Aug 14, 2024, 12:26 AM Jeff King <peff@peff.net> wrote:
> On Tue, Aug 13, 2024 at 05:20:41PM -0700, Jacob Keller wrote:
> > The internal tree commits are already baked and can't be changed. We can
> > of course fix the generated patches from these commits manually. It
> > seemed convenient to get mailmap to do this for us.
>
> I think that makes sense, especially if the caller is specifically
> asking to enable address mapping. I do wonder if the new format.mailmap
> might be surprising for some callers, though. For example, would a
> rebase using the "apply" backend quietly rewrite commit authors using
> the mailmap?
>
> -Peff

Ya, I think the config probably doesn't make sense thinking about it.

I also realized the actual problem we have is that the mail we send is
including a dead address. It would make more sense to just allow
send-email to translate from the old addresses to the canonical ones.

Specifically the issue we had is that an old commit to the internal
repo was prepped for submission by a new owner, and when he sent the
email, it automatically included the address of the original author,
which ofcourse no longer is valid. This resulted in the mail annoying
several people due to bouncing.

So the real problem to solve here is perhaps some way to filter the
mails either via translation or removal so that we don't have it
tagged to or cc a dead email. As I said above, its not like we expect
the original author to actually respond or own the change, but we did
not want to completely remove their name from the change, because they
are the original person who wrote it.

In our case, it may be more useful to have the mailmap not at
format-patch time but instead at email time...

Thanks,
Jake
Junio C Hamano Aug. 14, 2024, 7:12 p.m. UTC | #5
Jacob Keller <jacob.keller@gmail.com> writes:

> On Wed, Aug 14, 2024, 12:26 AM Jeff King <peff@peff.net> wrote:
>> On Tue, Aug 13, 2024 at 05:20:41PM -0700, Jacob Keller wrote:
>> > The internal tree commits are already baked and can't be changed. We can
>> > of course fix the generated patches from these commits manually. It
>> > seemed convenient to get mailmap to do this for us.
>>
>> I think that makes sense, especially if the caller is specifically
>> asking to enable address mapping. I do wonder if the new format.mailmap
>> might be surprising for some callers, though. For example, would a
>> rebase using the "apply" backend quietly rewrite commit authors using
>> the mailmap?
>>
>> -Peff
>
> Ya, I think the config probably doesn't make sense thinking about it.

This depends on what an average user's perception is.  People like I
would know intimately that "format-patch" is just a glorified "log"
with special features, so I wouldn't be surprised if somebody sends
in a bug report saying "git -c log.mailmap=1 format-patch" does not
honor the mailmap.  In other words, people may expect "git log -1 -p
--format=email HEAD" and "git format-patch -1 --stdout HEAD" to be
pretty similar.

> In our case, it may be more useful to have the mailmap not at
> format-patch time but instead at email time...

And probably you want not the regular mailmap but the custom one
meant to be used only by send-email to ignore the recipient.  You'd
still want to see who wrote it back when they were employed with you
in "git log" output, and not having them in your regular mailmap
would be a way to do so.  You can instead choose to use your regular
mailmap to map them to their current address, which is what we seem
to do here in this project, but it would probably be less common in
$Corp settings.  If the mailmap you give to send-email maps these
dead mailboxes to some known pattern (e.g. a fixed string in the
e-mail part, like "A U Thor <invalid@invalid>"), teaching send-email
to recognise them would be trivial.

Thanks.
Junio C Hamano Aug. 14, 2024, 9:53 p.m. UTC | #6
Jacob Keller <jacob.e.keller@intel.com> writes:

> diff --git a/builtin/log.c b/builtin/log.c
> index 4d4b60caa76a..94560add6fbc 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -975,6 +975,7 @@ struct format_config {
>  	struct log_config log;
>  	enum thread_level thread;
>  	int do_signoff;
> +	int use_mailmap;

As we share the "--[no-]mailmap" option from the command line with
"git log", shouldn't we be able to reuse log.use_mailmap_config as
well without adding yet another member to the struct?  "git log"
defaults use_mailmap_config to true, but this command would want to
default it to false to avoid disrupting existing users, or
something, perhaps?

> diff --git a/Documentation/git-format-patch.txt b/Documentation/git-format-patch.txt
> index 8708b3159309..f3de349990bf 100644
> --- a/Documentation/git-format-patch.txt
> +++ b/Documentation/git-format-patch.txt
> @@ -30,6 +30,7 @@ SYNOPSIS
>  		   [--range-diff=<previous> [--creation-factor=<percent>]]
>  		   [--filename-max-length=<n>]
>  		   [--progress]
> +		   [(--mailmap|--no-mailmap|--use-mailmap|--no-use-mailmap)]

We seem to say "[--[no-]cover-letter]" to abbreviate, and because
"--[no-]use-mailmap" is merely a synonym, shouldn't it be sufficient
to say

		   [--[no-]mailmap]

without any other frills?  I find the use of the (al|terna|tive)
here especially annoying, as it is not like it is an error if you
give "--mailmap" and then say "--no-mailmap" later on the same
command line---it's just the usual "last one wins".

I haven't decided what my response to Peff's concern on the fallout
to "rebase --apply".  On one hand, those who conciously choose to
rebase by creating patches and applying them would find it puzzling
if it did not honor format.mailmap setting.  But I would not be
strongly opposed if we hardcoded to pass "--no-mailmap" to the
internal invocation of "format-patch", just like we hardcode "-k"
and other options and justified it with "the use of format-patch is
a mere implementation detail".

Thanks.
diff mbox series

Patch

diff --git a/builtin/log.c b/builtin/log.c
index 4d4b60caa76a..94560add6fbc 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -975,6 +975,7 @@  struct format_config {
 	struct log_config log;
 	enum thread_level thread;
 	int do_signoff;
+	int use_mailmap;
 	enum auto_base_setting auto_base;
 	char *base_commit;
 	char *from;
@@ -1131,6 +1132,10 @@  static int git_format_config(const char *var, const char *value,
 		cfg->do_signoff = git_config_bool(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "format.mailmap")) {
+		cfg->use_mailmap = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "format.signature")) {
 		FREE_AND_NULL(cfg->signature);
 		return git_config_string(&cfg->signature, var, value);
@@ -2042,6 +2047,8 @@  int cmd_format_patch(int argc, const char **argv, const char *prefix)
 			    N_("generate a cover letter")),
 		OPT_BOOL(0, "numbered-files", &just_numbers,
 			    N_("use simple number sequence for output file names")),
+		OPT_BOOL(0, "use-mailmap", &cfg.use_mailmap, N_("use mail map file")),
+		OPT_ALIAS(0, "mailmap", "use-mailmap"),
 		OPT_STRING(0, "suffix", &fmt_patch_suffix, N_("sfx"),
 			    N_("use <sfx> instead of '.patch'")),
 		OPT_INTEGER(0, "start-number", &start_number,
@@ -2160,6 +2167,12 @@  int cmd_format_patch(int argc, const char **argv, const char *prefix)
 
 	rev.force_in_body_from = force_in_body_from;
 
+	if (cfg.use_mailmap) {
+		rev.mailmap = xmalloc(sizeof(struct string_list));
+		string_list_init_nodup(rev.mailmap);
+		read_mailmap(rev.mailmap);
+	}
+
 	if (!fmt_patch_suffix)
 		fmt_patch_suffix = cfg.fmt_patch_suffix;
 
diff --git a/Documentation/git-format-patch.txt b/Documentation/git-format-patch.txt
index 8708b3159309..f3de349990bf 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -30,6 +30,7 @@  SYNOPSIS
 		   [--range-diff=<previous> [--creation-factor=<percent>]]
 		   [--filename-max-length=<n>]
 		   [--progress]
+		   [(--mailmap|--no-mailmap|--use-mailmap|--no-use-mailmap)]
 		   [<common-diff-options>]
 		   [ <since> | <revision-range> ]
 
@@ -145,6 +146,12 @@  include::diff-options.txt[]
 	Print all commits to the standard output in mbox format,
 	instead of creating a file for each one.
 
+--[no-]mailmap::
+--[no-]use-mailmap::
+	Use mailmap file to map author and committer names and email
+	addresses to canonical real names and email addresses. See
+	linkgit:git-shortlog[1].
+
 --attach[=<boundary>]::
 	Create multipart/mixed attachment, the first part of
 	which is the commit message and the patch itself in the
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index 884f83fb8a45..3a3ebddfe5c4 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -1215,7 +1215,7 @@  check_author() {
 	echo content >>file &&
 	git add file &&
 	GIT_AUTHOR_NAME=$1 git commit -m author-check &&
-	git format-patch --stdout -1 >patch &&
+	git format-patch $2 --stdout -1 >patch &&
 	sed -n "/^From: /p; /^ /p; /^$/q" patch >actual &&
 	test_cmp expect actual
 }
@@ -1285,6 +1285,53 @@  test_expect_success 'format-patch wraps extremely long from-header (rfc2047)' '
 	check_author "Foö Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar"
 '
 
+cat >mail.map <<'EOF'
+Foo B. Baz <author@example.com>
+EOF
+
+cat >expect <<'EOF'
+From: "Foo B. Baz" <author@example.com>
+EOF
+test_expect_success 'format-patch format.mailmap maps properly' '
+	test_config format.mailmap true &&
+	test_config mailmap.file mail.map &&
+	check_author "Foo B. Bar"
+'
+
+cat >expect <<'EOF'
+From: "Foo B. Bar" <author@example.com>
+EOF
+test_expect_success 'format-patch --no-mailmap overrides format.mailmap' '
+	test_config format.mailmap true &&
+	test_config mailmap.file mail.map &&
+	check_author "Foo B. Bar" "--no-mailmap"
+'
+
+cat >expect <<'EOF'
+From: "Foo B. Bar" <author@example.com>
+EOF
+test_expect_success 'format-patch --no-use-mailmap overrides format.mailmap' '
+	test_config format.mailmap true &&
+	test_config mailmap.file mail.map &&
+	check_author "Foo B. Bar" "--no-use-mailmap"
+'
+
+cat >expect <<'EOF'
+From: "Foo B. Baz" <author@example.com>
+EOF
+test_expect_success 'format-patch --mailmap' '
+	test_config mailmap.file mail.map &&
+	check_author "Foo B. Bar" "--mailmap"
+'
+
+cat >expect <<'EOF'
+From: "Foo B. Baz" <author@example.com>
+EOF
+test_expect_success 'format-patch --use-mailmap' '
+	test_config mailmap.file mail.map &&
+	check_author "Foo B. Bar" "--use-mailmap"
+'
+
 cat >expect <<'EOF'
 From: Foö Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar
  Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo