
[v2] gitweb: map names/emails with mailmap

Message ID 20200809230436.2152-1-me@pluvano.com
State New, archived
Series [v2] gitweb: map names/emails with mailmap

Commit Message

Emma Brooks Aug. 9, 2020, 11:04 p.m. UTC
Add an option to map names and emails to their canonical forms via a
.mailmap file. This is enabled by default, consistent with the behavior
of Git itself.

Signed-off-by: Emma Brooks <me@pluvano.com>
---

No code changes. I just fixed a typo in the commit subject (made "map"
lower-case).

 Documentation/gitweb.conf.txt |  5 +++
 gitweb/gitweb.perl            | 81 +++++++++++++++++++++++++++++++++--
 2 files changed, 82 insertions(+), 4 deletions(-)
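The lookup this patch performs can be tried in isolation. The following is a minimal sketch, not the patch's code. It assumes the hash layout described in the patch's comment on $mailmap: keys of the form "name <email>" or "<email>", and values holding a replacement "name" and/or "email". The helper name lookup_canonical is hypothetical.

```perl
use strict;
use warnings;

# Resolve a (name, email) pair against a mailmap hash. The "name <email>"
# key is tried before the bare "<email>" key; any field the matching entry
# does not provide keeps its original value.
sub lookup_canonical {
	my ($map, $name, $email) = @_;
	my $entry = $map->{"$name <$email>"} || $map->{"<$email>"};
	return ($name, $email) unless $entry;
	return ($entry->{name} || $name, $entry->{email} || $email);
}

# Made-up example data: one email-only entry and one name+email entry.
my %map = (
	'<jdoe@old.example>'        => { name => 'Jane Doe' },
	'J. Doe <jdoe@old.example>' => { name => 'Jane Doe', email => 'jane@example.org' },
);

my ($name, $email) = lookup_canonical(\%map, 'J. Doe', 'jdoe@old.example');
print "$name <$email>\n";   # prints "Jane Doe <jane@example.org>"
```

A different name with the same old address falls back to the "<jdoe@old.example>" entry. In that case only the name changes.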

Comments

Eric Sunshine Aug. 10, 2020, 12:49 a.m. UTC | #1
On Sun, Aug 9, 2020 at 7:06 PM Emma Brooks <me@pluvano.com> wrote:
> Add an option to map names and emails to their canonical forms via a
> .mailmap file. This is enabled by default, consistent with the behavior
> of Git itself.
>
> Signed-off-by: Emma Brooks <me@pluvano.com>
> ---
> diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> @@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra
> +mailmap::
> +       Use mailmap to find the canonical name/email for
> +       committers/authors (see linkgit:git-shortlog[1]). Enabled by
> +       default.

Is this setting global or per-repository? (I ask because documentation
for other options in this section document whether they can be set
per-repository.)

Should there be any sort of support for functionality similar to the
"mailmap.file" and "mailmap.blob" configuration options in Git itself?
(Genuine question, not a demand for you to implement such support.)

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> +# Contents of mailmap stored as a referance to a hash with keys in the format

s/referance/reference/

> +# of "name <email>" or "<email>", and values that are hashes containing a
> +# replacement "name" and/or "email". If set (even if empty) the mailmap has
> +# already been read.
> +my $mailmap;
> +
> +sub read_mailmap {
> +       my %mailmap = ();
> +       open my $fd, '-|', quote_command(
> +               git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap') . ' 2> /dev/null'
> +               or die_error(500, 'Failed to read mailmap');

Am I reading this correctly that this will die if the project does not
have a .mailmap file? If so, that seems like harsh behavior since
there are many projects in the wild lacking a .mailmap file.

> +       return \%mailmap if eof $fd;
> +       foreach (split '\n', <$fd>) {

If the .mailmap has no content, then the 'foreach' loop won't be
entered, which means the early 'return' above it is unneeded, correct?
(Not necessarily asking for the early 'return' to be removed, but more
a case of checking that I'm understanding the logic.)

> +               next if (/^#/);
> +               if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
> +                   /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) {
> +                       # New Name <new@email> <old@email>
> +                       # New Name <new@email> Old Name <old@email>

The first regex is intended to handle a trailing "# comment", whereas
the second regex is for lines lacking a comment, correct? However,
because neither of these expressions are anchored, the second regex
will match both types of lines, thus the first regex is redundant. I'm
guessing, therefore, that your intent was actually to anchor the
expressions, perhaps like this:

    if (/^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
        /^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {

Also, if you're matching lines of the form:

    name1 <email1> [optional-name] <email2>

in which you expect to see "name1", then is the loose "(.*)\s+"
desirable? Shouldn't it be tighter "(.+)\s+"? For instance:

    if (/^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
        /^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {
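Eric's point about the missing anchors can be checked directly. This small sketch uses the patterns quoted above and two made-up sample lines. The variable names are only for illustration.

```perl
use strict;
use warnings;

# Unanchored pattern from the patch, and the anchored pair suggested above
# (one for lines with a trailing "# comment", one for lines without).
my $unanchored   = qr/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x;
my $with_comment = qr/^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x;
my $no_comment   = qr/^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x;

my $commented = 'New Name <new@example.com> <old@example.com> # moved';
my $plain     = 'New Name <new@example.com> <old@example.com>';

# Without anchors, the "no comment" pattern also accepts the commented line,
# so trying the comment-specific pattern first never changes the outcome.
my $unanchored_hits_comment = $commented =~ $unanchored ? 1 : 0;

# With anchors, each pattern accepts only its own kind of line.
my $anchored_hits_comment = $commented =~ $no_comment ? 1 : 0;
my @captures = $commented =~ $with_comment;

print "unanchored: $unanchored_hits_comment, anchored: $anchored_hits_comment\n";
```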

> +                       $mailmap{$3} = ();

I wonder if you should be doing some sort of whitespace normalization
on $3 before using it as a hash key. For instance, if someone has a
.mailmap that looks like this (where I've used "." to represent
space):

    name1.<email1>.name2...<email2>

then $3 will have three spaces between 'name2' and '<email2>' when
used as a key, and that won't match later when you construct a "name
<email>" key later in map_author() with a single space.
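The normalization Eric suggests could be as small as collapsing runs of whitespace before a key is stored. This is a sketch that assumes keys should be compared with single spaces and no leading or trailing blanks. The name normalize_key is hypothetical. The same function would also have to be applied to the key map_author() builds, so that both sides agree.

```perl
use strict;
use warnings;

# Collapse each run of whitespace to a single space and trim both ends, so
# "name2   <email2>" read from .mailmap and "name2 <email2>" built at
# lookup time end up as the same hash key.
sub normalize_key {
	my ($key) = @_;
	$key =~ s/\s+/ /g;
	$key =~ s/^ | $//g;
	return $key;
}

print normalize_key("name2   <email2>"), "\n";   # prints "name2 <email2>"
```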

> +                       ($mailmap{$3}{name} = $1) =~ s/^\s+|\s+$//g;
> +                       $mailmap{$3}{email} = $2;
> +               } elsif (/(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>) (?:\s+\#)/x ||
> +                        /(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>)/x) {

Same comment as above about anchoring these patterns...

> +                       # New Name <old@email>
> +                       # <new@email> <old@email>
> +                       $mailmap{$3} = ();
> +                       if ($1) {
> +                               $mailmap{$3}{email} = $1;
> +                       } else {
> +                               ($mailmap{$3}{name} = $2) =~ s/^\s+|\s+$//g;
> +                       }
> +               }
> +       }
> +       return \%mailmap;
> +}

Emma Brooks Aug. 10, 2020, 3:12 a.m. UTC | #2
On 2020-08-09 20:49:59-0400, Eric Sunshine wrote:
> On Sun, Aug 9, 2020 at 7:06 PM Emma Brooks <me@pluvano.com> wrote:
> > Add an option to map names and emails to their canonical forms via a
> > .mailmap file. This is enabled by default, consistent with the behavior
> > of Git itself.
> >
> > Signed-off-by: Emma Brooks <me@pluvano.com>
> > ---
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > @@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra
> > +mailmap::
> > +       Use mailmap to find the canonical name/email for
> > +       committers/authors (see linkgit:git-shortlog[1]). Enabled by
> > +       default.
> 
> Is this setting global or per-repository? (I ask because documentation
> for other options in this section document whether they can be set
> per-repository.)

Global. I'll add a note that it cannot be set per-project, or I could
add support for setting it per-project if that's wanted.

> Should there be any sort of support for functionality similar to the
> "mailmap.file" and "mailmap.blob" configuration options in Git itself?
> (Genuine question, not a demand for you to implement such support.)

Yes, that would be useful and should probably be supported.

> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > +# Contents of mailmap stored as a referance to a hash with keys in the format
> 
> s/referance/reference/

OK.

> > +# of "name <email>" or "<email>", and values that are hashes containing a
> > +# replacement "name" and/or "email". If set (even if empty) the mailmap has
> > +# already been read.
> > +my $mailmap;
> > +
> > +sub read_mailmap {
> > +       my %mailmap = ();
> > +       open my $fd, '-|', quote_command(
> > +               git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap') . ' 2> /dev/null'
> > +               or die_error(500, 'Failed to read mailmap');
> 
> Am I reading this correctly that this will die if the project does not
> have a .mailmap file? If so, that seems like harsh behavior since
> there are many projects in the wild lacking a .mailmap file.

No, this error message is misleading. The die_error is called if there
is a problem executing git cat-file, but not if cat-file returns an
error. I'll revise this message to be more accurate.
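The difference Emma describes is between "git cat-file" failing to start and "git cat-file" exiting with an error. Code can make that difference explicit. A single command string containing a redirection such as ' 2> /dev/null' is run by Perl through /bin/sh, so a successful open() says little about git itself. The command's exit status only becomes available from close(), via $?.

The sketch below uses the list form of open. That form runs no shell, so the patch's stderr redirection is not reproduced. The helper run_capture is hypothetical.

```perl
use strict;
use warnings;

# Run a command without a shell and capture its stdout. Returns the output
# and the command's exit code; dies if open() reports that the child could
# not be started.
sub run_capture {
	my @cmd = @_;
	open(my $fd, '-|', @cmd)
		or die "cannot start $cmd[0]: $!\n";
	my $output = do { local $/; <$fd> };
	# close() on a pipe waits for the child and leaves its status in $?.
	close $fd;
	my $exit_code = $? >> 8;
	return (defined $output ? $output : '', $exit_code);
}
```

With this split, a missing .mailmap would show up as a non-zero exit code and could be treated as "no mailmap" rather than a 500 error.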

> > +       return \%mailmap if eof $fd;
> > +       foreach (split '\n', <$fd>) {
> 
> If the .mailmap has no content, then the 'foreach' loop won't be
> entered, which means the early 'return' above it is unneeded, correct?
> (Not necessarily asking for the early 'return' to be removed, but more
> a case of checking that I'm understanding the logic.)

The early return is intended to catch when there is no mailmap, so $fd
does not get initialized. Without it, you would get an error when you
try to split $fd's content:

    Use of uninitialized value $fd in split at [the foreach]
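One detail behind that warning is that split's second argument is evaluated in scalar context. So "<$fd>" there returns a single record, up to the next $/, or undef when the pipe is empty. Whether that one record is the whole blob therefore depends on the $/ in effect at the call site. Reading in list context, with $/ set locally, avoids both the warning and that dependence.

This is a sketch under that assumption. The helper read_mailmap_lines is hypothetical, and the in-memory filehandles in the test stand in for the "git cat-file" pipe.

```perl
use strict;
use warnings;

# Read every line from an already-open handle. Setting $/ locally keeps the
# result independent of the caller's record separator, and list context
# yields an empty list (not undef) when there is no input.
sub read_mailmap_lines {
	my ($fd) = @_;
	local $/ = "\n";
	my @lines = <$fd>;
	chomp @lines;
	# Drop blank lines and lines starting with '#', as the patch does.
	return grep { /\S/ && !/^#/ } @lines;
}
```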

> > +               next if (/^#/);
> > +               if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
> > +                   /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) {
> > +                       # New Name <new@email> <old@email>
> > +                       # New Name <new@email> Old Name <old@email>
> 
> The first regex is intended to handle a trailing "# comment", whereas
> the second regex is for lines lacking a comment, correct? However,
> because neither of these expressions are anchored, the second regex
> will match both types of lines, thus the first regex is redundant. I'm
> guessing, therefore, that your intent was actually to anchor the
> expressions, perhaps like this:
> 
>     if (/^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
>         /^\s* (.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {
> 
> Also, if you're matching lines of the form:
> 
>     name1 <email1> [optional-name] <email2>
> 
> in which you expect to see "name1", then is the loose "(.*)\s+"
> desirable? Shouldn't it be tighter "(.+)\s+"? For instance:
> 
>     if (/^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
>         /^\s* (.+)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) \s*$/x) {

Yes and yes. I'll update those.

> > +                       $mailmap{$3} = ();
> 
> I wonder if you should be doing some sort of whitespace normalization
> on $3 before using it as a hash key. For instance, if someone has a
> .mailmap that looks like this (where I've used "." to represent
> space):
> 
>     name1.<email1>.name2...<email2>
> 
> then $3 will have three spaces between 'name2' and '<email2>' when
> used as a key, and that won't match later when you construct a "name
> <email>" key later in map_author() with a single space.

Yes, I hadn't considered that case.

Thanks.

Eric Sunshine Aug. 10, 2020, 5:41 a.m. UTC | #3
On Sun, Aug 9, 2020 at 11:12 PM Emma Brooks <me@pluvano.com> wrote:
> On 2020-08-09 20:49:59-0400, Eric Sunshine wrote:
> > On Sun, Aug 9, 2020 at 7:06 PM Emma Brooks <me@pluvano.com> wrote:
> > > +mailmap::
> >
> > Is this setting global or per-repository? (I ask because documentation
> > for other options in this section document whether they can be set
> > per-repository.)
>
> Global. I'll add a note that it cannot be set per-project, or I could
> add support for setting it per-project if that's wanted.

If it's not much extra work, it might make sense to support
per-project, if for no other reason, to be consistent with other
nearby options.

> > Should there be any sort of support for functionality similar to the
> > "mailmap.file" and "mailmap.blob" configuration options in Git itself?
> > (Genuine question, not a demand for you to implement such support.)
>
> Yes, that would be useful and should probably be supported.

I don't insist upon it. It can always be added later if someone needs
it. I was asking about it now because it might have an effect on the
design or type of value which the 'mailmap' option you added above can
accept, and I was concerned about getting locked into a design without
taking these other possibilities into account. For instance, rather
than being a simple boolean, perhaps the 'mailmap' option you added
could be more expressive, eventually allowing support for an explicit
file or blob. This is another reason why I asked if 'mailmap' can be
per-project, since an explicit mailmap file, and especially a blob,
would belong to a particular project. It's just something to think
about. (Then again, I'm not a gitweb user, nor am I familiar with its
configuration, so take my observations with a grain of salt.)
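To make Eric's "more expressive" idea concrete, one hypothetical shape for the option's value is sketched below. None of this is existing gitweb behavior. The accepted values and the helper mailmap_blob_spec are assumptions, chosen so that the current boolean meaning keeps working.

```perl
use strict;
use warnings;

# Hypothetical interpretation of a richer 'mailmap' feature value:
#   undef, '' or '0' -> mailmap disabled
#   '1'              -> default: read .mailmap from HEAD, as the patch does
#   anything else    -> used as a blob spec, e.g. "main:.mailmap"
sub mailmap_blob_spec {
	my ($value) = @_;
	return if !defined $value || $value eq '' || $value eq '0';
	return 'HEAD:.mailmap' if $value eq '1';
	return $value;
}
```

A per-project override could then supply a different blob spec for each repository, which is the case Eric raises.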

> > > +       open my $fd, '-|', quote_command(
> > > +               git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap') . ' 2> /dev/null'
> > > +               or die_error(500, 'Failed to read mailmap');
> >
> > Am I reading this correctly that this will die if the project does not
> > have a .mailmap file? If so, that seems like harsh behavior since
> > there are many projects in the wild lacking a .mailmap file.
>
> No, this error message is misleading. The die_error is called if there
> is a problem executing git cat-file, but not if cat-file returns an
> error. I'll revise this message to be more accurate.

Okay, that makes sense.

> > > +       return \%mailmap if eof $fd;
> > > +       foreach (split '\n', <$fd>) {
> >
> > If the .mailmap has no content, then the 'foreach' loop won't be
> > entered, which means the early 'return' above it is unneeded, correct?
> > (Not necessarily asking for the early 'return' to be removed, but more
> > a case of checking that I'm understanding the logic.)
>
> The early return is intended to catch when there is no mailmap, so $fd
> does not get initialized. Without it, you would get an error when you
> try to split $fd's content:
>
>     Use of uninitialized value $fd in split at [the foreach]

Right. This follows from my misunderstanding what happened if .mailmap
was missing.

Jeff King Aug. 10, 2020, 10:02 a.m. UTC | #4
On Sun, Aug 09, 2020 at 11:04:37PM +0000, Emma Brooks wrote:

> Add an option to map names and emails to their canonical forms via a
> .mailmap file. This is enabled by default, consistent with the behavior
> of Git itself.
> 
> Signed-off-by: Emma Brooks <me@pluvano.com>
> ---
> 
> No code changes. I just fixed a typo in the commit subject (made "map"
> lower-case).

There was a little discussion in response to v1 on whether we could
reuse the existing C mailmap code:

  https://lore.kernel.org/git/20200731010129.GD240563@coredump.intra.peff.net/

Did you have any thoughts on that?

-Peff

Emma Brooks Aug. 11, 2020, 4:17 a.m. UTC | #5
On 2020-08-10 06:02:49-0400, Jeff King wrote:
> There was a little discussion in response to v1 on whether we could
> reuse the existing C mailmap code:
> 
>   https://lore.kernel.org/git/20200731010129.GD240563@coredump.intra.peff.net/
> 
> Did you have any thoughts on that?

I think it's probably not worth the effort to make the necessary changes
to "rev-list --header" Junio mentioned, just for gitweb.

I agree it's a bit worrisome to have a second parser that could
potentially behave slightly differently than the main implementation.
What if we added tests for gitweb's mailmap parsing based on the same
cases used for Git itself?
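Emma's idea of testing against shared cases could start from a table-driven check like the one below. This is a sketch only. The parser is a simplified left-to-right scan for "name <email>" pairs, written for illustration rather than taken from the patch. The three cases are made-up examples of the documented line forms, not copies of Git's own tests.

```perl
use strict;
use warnings;

# Parse one .mailmap line into (key, replacement). Keys follow the patch's
# convention: "name <email>" or "<email>". Anything after the second pair,
# such as a trailing "# comment", is ignored.
sub parse_mailmap_line {
	my ($line) = @_;
	return if $line =~ /^#/;
	my @pairs;
	while (@pairs < 2 && $line =~ /\G([^<]*)<([^<>]*)>/gc) {
		my ($name, $email) = ($1, $2);
		$name =~ s/\s+/ /g;      # collapse internal whitespace
		$name =~ s/^ | $//g;     # trim both ends
		push @pairs, [$name, $email];
	}
	return if !@pairs || $pairs[0][1] eq '';
	my ($new_name, $new_email) = @{ $pairs[0] };
	if (@pairs == 1) {
		# "Proper Name <commit@email>": replace the name for that email.
		return if $new_name eq '';
		return ("<$new_email>", { name => $new_name });
	}
	my ($old_name, $old_email) = @{ $pairs[1] };
	my $key = $old_name eq '' ? "<$old_email>" : "$old_name <$old_email>";
	my %replacement = (email => $new_email);
	$replacement{name} = $new_name if $new_name ne '';
	return ($key, \%replacement);
}

# Compare two optional hashrefs by content.
sub same_hash {
	my ($x, $y) = @_;
	return !defined($x) && !defined($y) if !defined($x) || !defined($y);
	return join("\0", map { "$_=$x->{$_}" } sort keys %$x)
	    eq join("\0", map { "$_=$y->{$_}" } sort keys %$y);
}

# Input line, expected key, expected replacement.
my @cases = (
	['Proper Name <commit@example.com>',
	 '<commit@example.com>', { name => 'Proper Name' }],
	['<proper@example.com> <commit@example.com>',
	 '<commit@example.com>', { email => 'proper@example.com' }],
	['Proper Name <proper@example.com> Commit  Name <commit@example.com> # note',
	 'Commit Name <commit@example.com>',
	 { name => 'Proper Name', email => 'proper@example.com' }],
	['# just a comment', undef, undef],
);

my $failures = 0;
for my $case (@cases) {
	my ($line, $want_key, $want_repl) = @$case;
	my ($key, $repl) = parse_mailmap_line($line);
	my $ok = (defined $key ? $key : '') eq (defined $want_key ? $want_key : '')
	      && same_hash($repl, $want_repl);
	print(($ok ? 'ok' : 'not ok'), " - $line\n");
	$failures++ unless $ok;
}
```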

Eric Sunshine Aug. 11, 2020, 4:48 a.m. UTC | #6
On Tue, Aug 11, 2020 at 12:17 AM Emma Brooks <me@pluvano.com> wrote:
> On 2020-08-10 06:02:49-0400, Jeff King wrote:
> > There was a little discussion in response to v1 on whether we could
> > reuse the existing C mailmap code:
>
> I think it's probably not worth the effort to make the necessary changes
> to "rev-list --header" Junio mentioned, just for gitweb.
>
> I agree it's a bit worrisome to have a second parser that could
> potentially behave slightly differently than the main implementation.
> What if we added tests for gitweb's mailmap parsing based on the same
> cases used for Git itself?

Another option which people probably won't like is to have gitweb
start "git check-mailmap --stdin" in the background, leave it running,
and just feed it author/commit info as needed and read back its
replies. The benefit is that you get the .mailmap parsing and
resolution built into Git itself without needing any extra
parsing/resolution Perl code or tests. The downside is that people
might balk at an extra process hanging around for the duration of
gitweb itself. (You could also start up "git check-mailmap" repeatedly
on-demand, but that would probably be too slow and resource intensive
for real-world use.)
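Eric's long-running-helper idea could look roughly like this with Perl's core IPC::Open2 module. This is a sketch under two assumptions: the helper writes exactly one reply line per input line, and it flushes after each one. That is what the design needs from "git check-mailmap --stdin". The name start_mapper is hypothetical, and the test substitutes a small Perl echo process so the sketch runs without a repository.

```perl
use strict;
use warnings;
use IO::Handle;
use IPC::Open2 qw(open2);

# Start the helper once. Returns two closures: one that maps a single
# "Name <email>" contact, and one that shuts the helper down.
sub start_mapper {
	my @cmd = @_;
	my $pid = open2(my $from_child, my $to_child, @cmd);
	$to_child->autoflush(1);
	my $map = sub {
		my ($contact) = @_;
		print {$to_child} "$contact\n";
		my $reply = <$from_child>;
		die "mapper exited unexpectedly\n" unless defined $reply;
		chomp $reply;
		return $reply;
	};
	my $finish = sub {
		close $to_child;     # EOF on stdin lets the helper exit
		close $from_child;
		waitpid $pid, 0;
	};
	return ($map, $finish);
}
```

In gitweb the command would presumably be git_cmd() followed by 'check-mailmap' and '--stdin'. That is untested here.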

Jeff King Aug. 11, 2020, 4:55 a.m. UTC | #7
On Tue, Aug 11, 2020 at 04:17:28AM +0000, Emma Brooks wrote:

> On 2020-08-10 06:02:49-0400, Jeff King wrote:
> > There was a little discussion in response to v1 on whether we could
> > reuse the existing C mailmap code:
> > 
> >   https://lore.kernel.org/git/20200731010129.GD240563@coredump.intra.peff.net/
> > 
> > Did you have any thoughts on that?
> 
> I think it's probably not worth the effort to make the necessary changes
> to "rev-list --header" Junio mentioned, just for gitweb.

Yeah, I agree that probably doesn't make sense to change "rev-list
--header". I wonder if git could be using "rev-list --format" instead,
though, and asking for the specific things it wants. That could improve
more than just this case, too (e.g., the C code would be parsing and
normalizing author/committer idents, which could make handling of badly
formatted ones more consistent with other Git tools).

It may be a big change, though. I don't know the gitweb code very well.
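Peff's suggestion relies on "git rev-list --format". Its %aN/%aE and %cN/%cE placeholders give the author and committer name/email with .mailmap applied, so Git's own mailmap code would do the mapping.

The sketch below builds such a format and splits one formatted record. The field list and parse_ident_fields are illustrative only, not what gitweb actually requests.

```perl
use strict;
use warnings;

# Fields to request, in order, separated by NUL bytes (%x00), a byte that
# is very unlikely to appear in a name or email. %aN/%aE/%cN/%cE respect
# .mailmap.
my @fields = qw(id author_name author_email committer_name committer_email);
my $format = join '%x00', qw(%H %aN %aE %cN %cE);

# Would be passed as, e.g.: git rev-list --format=$format <rev>

# Split one formatted record (the text printed for a single commit, without
# its "commit <id>" header line) into a hash keyed by @fields.
sub parse_ident_fields {
	my ($record) = @_;
	my %co;
	@co{@fields} = split /\0/, $record, -1;
	return \%co;
}
```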

> I agree it's a bit worrisome to have a second parser that could
> potentially behave slightly differently than the main implementation.
> What if we added tests for gitweb's mailmap parsing based on the same
> cases used for Git itself?

That would certainly help, though I don't know how easy it would be to
replicate all of the tests in a maintainable way.

-Peff

Eric Wong Aug. 11, 2020, 6:17 a.m. UTC | #8
Emma Brooks <me@pluvano.com> wrote:
> On 2020-08-10 06:02:49-0400, Jeff King wrote:
> > There was a little discussion in response to v1 on whether we could
> > reuse the existing C mailmap code:
> > 
> >   https://lore.kernel.org/git/20200731010129.GD240563@coredump.intra.peff.net/
> > 
> > Did you have any thoughts on that?
> 
> I think it's probably not worth the effort to make the necessary changes
> to "rev-list --header" Junio mentioned, just for gitweb.
> 
> I agree it's a bit worrisome to have a second parser that could
> potentially behave slightly differently than the main implementation.

+Cc Joe Perches

Fwiw, there's already a GPL-2.0 Perl .mailmap parser in
scripts/get_maintainer.pl of the Linux kernel which Joe
maintains:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/get_maintainer.pl

Been thinking about adding mailmap support to public-inbox in
the send-email reply instructions, too.  (but public-inbox is
AGPL-3+, so I can't steal the code w/o permission)

> What if we added tests for gitweb's mailmap parsing based on the same
> cases used for Git itself?

That's probably fine IMHO; especially if it's just for gitweb display
(and not writing anything that's meant to be stored forever).

There's already dozens of different parsers for email addresses,
MIME, mailbox formats, etc. all with slightly different edge cases;
things still mostly work well enough to not be a huge problem.
(Same goes for Markdown, HTML, formats and even JSON :x)

Joe Perches Aug. 11, 2020, 6:33 a.m. UTC | #9
On Tue, 2020-08-11 at 06:17 +0000, Eric Wong wrote:
> Emma Brooks <me@pluvano.com> wrote:
> > On 2020-08-10 06:02:49-0400, Jeff King wrote:
> > > There was a little discussion in response to v1 on whether we could
> > > reuse the existing C mailmap code:
> > > 
> > >   https://lore.kernel.org/git/20200731010129.GD240563@coredump.intra.peff.net/
> > > 
> > > Did you have any thoughts on that?
> > 
> > I think it's probably not worth the effort to make the necessary changes
> > to "rev-list --header" Junio mentioned, just for gitweb.
> > 
> > I agree it's a bit worrisome to have a second parser that could
> > potentially behave slightly differently than the main implementation.
> 
> +Cc Joe Perches
> 
> Fwiw, there's already a GPL-2.0 Perl .mailmap parser in
> scripts/get_maintainer.pl of the Linux kernel which Joe
> maintains:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/get_maintainer.pl

+cc Florian Mickler

Might be different behavior, dunno.

Florian Mickler wrote most of that and I
believe I rewrote it a bit, mostly for style.

If the perl code is useful to you, do what
you will with it, I give you my permission.

I don't believe get_maintainer needs to be
changed unless it's shown to be different
than what git does already.  I think it's
the same output.

> Been thinking about adding mailmap support to public-inbox in
> the send-email reply instructions, too.  (but public-inbox is
> AGPL-3+, so I can't steal the code w/o permission)
> 
> > What if we added tests for gitweb's mailmap parsing based on the same
> > cases used for Git itself?
> 
> That's probably fine IMHO; especially if it's just for gitweb display
> (and not writing anything that's meant to be stored forever).
> 
> There's already dozens of different parsers for email addresses,
> MIME, mailbox formats, etc. all with slightly different edge cases;
> things still mostly work well enough to not be a huge problem.
> (Same goes for Markdown, HTML, formats and even JSON :x)

Emma Brooks Sept. 5, 2020, 2:55 a.m. UTC | #10
On 2020-08-11 00:55:09-0400, Jeff King wrote:
> On Tue, Aug 11, 2020 at 04:17:28AM +0000, Emma Brooks wrote:
> 
> > On 2020-08-10 06:02:49-0400, Jeff King wrote:
> > > There was a little discussion in response to v1 on whether we could
> > > reuse the existing C mailmap code:
> > > 
> > >   https://lore.kernel.org/git/20200731010129.GD240563@coredump.intra.peff.net/
> > > 
> > > Did you have any thoughts on that?
> > 
> > I think it's probably not worth the effort to make the necessary changes
> > to "rev-list --header" Junio mentioned, just for gitweb.
> 
> Yeah, I agree that probably doesn't make sense to change "rev-list
> --header". I wonder if git could be using "rev-list --format" instead,
> though, and asking for the specific things it wants. That could improve
> more than just this case, too (e.g., the C code would be parsing and
> normalizing author/committer idents, which could make handling of badly
> formatted ones more consistent with other Git tools).
> 
> It may be a big change, though. I don't know the gitweb code very well.

This idea works in my testing, and it should be a small change.

However, I couldn't find a way to get "rev-list --format" to separate
commits with NULs. Is there a way to do this that I'm missing? I was
able to try the concept in gitweb by switching the "rev-list --format"
call I would've used to a similar log call, and it seems like a fairly
small change.

Junio C Hamano Sept. 5, 2020, 3:26 a.m. UTC | #11
Emma Brooks <me@pluvano.com> writes:

> However, I couldn't find a way to get "rev-list --format" to separate
> commits with NULs.

A workaround would be "git rev-list --format='%s%x00'", iow,
manually insert NUL

I would have expected "-z" to replace LF with NUL, but that does not
appear to work X-<.

Emma Brooks Sept. 7, 2020, 10:10 p.m. UTC | #12
On 2020-09-04 20:26:11-0700, Junio C Hamano wrote:
> Emma Brooks <me@pluvano.com> writes:
> 
> > However, I couldn't find a way to get "rev-list --format" to separate
> > commits with NULs.
> 
> A workaround would be "git rev-list --format='%s%x00'", iow,
> manually insert NUL
> 
> I would have expected "-z" to replace LF with NUL, but that does not
> appear to work X-<.

Thanks. I'll need to ignore the extra LF when parsing then. Later, "-z"
support could be added/fixed in rev-list (#leftoverbits?) and gitweb
could be updated to use that instead.
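Junio's workaround implies an output layout, assumed here. Each commit is a "commit <id>" header line, then the formatted text, then the NUL from %x00, then the newline that rev-list appends. Splitting on NUL followed by LF therefore also drops the extra LF Emma mentions.

This is a sketch under that assumed layout. The helper split_commits is hypothetical, and the test parses a made-up sample string.

```perl
use strict;
use warnings;

# Split `git rev-list --format='...%x00'` output into per-commit records.
# Each record ends with the NUL from %x00 followed by the newline rev-list
# adds, so splitting on "\0\n" also discards that extra newline.
sub split_commits {
	my ($output) = @_;
	my @commits;
	for my $record (split /\0\n/, $output) {
		# First line is "commit <id>"; the rest is the formatted text.
		my ($header, $body) = split /\n/, $record, 2;
		my ($id) = $header =~ /^commit ([0-9a-f]+)$/ or next;
		push @commits, { id => $id, text => defined $body ? $body : '' };
	}
	return @commits;
}
```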

Patch

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index 7963a79ba9..2d7551a6a5 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -751,6 +751,11 @@  default font sizes or lineheights are changed (e.g. via adding extra
 CSS stylesheet in `@stylesheets`), it may be appropriate to change
 these values.
 
+mailmap::
+	Use mailmap to find the canonical name/email for
+	committers/authors (see linkgit:git-shortlog[1]). Enabled by
+	default.
+
 highlight::
 	Server-side syntax highlight support in "blob" view.  It requires
 	`$highlight_bin` program to be available (see the description of
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 0959a782ec..1ca495b8b4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -505,6 +505,12 @@  sub evaluate_uri {
 		'override' => 0,
 		'default' => ['']},
 
+	# Enable reading mailmap to determine canonical author
+	# information. Enabled by default.
+	'mailmap' => {
+		'override' => 0,
+		'default' => [1]},
+
 	# Enable displaying how much time and how many git commands
 	# it took to generate and display page.  Disabled by default.
 	# Project specific override is not supported.
@@ -3490,6 +3496,63 @@  sub parse_tag {
 	return %tag
 }
 
+# Contents of mailmap stored as a reference to a hash with keys in the format
+# of "name <email>" or "<email>", and values that are hashes containing a
+# replacement "name" and/or "email". If set (even if empty) the mailmap has
+# already been read.
+my $mailmap;
+
+sub read_mailmap {
+	my %mailmap = ();
+	open my $fd, '-|', quote_command(
+		git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap') . ' 2> /dev/null'
+		or die_error(500, 'Failed to read mailmap');
+	return \%mailmap if eof $fd;
+	foreach (split '\n', <$fd>) {
+		next if (/^#/);
+		if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x ||
+		    /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) {
+			# New Name <new@email> <old@email>
+			# New Name <new@email> Old Name <old@email>
+			$mailmap{$3} = ();
+			($mailmap{$3}{name} = $1) =~ s/^\s+|\s+$//g;
+			$mailmap{$3}{email} = $2;
+		} elsif (/(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>) (?:\s+\#)/x ||
+		         /(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>)/x) {
+			# New Name <old@email>
+			# <new@email> <old@email>
+			$mailmap{$3} = ();
+			if ($1) {
+				$mailmap{$3}{email} = $1;
+			} else {
+				($mailmap{$3}{name} = $2) =~ s/^\s+|\s+$//g;
+			}
+		}
+	}
+	return \%mailmap;
+}
+
+# Map author name and email based on mailmap. A more specific match
+# ("name <email>") is preferred to a less specific one ("<email>").
+sub map_author {
+	my $name = shift;
+	my $email = shift;
+
+	if (!$mailmap) {
+		$mailmap = read_mailmap;
+	}
+
+	if (my $m = $mailmap->{"$name <$email>"}) {
+		$name = $m->{name} || $name;
+		$email = $m->{email} || $email;
+	} elsif ($m = $mailmap->{"<$email>"}) {
+		$name = $m->{name} || $name;
+		$email = $m->{email} || $email;
+	}
+
+	return ($name, $email);
+}
+
 sub parse_commit_text {
 	my ($commit_text, $withparents) = @_;
 	my @commit_lines = split '\n', $commit_text;
@@ -3517,8 +3580,13 @@  sub parse_commit_text {
 			$co{'author_epoch'} = $2;
 			$co{'author_tz'} = $3;
 			if ($co{'author'} =~ m/^([^<]+) <([^>]*)>/) {
-				$co{'author_name'}  = $1;
-				$co{'author_email'} = $2;
+				my ($name, $email) = ($1, $2);
+				if (gitweb_check_feature('mailmap')) {
+					($name, $email) = map_author($1, $2);
+					$co{'author'} = "$name <$email>";
+				}
+				$co{'author_name'}  = $name;
+				$co{'author_email'} = $email;
 			} else {
 				$co{'author_name'} = $co{'author'};
 			}
@@ -3527,8 +3595,13 @@  sub parse_commit_text {
 			$co{'committer_epoch'} = $2;
 			$co{'committer_tz'} = $3;
 			if ($co{'committer'} =~ m/^([^<]+) <([^>]*)>/) {
-				$co{'committer_name'}  = $1;
-				$co{'committer_email'} = $2;
+				my ($name, $email) = ($1, $2);
+				if (gitweb_check_feature('mailmap')) {
+					($name, $email) = map_author($1, $2);
+					$co{'committer'} = "$name <$email>";
+				}
+				$co{'committer_name'}  = $name;
+				$co{'committer_email'} = $email;
 			} else {
 				$co{'committer_name'} = $co{'committer'};
 			}