Message ID | 20200730041217.6893-1-me@pluvano.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] gitweb: Map names/emails with mailmap | expand |
[jc: Cc'ing Jakub, hoping he's still our resident gitweb expert, as an "RFC" requests help from experts] Emma Brooks <me@pluvano.com> writes: > Add an option to map names and emails to their canonical forms via a > .mailmap file. This is enabled by default, consistent with the behavior > of Git itself. > > Signed-off-by: Emma Brooks <me@pluvano.com> > --- > > This works, but needs some polish. The read_mailmap code is not > particularly clever. > > Documentation/gitweb.conf.txt | 5 +++ > gitweb/gitweb.perl | 79 +++++++++++++++++++++++++++++++++-- > 2 files changed, 80 insertions(+), 4 deletions(-) > > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt > index 7963a79ba9..2d7551a6a5 100644 > --- a/Documentation/gitweb.conf.txt > +++ b/Documentation/gitweb.conf.txt > @@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra > CSS stylesheet in `@stylesheets`), it may be appropriate to change > these values. > > +mailmap:: > + Use mailmap to find the canonical name/email for > + committers/authors (see linkgit:git-shortlog[1]). Enabled by > + default. > + > highlight:: > Server-side syntax highlight support in "blob" view. It requires > `$highlight_bin` program to be available (see the description of > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 0959a782ec..00256704a7 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -505,6 +505,12 @@ sub evaluate_uri { > 'override' => 0, > 'default' => ['']}, > > + # Enable reading mailmap to determine canonical author > + # information. Enabled by default. > + 'mailmap' => { > + 'override' => 0, > + 'default' => [1]}, > + > # Enable displaying how much time and how many git commands > # it took to generate and display page. Disabled by default. > # Project specific override is not supported. > @@ -3490,6 +3496,61 @@ sub parse_tag { > return %tag > } > > +# Contents of mailmap stored as a referance to a hash with keys in the format > +# of "name <email>" or "<email>", and values that are hashes containing a > +# replacement "name" and/or "email". If set (even if empty) the mailmap has > +# already been read. > +my $mailmap; > + > +sub read_mailmap { > + my %mailmap = (); > + open my $fd, '-|', git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap' > + or die_error(500, 'Failed to read mailmap'); > + foreach (split '\n', <$fd>) { > + next if (/^#/); > + if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x || > + /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) { > + # New Name <new@email> <old@email> > + # New Name <new@email> Old Name <old@email> > + $mailmap{$3} = (); > + $mailmap{$3}{name} = $1; > + $mailmap{$3}{email} = $2; > + } elsif (/(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>) (?:\s+\#)/x || > + /(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>)/x) { > + # New Name <old@email> > + # <new@email> <old@email> > + $mailmap{$3} = (); > + if ($1) { > + $mailmap{$3}{email} = $1; > + } else { > + $mailmap{$3}{name} = $2; > + } > + } > + } > + return \%mailmap; > +} > + > +# Map author name and email based on mailmap. A more specific match > +# ("name <email>") is preferred to a less specific one ("<email>"). > +sub map_author { > + my $name = shift; > + my $email = shift; > + > + if (!$mailmap) { > + $mailmap = read_mailmap; > + } > + > + if ($mailmap->{"$name <$email>"}) { > + $name = $mailmap->{"$name <$email>"}{name} || $name; > + $email = $mailmap->{"$name <$email>"}{email} || $email; > + } elsif ($mailmap->{"<$email>"}) { > + $name = $mailmap->{"<$email>"}{name} || $name; > + $email = $mailmap->{"<$email>"}{email} || $email; > + } > + > + return ($name, $email); > +} > + > sub parse_commit_text { > my ($commit_text, $withparents) = @_; > my @commit_lines = split '\n', $commit_text; > @@ -3517,8 +3578,13 @@ sub parse_commit_text { > $co{'author_epoch'} = $2; > $co{'author_tz'} = $3; > if ($co{'author'} =~ m/^([^<]+) <([^>]*)>/) { > - $co{'author_name'} = $1; > - $co{'author_email'} = $2; > + my ($name, $email) = @_; > + if (gitweb_check_feature('mailmap')) { > + ($name, $email) = map_author($1, $2); > + $co{'author'} = "$name <$email>"; > + } > + $co{'author_name'} = $name; > + $co{'author_email'} = $email; > } else { > $co{'author_name'} = $co{'author'}; > } > @@ -3527,8 +3593,13 @@ sub parse_commit_text { > $co{'committer_epoch'} = $2; > $co{'committer_tz'} = $3; > if ($co{'committer'} =~ m/^([^<]+) <([^>]*)>/) { > - $co{'committer_name'} = $1; > - $co{'committer_email'} = $2; > + my ($name, $email) = @_; > + if (gitweb_check_feature('mailmap')) { > + ($name, $email) = map_author($1, $2); > + $co{'committer'} = "$name <$email>"; > + } > + $co{'committer_name'} = $name; > + $co{'committer_email'} = $email; > } else { > $co{'committer_name'} = $co{'committer'}; > }
On Thu, Jul 30, 2020 at 04:12:17AM +0000, Emma Brooks wrote: > Add an option to map names and emails to their canonical forms via a > .mailmap file. This is enabled by default, consistent with the behavior > of Git itself. I'm quite far from an expert in gitweb, but this seems like a good feature to have. Having a separate implementation to read and apply mailmaps makes me worried that it will behave slightly differently than the C code, especially around corner cases. Is it possible for us to ask git programs that are called by gitweb to do the conversion for us (e.g., by passing "--use-mailmap" or using "%aE" and "%aN" formatters)? I won't be surprised if the answer is "no, we access commits using lower-level plumbing". But it's worth looking into, I think, if you didn't already. -Peff
Jeff King <peff@peff.net> writes: > On Thu, Jul 30, 2020 at 04:12:17AM +0000, Emma Brooks wrote: > >> Add an option to map names and emails to their canonical forms via a >> .mailmap file. This is enabled by default, consistent with the behavior >> of Git itself. > > I'm quite far from an expert in gitweb, but this seems like a good > feature to have. > > Having a separate implementation to read and apply mailmaps makes me > worried that it will behave slightly differently than the C code, > especially around corner cases. Is it possible for us to ask git > programs that are called by gitweb to do the conversion for us (e.g., > by passing "--use-mailmap" or using "%aE" and "%aN" formatters)? > I won't be surprised if the answer is "no, we access commits using > lower-level plumbing". But it's worth looking into, I think, if you > didn't already. I briefly looked at tweaking "rev-list --header" but because it ends up calling pretty.c::pp_header() for obvious reasons since we are doing as little processing as possible in CMIT_FMT_RAW format, we do not get to pretty.c::pp_user_info() which is where the mailmap conversion happens for the normal "log" output. It is tempting to split pp_user_info() into two parts (i.e. the first few lines up to where map_user() is optionally called, and the remainder), so that the CMIT_FMT_RAW users can optionally ask for mailmap to kick in, but I doubt that it is worth it, if the only potential benefitter is gitweb (which I consider is purely maintenance mode---I am surprised the world hasn't yet switched to gitiles, cgit and others).
diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt index 7963a79ba9..2d7551a6a5 100644 --- a/Documentation/gitweb.conf.txt +++ b/Documentation/gitweb.conf.txt @@ -751,6 +751,11 @@ default font sizes or lineheights are changed (e.g. via adding extra CSS stylesheet in `@stylesheets`), it may be appropriate to change these values. +mailmap:: + Use mailmap to find the canonical name/email for + committers/authors (see linkgit:git-shortlog[1]). Enabled by + default. + highlight:: Server-side syntax highlight support in "blob" view. It requires `$highlight_bin` program to be available (see the description of diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 0959a782ec..00256704a7 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -505,6 +505,12 @@ sub evaluate_uri { 'override' => 0, 'default' => ['']}, + # Enable reading mailmap to determine canonical author + # information. Enabled by default. + 'mailmap' => { + 'override' => 0, + 'default' => [1]}, + # Enable displaying how much time and how many git commands # it took to generate and display page. Disabled by default. # Project specific override is not supported. @@ -3490,6 +3496,61 @@ sub parse_tag { return %tag } +# Contents of mailmap stored as a referance to a hash with keys in the format +# of "name <email>" or "<email>", and values that are hashes containing a +# replacement "name" and/or "email". If set (even if empty) the mailmap has +# already been read. +my $mailmap; + +sub read_mailmap { + my %mailmap = (); + open my $fd, '-|', git_cmd(), 'cat-file', 'blob', 'HEAD:.mailmap' + or die_error(500, 'Failed to read mailmap'); + foreach (split '\n', <$fd>) { + next if (/^#/); + if (/(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>) (?:\s+\#)/x || + /(.*)\s+ <([^<>]+)>\s+ ((?:.*\s+)? <[^<>]+>)/x) { + # New Name <new@email> <old@email> + # New Name <new@email> Old Name <old@email> + $mailmap{$3} = (); + $mailmap{$3}{name} = $1; + $mailmap{$3}{email} = $2; + } elsif (/(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>) (?:\s+\#)/x || + /(?: <([^<>]+)>\s+ | (.+)\s+ ) (<[^<>]+>)/x) { + # New Name <old@email> + # <new@email> <old@email> + $mailmap{$3} = (); + if ($1) { + $mailmap{$3}{email} = $1; + } else { + $mailmap{$3}{name} = $2; + } + } + } + return \%mailmap; +} + +# Map author name and email based on mailmap. A more specific match +# ("name <email>") is preferred to a less specific one ("<email>"). +sub map_author { + my $name = shift; + my $email = shift; + + if (!$mailmap) { + $mailmap = read_mailmap; + } + + if ($mailmap->{"$name <$email>"}) { + $name = $mailmap->{"$name <$email>"}{name} || $name; + $email = $mailmap->{"$name <$email>"}{email} || $email; + } elsif ($mailmap->{"<$email>"}) { + $name = $mailmap->{"<$email>"}{name} || $name; + $email = $mailmap->{"<$email>"}{email} || $email; + } + + return ($name, $email); +} + sub parse_commit_text { my ($commit_text, $withparents) = @_; my @commit_lines = split '\n', $commit_text; @@ -3517,8 +3578,13 @@ sub parse_commit_text { $co{'author_epoch'} = $2; $co{'author_tz'} = $3; if ($co{'author'} =~ m/^([^<]+) <([^>]*)>/) { - $co{'author_name'} = $1; - $co{'author_email'} = $2; + my ($name, $email) = @_; + if (gitweb_check_feature('mailmap')) { + ($name, $email) = map_author($1, $2); + $co{'author'} = "$name <$email>"; + } + $co{'author_name'} = $name; + $co{'author_email'} = $email; } else { $co{'author_name'} = $co{'author'}; } @@ -3527,8 +3593,13 @@ sub parse_commit_text { $co{'committer_epoch'} = $2; $co{'committer_tz'} = $3; if ($co{'committer'} =~ m/^([^<]+) <([^>]*)>/) { - $co{'committer_name'} = $1; - $co{'committer_email'} = $2; + my ($name, $email) = @_; + if (gitweb_check_feature('mailmap')) { + ($name, $email) = map_author($1, $2); + $co{'committer'} = "$name <$email>"; + } + $co{'committer_name'} = $name; + $co{'committer_email'} = $email; } else { $co{'committer_name'} = $co{'committer'}; }
Add an option to map names and emails to their canonical forms via a .mailmap file. This is enabled by default, consistent with the behavior of Git itself. Signed-off-by: Emma Brooks <me@pluvano.com> --- This works, but needs some polish. The read_mailmap code is not particularly clever. Documentation/gitweb.conf.txt | 5 +++ gitweb/gitweb.perl | 79 +++++++++++++++++++++++++++++++++-- 2 files changed, 80 insertions(+), 4 deletions(-)