From patchwork Sun Jan 3 21:18:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "brian m. carlson" X-Patchwork-Id: 11995973 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8DA0BC43381 for ; Sun, 3 Jan 2021 21:19:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6442720784 for ; Sun, 3 Jan 2021 21:19:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727529AbhACVTr (ORCPT ); Sun, 3 Jan 2021 16:19:47 -0500 Received: from injection.crustytoothpaste.net ([192.241.140.119]:45746 "EHLO injection.crustytoothpaste.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727452AbhACVTr (ORCPT ); Sun, 3 Jan 2021 16:19:47 -0500 Received: from camp.crustytoothpaste.net (unknown [IPv6:2001:470:b978:101:b610:a2f0:36c1:12e3]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by injection.crustytoothpaste.net (Postfix) with ESMTPSA id D53CC60811; Sun, 3 Jan 2021 21:19:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=crustytoothpaste.net; s=default; t=1609708746; bh=rM8C4HlERsqv2L9g1AnnG9PExC+oyzdpcH4GaE2iv6I=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From:Reply-To: Subject:Date:To:CC:Resent-Date:Resent-From:Resent-To:Resent-Cc: In-Reply-To:References:Content-Type:Content-Disposition; b=z1qjCJBtfYsc4wariN6x7qksrkTP2zPsPS8ZdD74SvFM4uGF/KjSKoi0oVQn1jV21 
B6fQBxF2Izw80kJgbBlCCSlmvG4KUYBZLtYiSRwmd8zAxExt8iloD5TMqd/0DkmF57 WVWSqzq6zXBimBJ+gC3jSRU7uM2Ns2eyyf3RMRaAyq8acwvkeDcrBN9V5UWx8LCEYc /HhvO57KcE7hw9tJCwj9gtc4IQBenYhclgH/gaaEWk1Occgwanj+7X3Cp6NLvZmNBE Mr65kye8Lyf2k2YCseSKsSpRlzXvAo/NgJ45mQTvc3Drw+RFJG3CV/mdVldwP6oi6T S4aNzNEiwlGmiTQUo7FNE6j3bFwrxZM8N96xRhwNc92Yw5bd47Q+l5ySjkd96Autg1 Roi+t6z1Ru7b2+ROnuyzQE8BAxo81fCcjxDdbTwOWgK709vBr8uxWhc8NOETSGc7p8 TN9SNQhvvRHpZK/9ERxQ3abnIzZ2UCF7UPwXTmT/trC1DND/Sye From: "brian m. carlson" To: Cc: Jeff King , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFz?= =?utf-8?b?b24=?= , Phillip Wood Subject: [PATCH v2 1/5] mailmap: add a function to inspect the number of entries Date: Sun, 3 Jan 2021 21:18:45 +0000 Message-Id: <20210103211849.2691287-2-sandals@crustytoothpaste.net> X-Mailer: git-send-email 2.30.0.284.gd98b1dd5eaa7 In-Reply-To: <20210103211849.2691287-1-sandals@crustytoothpaste.net> References: <20210103211849.2691287-1-sandals@crustytoothpaste.net> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org We're soon going to change the type of our mailmap into an opaque struct so we can add features and improve performance. When we do so, it won't be possible for users to inspect its internals to determine how many items are present, so let's introduce a function that lets users inquire how many objects are in the mailmap and use it where we want this information. Signed-off-by: brian m. 
carlson --- mailmap.c | 5 +++++ mailmap.h | 1 + pretty.c | 2 +- 3 files changed, 7 insertions(+), 1 deletion(-) diff --git a/mailmap.c b/mailmap.c index 962fd86d6d..c9a538f4e2 100644 --- a/mailmap.c +++ b/mailmap.c @@ -361,3 +361,8 @@ int map_user(struct string_list *map, debug_mm("map_user: --\n"); return 0; } + +int mailmap_entries(struct string_list *map) +{ + return map->nr; +} diff --git a/mailmap.h b/mailmap.h index d0e65646cb..ff57b05a15 100644 --- a/mailmap.h +++ b/mailmap.h @@ -5,6 +5,7 @@ struct string_list; int read_mailmap(struct string_list *map, char **repo_abbrev); void clear_mailmap(struct string_list *map); +int mailmap_entries(struct string_list *map); int map_user(struct string_list *map, const char **email, size_t *emaillen, const char **name, size_t *namelen); diff --git a/pretty.c b/pretty.c index 7a7708a0ea..43a0039870 100644 --- a/pretty.c +++ b/pretty.c @@ -681,7 +681,7 @@ static int mailmap_name(const char **email, size_t *email_len, mail_map = xcalloc(1, sizeof(*mail_map)); read_mailmap(mail_map, NULL); } - return mail_map->nr && map_user(mail_map, email, email_len, name, name_len); + return mailmap_entries(mail_map) && map_user(mail_map, email, email_len, name, name_len); } static size_t format_person_part(struct strbuf *sb, char part, From patchwork Sun Jan 3 21:18:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "brian m. 
carlson" X-Patchwork-Id: 11995977 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8E932C433E0 for ; Sun, 3 Jan 2021 21:20:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5779F20784 for ; Sun, 3 Jan 2021 21:20:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727535AbhACVTy (ORCPT ); Sun, 3 Jan 2021 16:19:54 -0500 Received: from injection.crustytoothpaste.net ([192.241.140.119]:45752 "EHLO injection.crustytoothpaste.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727453AbhACVTs (ORCPT ); Sun, 3 Jan 2021 16:19:48 -0500 Received: from camp.crustytoothpaste.net (unknown [IPv6:2001:470:b978:101:b610:a2f0:36c1:12e3]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by injection.crustytoothpaste.net (Postfix) with ESMTPSA id 6D9DD60812; Sun, 3 Jan 2021 21:19:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=crustytoothpaste.net; s=default; t=1609708746; bh=3P8kUieolsojzjyIY6yQJ6r3SrSHAwoHPzE832QP8G0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From:Reply-To: Subject:Date:To:CC:Resent-Date:Resent-From:Resent-To:Resent-Cc: In-Reply-To:References:Content-Type:Content-Disposition; b=yf7iIrWoWl3qof2kw30na6KD74FosXDN7+cLhSgr/K/4A4EOF7leXFcFTNwXusXTO MAWjy9NSqALEUbQgalIRfxVhHVlZ2mP/EwIeWmCU8tJC4hJKphXi41bWsGoz7Z5L15 fXAkItlRVcpKrtyy62J6kL5wlgwmuW3TmQcH6Pe9kE8SLbiA8cKAxnj033pZCd9orR 
7aD9elzHG1oqreUbnLsQFjzFWYgzLCVi5JKlCZ81HnwkSdG3NNnQN8h0iF+SuJRHKL Dk1ppdNtfahhff5H+zCMIhz4A0Irkk6iKHgNXPACrd9DznKWp3TcU+JBpI9gv+JP/m XwlwxMLXUh1WkA+JBbMXVb3HV4LGuBUPW5Jckxx/lE7KoiCWBpQAtMdBpdPy+hGlQ1 /bWRWhAbnUr825RTheML9c+HSHLOBRqzpzRMbI1fS1f7oTM4Tw1FN1YQG0Fq4von94 bd8J6YnAw9rIE8bOmHXnyOJUvePVxW46KVrTy0cecW3KUuZlMTe From: "brian m. carlson" To: Cc: Jeff King , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFz?= =?utf-8?b?b24=?= , Phillip Wood Subject: [PATCH v2 2/5] mailmap: switch to opaque struct Date: Sun, 3 Jan 2021 21:18:46 +0000 Message-Id: <20210103211849.2691287-3-sandals@crustytoothpaste.net> X-Mailer: git-send-email 2.30.0.284.gd98b1dd5eaa7 In-Reply-To: <20210103211849.2691287-1-sandals@crustytoothpaste.net> References: <20210103211849.2691287-1-sandals@crustytoothpaste.net> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Currently a mailmap is a simple sorted string list. However, we'll soon want to add additional data to our mailmap, and in order to do so effectively, we'll need a struct of our own. To do that, let's create a struct mailmap and use that everywhere we're creating a mailmap. Note that we no longer explicitly initialize our mailmaps, since read_mailmap will do that for us. We also don't need to explicitly set the flag to strdup strings, since we've passed that argument when we initialize the string list. Signed-off-by: brian m. 
carlson --- builtin/blame.c | 2 +- builtin/check-mailmap.c | 4 ++-- builtin/commit.c | 2 +- mailmap.c | 20 +++++++++++++------- mailmap.h | 14 +++++++++----- pretty.c | 2 +- pretty.h | 2 +- revision.c | 2 +- revision.h | 3 ++- shortlog.h | 3 ++- 10 files changed, 33 insertions(+), 21 deletions(-) diff --git a/builtin/blame.c b/builtin/blame.c index 6f7e32411a..d48dbbd005 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -60,7 +60,7 @@ static int mark_ignored_lines; static struct date_mode blame_date_mode = { DATE_ISO8601 }; static size_t blame_date_width; -static struct string_list mailmap = STRING_LIST_INIT_NODUP; +static struct mailmap mailmap; #ifndef DEBUG_BLAME #define DEBUG_BLAME 0 diff --git a/builtin/check-mailmap.c b/builtin/check-mailmap.c index cdce144f3b..ad155e9092 100644 --- a/builtin/check-mailmap.c +++ b/builtin/check-mailmap.c @@ -15,7 +15,7 @@ static const struct option check_mailmap_options[] = { OPT_END() }; -static void check_mailmap(struct string_list *mailmap, const char *contact) +static void check_mailmap(struct mailmap *mailmap, const char *contact) { const char *name, *mail; size_t namelen, maillen; @@ -39,7 +39,7 @@ static void check_mailmap(struct string_list *mailmap, const char *contact) int cmd_check_mailmap(int argc, const char **argv, const char *prefix) { int i; - struct string_list mailmap = STRING_LIST_INIT_NODUP; + struct mailmap mailmap; git_config(git_default_config, NULL); argc = parse_options(argc, argv, prefix, check_mailmap_options, diff --git a/builtin/commit.c b/builtin/commit.c index 505fe60956..2d69847c49 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1027,7 +1027,7 @@ static const char *find_author_by_nickname(const char *name) struct rev_info revs; struct commit *commit; struct strbuf buf = STRBUF_INIT; - struct string_list mailmap = STRING_LIST_INIT_NODUP; + struct mailmap mailmap; const char *av[20]; int ac = 0; diff --git a/mailmap.c b/mailmap.c index c9a538f4e2..d3287b409a 100644 --- a/mailmap.c 
+++ b/mailmap.c @@ -237,11 +237,14 @@ static int read_mailmap_blob(struct string_list *map, return 0; } -int read_mailmap(struct string_list *map, char **repo_abbrev) +int read_mailmap(struct mailmap *mailmap, char **repo_abbrev) { int err = 0; + struct string_list *map; + + map = mailmap->mailmap = malloc(sizeof(*mailmap->mailmap)); + string_list_init(map, 1); - map->strdup_strings = 1; map->cmp = namemap_cmp; if (!git_mailmap_blob && is_bare_repository()) @@ -254,11 +257,14 @@ int read_mailmap(struct string_list *map, char **repo_abbrev) return err; } -void clear_mailmap(struct string_list *map) +void clear_mailmap(struct mailmap *mailmap) { + struct string_list *map = mailmap->mailmap; debug_mm("mailmap: clearing %d entries...\n", map->nr); map->strdup_strings = 1; string_list_clear_func(map, free_mailmap_entry); + string_list_clear(map, 1); + free(map); debug_mm("mailmap: cleared\n"); } @@ -313,7 +319,7 @@ static struct string_list_item *lookup_prefix(struct string_list *map, return NULL; } -int map_user(struct string_list *map, +int map_user(struct mailmap *map, const char **email, size_t *emaillen, const char **name, size_t *namelen) { @@ -324,7 +330,7 @@ int map_user(struct string_list *map, (int)*namelen, debug_str(*name), (int)*emaillen, debug_str(*email)); - item = lookup_prefix(map, *email, *emaillen); + item = lookup_prefix(map->mailmap, *email, *emaillen); if (item != NULL) { me = (struct mailmap_entry *)item->util; if (me->namemap.nr) { @@ -362,7 +368,7 @@ int map_user(struct string_list *map, return 0; } -int mailmap_entries(struct string_list *map) +int mailmap_entries(struct mailmap *map) { - return map->nr; + return map->mailmap->nr; } diff --git a/mailmap.h b/mailmap.h index ff57b05a15..4cdce3b064 100644 --- a/mailmap.h +++ b/mailmap.h @@ -1,13 +1,17 @@ #ifndef MAILMAP_H #define MAILMAP_H -struct string_list; +#include "string-list.h" -int read_mailmap(struct string_list *map, char **repo_abbrev); -void clear_mailmap(struct string_list *map); 
-int mailmap_entries(struct string_list *map); +struct mailmap { + struct string_list *mailmap; +}; -int map_user(struct string_list *map, +int read_mailmap(struct mailmap *map, char **repo_abbrev); +void clear_mailmap(struct mailmap *map); +int mailmap_entries(struct mailmap *map); + +int map_user(struct mailmap *map, const char **email, size_t *emaillen, const char **name, size_t *namelen); #endif diff --git a/pretty.c b/pretty.c index 43a0039870..0dc2c98e4a 100644 --- a/pretty.c +++ b/pretty.c @@ -676,7 +676,7 @@ const char *repo_logmsg_reencode(struct repository *r, static int mailmap_name(const char **email, size_t *email_len, const char **name, size_t *name_len) { - static struct string_list *mail_map; + static struct mailmap *mail_map; if (!mail_map) { mail_map = xcalloc(1, sizeof(*mail_map)); read_mailmap(mail_map, NULL); diff --git a/pretty.h b/pretty.h index 7ce6c0b437..15735a4c51 100644 --- a/pretty.h +++ b/pretty.h @@ -40,7 +40,7 @@ struct pretty_print_context { struct reflog_walk_info *reflog_info; struct rev_info *rev; const char *output_encoding; - struct string_list *mailmap; + struct mailmap *mailmap; int color; struct ident_split *from_ident; unsigned encode_email_headers:1; diff --git a/revision.c b/revision.c index 9dff845bed..848f43d88b 100644 --- a/revision.c +++ b/revision.c @@ -3659,7 +3659,7 @@ int rewrite_parents(struct rev_info *revs, struct commit *commit, return 0; } -static int commit_rewrite_person(struct strbuf *buf, const char *what, struct string_list *mailmap) +static int commit_rewrite_person(struct strbuf *buf, const char *what, struct mailmap *mailmap) { char *person, *endp; size_t len, namelen, maillen; diff --git a/revision.h b/revision.h index 086ff10280..06bc127e90 100644 --- a/revision.h +++ b/revision.h @@ -7,6 +7,7 @@ #include "notes.h" #include "pretty.h" #include "diff.h" +#include "mailmap.h" #include "commit-slab-decl.h" /** @@ -241,7 +242,7 @@ struct rev_info { int patch_name_max; int no_inline; int show_log_size; - 
struct string_list *mailmap; + struct mailmap *mailmap; /* Filter by commit log message */ struct grep_opt grep_filter; diff --git a/shortlog.h b/shortlog.h index 64be879b24..9ebf7bbb9c 100644 --- a/shortlog.h +++ b/shortlog.h @@ -2,6 +2,7 @@ #define SHORTLOG_H #include "string-list.h" +#include "mailmap.h" struct commit; @@ -25,7 +26,7 @@ struct shortlog { char *common_repo_prefix; int email; - struct string_list mailmap; + struct mailmap mailmap; FILE *file; }; From patchwork Sun Jan 3 21:18:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "brian m. carlson" X-Patchwork-Id: 11995979 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D5609C433E0 for ; Sun, 3 Jan 2021 21:20:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B13B3207FB for ; Sun, 3 Jan 2021 21:20:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727556AbhACVUU (ORCPT ); Sun, 3 Jan 2021 16:20:20 -0500 Received: from injection.crustytoothpaste.net ([192.241.140.119]:45776 "EHLO injection.crustytoothpaste.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727453AbhACVUS (ORCPT ); Sun, 3 Jan 2021 16:20:18 -0500 Received: from camp.crustytoothpaste.net (unknown [IPv6:2001:470:b978:101:b610:a2f0:36c1:12e3]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by injection.crustytoothpaste.net (Postfix) 
with ESMTPSA id 031BD60813; Sun, 3 Jan 2021 21:19:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=crustytoothpaste.net; s=default; t=1609708747; bh=bvaKPPUf9jCHS07WB0sIrmL/eG253OE3KgZpLiWKS3c=; h=From:To:Cc:Subject:Date:In-Reply-To:References:Content-Type:From: Reply-To:Subject:Date:To:CC:Resent-Date:Resent-From:Resent-To: Resent-Cc:In-Reply-To:References:Content-Type:Content-Disposition; b=05bVC5m8O1GQoGICwTB9qyiCFhoqbnFzwtcznflgTLu4zc8mmZKxjHiWBASvkODmM 70pSsc2//oPZVQQRphx9d0OEM7wLrJpKMqWl66imHiPIHKI/7YHC+Rdc+4jll+EjSQ 5FVVu99Eg7Z5690VgL4g+nNA0vkyv6/p0x1A3bNmmEK3YdK0yc9cVYo2gYkt1G0LeZ JEHGf2Yk/oJIgIDaKHJ9DxQg1vOAeicy7ZTbSgZNx+6Wr3ctD3uX/EHMDmHIDqevb4 dkoTPWPWJX3pDJKKkWiJmTrddasZe/zNd+YzIXh4+icopxUWFWCTDiPO4O7HWV7xv4 9D5dIY/ZZs8GtIDBeHlEAtEkz0ccIvoMFAk1xFZ6sx460L2b6LmhA1GLWWdBk2518z L06TOYPmdu97tfemE0dvCLtd1gx1H9RO5rSSMsZrxZh6EoYnlnIWjGnxsEoZpnKnJ3 mLq3+FmpFgOvAYnHtbDvMbBkS0rieXvk4/wyCi0t7zahy9vQqq8 From: "brian m. carlson" To: Cc: Jeff King , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFz?= =?utf-8?b?b24=?= , Phillip Wood Subject: [PATCH v2 3/5] t4203: add failing test for case-sensitive local-parts and names Date: Sun, 3 Jan 2021 21:18:47 +0000 Message-Id: <20210103211849.2691287-4-sandals@crustytoothpaste.net> X-Mailer: git-send-email 2.30.0.284.gd98b1dd5eaa7 In-Reply-To: <20210103211849.2691287-1-sandals@crustytoothpaste.net> References: <20210103211849.2691287-1-sandals@crustytoothpaste.net> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Currently, Git always looks up entries in the mailmap in a case-insensitive way, both for names and addresses, which is, as explained below, suboptimal. First, for email addresses, RFC 5321 is clear that only domains are case insensitive; local-parts (the portion before the at sign) are not. It states this: The local-part of a mailbox MUST BE treated as case sensitive. Therefore, SMTP implementations MUST take care to preserve the case of mailbox local-parts. 
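
As a minimal sketch of what this rule implies when comparing two addresses: the local-part is compared byte for byte and only the domain is compared without regard to case. The helper name mailbox_equal is hypothetical and is not part of Git. The sketch assumes the domain starts after the last at sign and that only ASCII letters are folded (the default "C" locale).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Git): returns 1 if two mailbox
 * addresses match, treating the local-part as case sensitive and the
 * domain as case insensitive, and 0 otherwise.
 */
int mailbox_equal(const char *a, const char *b)
{
	/* Assumption: the domain starts after the last at sign. */
	const char *at_a = strrchr(a, '@');
	const char *at_b = strrchr(b, '@');
	size_t local_len;

	/* Without an at sign there is no domain; compare exact bytes. */
	if (!at_a || !at_b)
		return !strcmp(a, b);

	/* Local-parts must have the same length and the same bytes. */
	local_len = (size_t)(at_a - a);
	if (local_len != (size_t)(at_b - b) || memcmp(a, b, local_len))
		return 0;

	/* Domains match if they are equal after lowercasing each byte. */
	for (a = at_a + 1, b = at_b + 1; *a && *b; a++, b++)
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;

	/* Both domains must end at the same point. */
	return *a == *b;
}
```

Under these assumptions, mailbox_equal("bugs@example.com", "bugs@EXAMPLE.COM") is true, while mailbox_equal("bugs@example.com", "BUGS@example.com") is false.
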
There exist systems today where local-parts remain case sensitive (and this author has one), and as such, it's incorrect for us to case fold them in any way. Let's add a failing test that indicates this is a problem, while still keeping the test for case-insensitive domains.

Note that it's also incorrect for us to case-fold names because we don't guarantee that we're using the locale of the author, and it's impossible to case-fold names in a locale-insensitive way. Turkish and Azeri contain both a dotted and dotless I, and the uppercase ASCII I folds not to the lowercase ASCII I, but to a dotless version, and vice versa with the lowercase I. There are many words in Turkish which differ only in the dottedness of the I, so it is likely that there are also personal names which differ in the same way.

That would be a problem even if our implementation were perfect, which it is not. We currently fold only ASCII characters, so this feature has never worked correctly for the vast majority of the users on the planet, regardless of the locale. For example, on Linux, even in a Spanish locale, we don't handle "Simón" properly. Even if we did handle that, we'd probably also want to implement Unicode normalization, which we don't.

In general, case-folding text is extremely language- and locale-specific and requires intimacy with the spelling and grammar of the language in question and careful attention to the Unicode details in order to produce a result that is meaningful to humans and conforms with linguistic and societal norms. Because we do not have any of the required context with a plain personal name, we cannot hope to possibly case-fold personal names correctly. We should stop trying to do so and just treat them as a series of bytes, so let's add a test that we don't case-fold personal names as well.

Signed-off-by: brian m. 
carlson --- t/t4203-mailmap.sh | 29 +++++++++++++++++++++++++++-- 1 file changed, 27 insertions(+), 2 deletions(-) diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh index 586c3a86b1..32e849504c 100755 --- a/t/t4203-mailmap.sh +++ b/t/t4203-mailmap.sh @@ -170,10 +170,35 @@ Repo Guy (1): EOF -test_expect_success 'name entry after email entry, case-insensitive' ' +test_expect_success 'name entry after email entry, case-insensitive domain' ' mkdir -p internal_mailmap && echo " " >internal_mailmap/.mailmap && - echo "Internal Guy " >>internal_mailmap/.mailmap && + echo "Internal Guy " >>internal_mailmap/.mailmap && + git shortlog HEAD >actual && + test_cmp expect actual +' + +cat >expect <<\EOF +Repo Guy (1): + initial + +nick1 (1): + second + +EOF + +test_expect_failure 'name entry after email entry, case-sensitive local-part' ' + mkdir -p internal_mailmap && + echo " " >internal_mailmap/.mailmap && + echo "Internal Guy " >>internal_mailmap/.mailmap && + git shortlog HEAD >actual && + test_cmp expect actual +' + +test_expect_failure 'name entry after email entry, case-sensitive personal name' ' + mkdir -p internal_mailmap && + echo " " >internal_mailmap/.mailmap && + echo "Nick1 NICK1 " >internal_mailmap/.mailmap && git shortlog HEAD >actual && test_cmp expect actual ' From patchwork Sun Jan 3 21:18:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "brian m. 
carlson" X-Patchwork-Id: 11995981 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD2BFC433DB for ; Sun, 3 Jan 2021 21:20:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9ACF2207C9 for ; Sun, 3 Jan 2021 21:20:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727562AbhACVUU (ORCPT ); Sun, 3 Jan 2021 16:20:20 -0500 Received: from injection.crustytoothpaste.net ([192.241.140.119]:45780 "EHLO injection.crustytoothpaste.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727543AbhACVUT (ORCPT ); Sun, 3 Jan 2021 16:20:19 -0500 Received: from camp.crustytoothpaste.net (unknown [IPv6:2001:470:b978:101:b610:a2f0:36c1:12e3]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by injection.crustytoothpaste.net (Postfix) with ESMTPSA id 9063B60814; Sun, 3 Jan 2021 21:19:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=crustytoothpaste.net; s=default; t=1609708747; bh=5nExuClPgVRxIpg4/j3sI9qUo8mRWkqz68j+JBrODSk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From:Reply-To: Subject:Date:To:CC:Resent-Date:Resent-From:Resent-To:Resent-Cc: In-Reply-To:References:Content-Type:Content-Disposition; b=SsYWQQMRmzoJYOvuBrhvMx1kOlQOe4ew0s0jVM7JjMCFPQLDR3d+XDQitR6krxTxy u0xXs24cBacG76hBjgS3waPgsacChuvfU7CDLhbeIRHmZHpX4EUSaVKA1PXNppYKMF +PagScrxcC1St7a1xkFk3UPHiclhrI6vxGYoTptQtnp8y7zwfXwijFyLUNISvCAng6 
FX68aq1vtinuIjvk5TwvO1v2uy/noubQ7xAzATcKvbs45/kCckCYG98fs1ar+ZURMX 82Bu8pzknIhZ7TMoSTstbmTJMJkO0HLOqZOljQRc+2UJ6R6HOwUj2+zP3AI/c5aPix o9A3Hv6Gq4SLYhqCTyDqfbbkuJKGE7Jy9m6mPK2dnSOFL4D/7nFEZmbDV0PQRgKqGk uqzWGRf/0ApmoRTrQx4zv2COu7ihgpzhIsiIHqYprt6vf4b4oSNHXAT5fFh7KMUbVp iWwyAfA1TvVoAbPButSm20suL4QZMBCqf25+lwjnL4qu/47gEcV From: "brian m. carlson" To: Cc: Jeff King , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFz?= =?utf-8?b?b24=?= , Phillip Wood Subject: [PATCH v2 4/5] mailmap: use case-sensitive comparisons for local-parts and names Date: Sun, 3 Jan 2021 21:18:48 +0000 Message-Id: <20210103211849.2691287-5-sandals@crustytoothpaste.net> X-Mailer: git-send-email 2.30.0.284.gd98b1dd5eaa7 In-Reply-To: <20210103211849.2691287-1-sandals@crustytoothpaste.net> References: <20210103211849.2691287-1-sandals@crustytoothpaste.net> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org RFC 5321 is clear that the local-part of an email address (the part before the at sign) is case sensitive, and this has been the case since the original RFC 821. It directs us that "the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address." Since we are not that party, it's not correct for us to compare them case insensitively. However, we do still want to compare the domain parts case insensitively, so let's add a helper that downcases the domain and then compare byte by byte. Similarly, it's not possible for us to correctly case-fold text in a locale-insensitive way, so our handling of personal names has also been subject to bugs. Additionally, we've never handled non-ASCII characters correctly, which means that our previous comparisons really only worked well for a fraction of the people on the planet. Since our code wasn't right and it's basically impossible to compare personal names without regard to case, let's also switch our matching of names to be byte by byte. Signed-off-by: brian m. 
carlson --- mailmap.c | 27 ++++++++++++++++++++++++--- t/t4203-mailmap.sh | 4 ++-- 2 files changed, 26 insertions(+), 5 deletions(-) diff --git a/mailmap.c b/mailmap.c index d3287b409a..5c52dbb7e0 100644 --- a/mailmap.c +++ b/mailmap.c @@ -64,7 +64,22 @@ static void free_mailmap_entry(void *p, const char *s) */ static int namemap_cmp(const char *a, const char *b) { - return strcasecmp(a, b); + return strcmp(a, b); +} + +/* + * Lowercases the domain (and only the domain) part of an email address. The + * local-part, which is defined by RFC 5321 to be case sensitive, is not + * affected. + */ +static char *lowercase_email(char *s) +{ + char *end = strchrnul(s, '@'); + while (*end) { + *end = tolower(*end); + end++; + } + return s; } static void add_mapping(struct string_list *map, @@ -74,9 +89,13 @@ static void add_mapping(struct string_list *map, struct mailmap_entry *me; struct string_list_item *item; + lowercase_email(new_email); + if (old_email == NULL) { old_email = new_email; new_email = NULL; + } else { + lowercase_email(old_email); } item = string_list_insert(map, old_email); @@ -300,7 +319,7 @@ static struct string_list_item *lookup_prefix(struct string_list *map, * real location of the key if one exists. 
*/ while (0 <= --i && i < map->nr) { - int cmp = strncasecmp(map->items[i].string, string, len); + int cmp = strncmp(map->items[i].string, string, len); if (cmp < 0) /* * "i" points at a key definitely below the prefix; @@ -323,6 +342,7 @@ int map_user(struct mailmap *map, const char **email, size_t *emaillen, const char **name, size_t *namelen) { + char *searchable_email = xstrndup(*email, *emaillen); struct string_list_item *item; struct mailmap_entry *me; @@ -330,7 +350,8 @@ int map_user(struct mailmap *map, (int)*namelen, debug_str(*name), (int)*emaillen, debug_str(*email)); - item = lookup_prefix(map->mailmap, *email, *emaillen); + item = lookup_prefix(map->mailmap, searchable_email, *emaillen); + free(searchable_email); if (item != NULL) { me = (struct mailmap_entry *)item->util; if (me->namemap.nr) { diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh index 32e849504c..df4a0e03cc 100755 --- a/t/t4203-mailmap.sh +++ b/t/t4203-mailmap.sh @@ -187,7 +187,7 @@ nick1 (1): EOF -test_expect_failure 'name entry after email entry, case-sensitive local-part' ' +test_expect_success 'name entry after email entry, case-sensitive local-part' ' mkdir -p internal_mailmap && echo " " >internal_mailmap/.mailmap && echo "Internal Guy " >>internal_mailmap/.mailmap && @@ -195,7 +195,7 @@ test_expect_failure 'name entry after email entry, case-sensitive local-part' ' test_cmp expect actual ' -test_expect_failure 'name entry after email entry, case-sensitive personal name' ' +test_expect_success 'name entry after email entry, case-sensitive personal name' ' mkdir -p internal_mailmap && echo " " >internal_mailmap/.mailmap && echo "Nick1 NICK1 " >internal_mailmap/.mailmap && From patchwork Sun Jan 3 21:18:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "brian m. 
carlson" X-Patchwork-Id: 11995983 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 07D9CC433E6 for ; Sun, 3 Jan 2021 21:20:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C90E720784 for ; Sun, 3 Jan 2021 21:20:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727593AbhACVU3 (ORCPT ); Sun, 3 Jan 2021 16:20:29 -0500 Received: from injection.crustytoothpaste.net ([192.241.140.119]:45782 "EHLO injection.crustytoothpaste.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727543AbhACVU3 (ORCPT ); Sun, 3 Jan 2021 16:20:29 -0500 Received: from camp.crustytoothpaste.net (unknown [IPv6:2001:470:b978:101:b610:a2f0:36c1:12e3]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by injection.crustytoothpaste.net (Postfix) with ESMTPSA id 2807F60815; Sun, 3 Jan 2021 21:19:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=crustytoothpaste.net; s=default; t=1609708748; bh=kCe9FJ/gwE+bPsM2vIcOW5M5MbhKot5CavKY6VHqT34=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From:Reply-To: Subject:Date:To:CC:Resent-Date:Resent-From:Resent-To:Resent-Cc: In-Reply-To:References:Content-Type:Content-Disposition; b=wSPUT8phFMRT+JGvpB0Ao7I1zTDP1mpRJNmoeMjifZLsmbAjX5B5Obg5ALdS3wo3b 3vMfArkUwJfc3WsdDT92sVHfxCGDCbZ7OJd12SYX2WFP9RvnenkC19Nhes6/QUvSSd 5BckdQBAEebF28cQU9+iv5FiGpUElMgECmDQkH0PGDMLrNGS8oVS3DA9//M3oB59cS 
mfqAhW/KTsMjXx9/I36GxDKXIszRzxuRhJCBqW3pMIFW1HI3IumrRTOUWUfwu/B1hI QT7HIOtfO7bR1wETAfGZJ2qIooARQexAbfmw97mmVZyoE+J9dqeNSttfS0R1dwAmGU 6ZwbCMU1vDd7NXsajfAOvOm7XRnxv/CDWA+Dl3+GY+Yz7m2/OZrKvamRiaXCWl61Ll U3TQqhkjiv689FDdgC9fBnWlSvREi5hm5IVGYWKg/mElQ8HX8jic+Wbrxf6eF5BOg1 vW+xTpfO3BGpFps5HFSujMiEIVL+PFyEePgbHIubhmhxzNgW7g3 From: "brian m. carlson" To: Cc: Jeff King , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFz?= =?utf-8?b?b24=?= , Phillip Wood Subject: [PATCH v2 5/5] mailmap: support hashed entries in mailmaps Date: Sun, 3 Jan 2021 21:18:49 +0000 Message-Id: <20210103211849.2691287-6-sandals@crustytoothpaste.net> X-Mailer: git-send-email 2.30.0.284.gd98b1dd5eaa7 In-Reply-To: <20210103211849.2691287-1-sandals@crustytoothpaste.net> References: <20210103211849.2691287-1-sandals@crustytoothpaste.net> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Many people, through the course of their lives, will change either a name or an email address. For this reason, we have the mailmap, to map from a user's former name or email address to their current, canonical forms. Normally, this works well as it is. However, sometimes people change a name or an email address and wish to wholly disassociate themselves from that former name or email address. For example, a person may transition from one gender to another, changing their name, or they may have changed their name to disassociate themselves from an abusive family or partner. In such a case, using the former name or address in any way may be undesirable and the person may wish to replace it as completely as possible. For projects which wish to support this, introduce hashed forms into the mailmap. These forms, which start with "@sha256:" followed by a SHA-256 hash of the entry, can be used in place of the form used in the commit field. This form is intentionally designed to be unlikely to conflict with legitimate use cases. For example, this is not a valid email address according to RFC 5322. 
In the unlikely event that a user has put such a form into the actual
commit as their name, we will accept it.

While the form of the data is designed to accept multiple hash
algorithms, we intentionally do not support SHA-1.  There is little
reason to support such a weak algorithm in new use cases and no
backwards compatibility to consider.  Moreover, SHA-256 is faster than
the SHA1DC implementation we use, so this not only improves
performance, but simplifies the current implementation somewhat as
well.

Note that it is, of course, possible to perform a lookup on all commit
objects to determine the actual entry which matches the hashed form of
the data.  However, this is an improvement over the status quo.

The performance of this patch with no hashed entries is very similar to
the performance without this patch.  Considering a git log command to
look up author and committer information on 981,680 commits in the
Linux kernel history, either with an unhashed mailmap or a mailmap with
all old values hashed:

                                   Shortest  Longest  Average  Change
  Git 2.30                            7.876    8.297    8.143
  This patch, unhashed                7.923    8.484    8.237  + 1.15%
  This patch, hashed                 14.510   14.783   14.672  +80.17%
  This patch, hashed, unoptimized    15.425   16.318   15.901  +95.27%

Thus, the average performance after this patch is within normal
variation of the pre-patch performance.  It's unlikely that users will
notice the difference in practice, even on much larger repositories,
unless they're using the new feature.

To minimize the performance impact of the hashing process, we maintain
a reference count of each mailmap entry and when we encounter an entry
we must hash, we insert the same object under the unhashed key as well.
We also keep a count of the number of hashed entries.  This means we
must hash an object at most once and once we've seen all the hashed
objects, we won't hash any more objects.  Times without this
optimization are listed above in the unoptimized entry.

This has the potential to cause a performance problem as we insert
items into a sorted list, but changing the implementation to use a
khash map instead does not result in a significantly faster
implementation, despite the improved insertion speed.  Performance in
the unhashed case is slightly worse, so this approach was not adopted
since it provides few benefits.

Signed-off-by: brian m. carlson
---
 Documentation/mailmap.txt | 28 +++++++++++
 mailmap.c                 | 99 ++++++++++++++++++++++++++++++++++-----
 mailmap.h                 |  2 +
 t/t4203-mailmap.sh        | 35 ++++++++++++++
 4 files changed, 152 insertions(+), 12 deletions(-)

diff --git a/Documentation/mailmap.txt b/Documentation/mailmap.txt
index 4a8c276529..b21194bf3e 100644
--- a/Documentation/mailmap.txt
+++ b/Documentation/mailmap.txt
@@ -73,3 +73,31 @@ Santa Claus
 
 Use hash '#' for comments that are either on their own line, or after
 the email address.
+
+In addition to specifying a former name or email literally, it is also possible
+to specify it in a hashed form, which consists of the string `@sha256:`,
+followed by an all-lowercase SHA-256 hash of the entry in hexadecimal. For
+example, to take the example above, instead of specifying the replacement for
+"Some Dude" as such, you could specify one of these lines:
+
+------------
+Some Dude nick1 <@sha256:bee4fdd8c5e2e85009c8ae231d5a395adb24d5a597f2b75489926460680b8ce1>
+Some Dude @sha256:56030827e2765e8878c94c4cc43f5410b22f3b8c2b1ef8f631ac3953f8299279
+Some Dude @sha256:56030827e2765e8878c94c4cc43f5410b22f3b8c2b1ef8f631ac3953f8299279 <@sha256:bee4fdd8c5e2e85009c8ae231d5a395adb24d5a597f2b75489926460680b8ce1>
+------------
+
+This hash is a hash of the literal name or email without any trailing newlines.
+For example, you can compute the values above like so, using the Perl `shasum`
+command (or a similar command of your choice):
+
+------------
+$ printf '%s' bugs@company.xx | shasum -a 256
+bee4fdd8c5e2e85009c8ae231d5a395adb24d5a597f2b75489926460680b8ce1  -
+------------
+
+SHA-1 is not accepted as a hash algorithm in mailmaps.
+
+Using the hashed form may be desirable to obscure one's former name or email,
+but be aware that it is just obfuscation: it's still possible for someone with
+access to the repository to iterate through all authors and committers and map
+the hashed values to unhashed ones.
diff --git a/mailmap.c b/mailmap.c
index 5c52dbb7e0..ed401bb1e4 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -18,6 +18,8 @@ const char *git_mailmap_blob;
 struct mailmap_info {
 	char *name;
 	char *email;
+
+	unsigned refcount;
 };
 
 struct mailmap_entry {
@@ -25,6 +27,10 @@ struct mailmap_entry {
 	char *name;
 	char *email;
 
+	unsigned refcount;
+	unsigned hashed_count;
+	unsigned hashed_seen;
+
 	/* name and email for the complex mail and name matching case */
 	struct string_list namemap;
 };
@@ -32,6 +38,9 @@ struct mailmap_entry {
 static void free_mailmap_info(void *p, const char *s)
 {
 	struct mailmap_info *mi = (struct mailmap_info *)p;
+	if (--mi->refcount)
+		return;
+
 	debug_mm("mailmap: -- complex: '%s' -> '%s' <%s>\n",
 		 s, debug_str(mi->name), debug_str(mi->email));
 	free(mi->name);
@@ -41,6 +50,9 @@ static void free_mailmap_info(void *p, const char *s)
 static void free_mailmap_entry(void *p, const char *s)
 {
 	struct mailmap_entry *me = (struct mailmap_entry *)p;
+	if (--me->refcount)
+		return;
+
 	debug_mm("mailmap: removing entries for <%s>, with %d sub-entries\n",
 		 s, me->namemap.nr);
 	debug_mm("mailmap: - simple: '%s' <%s>\n",
@@ -82,10 +94,17 @@ static char *lowercase_email(char *s)
 	return s;
 }
 
-static void add_mapping(struct string_list *map,
+static int is_hashed(const char *s)
+{
+	const char *prefix = "@sha256:";
+	return strncmp(s, prefix, strlen(prefix)) == 0;
+}
+
+static void add_mapping(struct mailmap *mailmap,
 			char *new_name, char *new_email,
 			char *old_name, char *old_email)
 {
+	struct string_list *map = mailmap->mailmap;
 	struct mailmap_entry *me;
 	struct string_list_item *item;
 
@@ -95,7 +114,10 @@ static void add_mapping(struct string_list *map,
 		old_email = new_email;
 		new_email = NULL;
 	} else {
-		lowercase_email(old_email);
+		if (is_hashed(old_email))
+			mailmap->hashed_count++;
+		else
+			lowercase_email(old_email);
 	}
 
 	item = string_list_insert(map, old_email);
@@ -105,6 +127,7 @@ static void add_mapping(struct string_list *map,
 		me = xcalloc(1, sizeof(struct mailmap_entry));
 		me->namemap.strdup_strings = 1;
 		me->namemap.cmp = namemap_cmp;
+		me->refcount = 1;
 		item->util = me;
 	}
 
@@ -125,6 +148,9 @@ static void add_mapping(struct string_list *map,
 		debug_mm("mailmap: adding (complex) entry for '%s'\n", old_email);
 		mi->name = xstrdup_or_null(new_name);
 		mi->email = xstrdup_or_null(new_email);
+		mi->refcount = 1;
+		if (is_hashed(old_name))
+			me->hashed_count++;
 		string_list_insert(&me->namemap, old_name)->util = mi;
 	}
 
@@ -162,7 +188,7 @@ static char *parse_name_and_email(char *buffer, char **name,
 	return (*right == '\0' ? NULL : right);
 }
 
-static void read_mailmap_line(struct string_list *map, char *buffer,
+static void read_mailmap_line(struct mailmap *map, char *buffer,
 			      char **repo_abbrev)
 {
 	char *name1 = NULL, *email1 = NULL, *name2 = NULL, *email2 = NULL;
@@ -194,7 +220,7 @@ static void read_mailmap_line(struct string_list *map, char *buffer,
 		add_mapping(map, name1, email1, name2, email2);
 }
 
-static int read_mailmap_file(struct string_list *map, const char *filename,
+static int read_mailmap_file(struct mailmap *map, const char *filename,
 			     char **repo_abbrev)
 {
 	char buffer[1024];
@@ -216,7 +242,7 @@ static int read_mailmap_file(struct string_list *map, const char *filename,
 	return 0;
 }
 
-static void read_mailmap_string(struct string_list *map, char *buf,
+static void read_mailmap_string(struct mailmap *map, char *buf,
 				char **repo_abbrev)
 {
 	while (*buf) {
@@ -230,7 +256,7 @@ static void read_mailmap_string(struct string_list *map, char *buf,
 	}
 }
 
-static int read_mailmap_blob(struct string_list *map,
+static int read_mailmap_blob(struct mailmap *map,
 			     const char *name,
 			     char **repo_abbrev)
 {
@@ -269,10 +295,10 @@ int read_mailmap(struct mailmap *mailmap, char **repo_abbrev)
 	if (!git_mailmap_blob && is_bare_repository())
 		git_mailmap_blob = "HEAD:.mailmap";
 
-	err |= read_mailmap_file(map, ".mailmap", repo_abbrev);
+	err |= read_mailmap_file(mailmap, ".mailmap", repo_abbrev);
 	if (startup_info->have_repository)
-		err |= read_mailmap_blob(map, git_mailmap_blob, repo_abbrev);
-	err |= read_mailmap_file(map, git_mailmap_file, repo_abbrev);
+		err |= read_mailmap_blob(mailmap, git_mailmap_blob, repo_abbrev);
+	err |= read_mailmap_file(mailmap, git_mailmap_file, repo_abbrev);
 	return err;
 }
 
@@ -282,7 +308,7 @@ void clear_mailmap(struct mailmap *mailmap)
 	debug_mm("mailmap: clearing %d entries...\n", map->nr);
 	map->strdup_strings = 1;
 	string_list_clear_func(map, free_mailmap_entry);
-	string_list_clear(map, 1);
+	string_list_clear(map, 0);
 	free(map);
 	debug_mm("mailmap: cleared\n");
 }
@@ -338,6 +364,55 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
 	return NULL;
 }
 
+/*
+ * Convert an email or name into a hashed form for comparison. The hashed form
+ * will be created in the form
+ * @sha256:c68b7a430ac8dee9676ec77a387194e23f234d024e03d844050cf6c01775c8f6,
+ * which would be the hashed form for "doe@example.com".
+ */
+static char *hashed_form(struct strbuf *buf, const struct git_hash_algo *algop, const char *key, size_t keylen)
+{
+	git_hash_ctx ctx;
+	unsigned char hashbuf[GIT_MAX_RAWSZ];
+	char hexbuf[GIT_MAX_HEXSZ + 1];
+
+	algop->init_fn(&ctx);
+	algop->update_fn(&ctx, key, keylen);
+	algop->final_fn(hashbuf, &ctx);
+	hash_to_hex_algop_r(hexbuf, hashbuf, algop);
+
+	strbuf_addf(buf, "@%s:%s", algop->name, hexbuf);
+	return buf->buf;
+}
+
+static struct string_list_item *lookup_one(struct string_list *map,
+					   const char *string, size_t len,
+					   unsigned hashed_count,
+					   unsigned *hashed_seen)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list_item *item = lookup_prefix(map, string, len);
+	if (item || !hashed_count || hashed_count == *hashed_seen)
+		return item;
+
+	hashed_form(&buf, &hash_algos[GIT_HASH_SHA256], string, len);
+	item = lookup_prefix(map, buf.buf, buf.len);
+	if (item) {
+		struct mailmap_info *mi = (struct mailmap_info *)item->util;
+		char *s = xstrndup(string, len);
+		map->strdup_strings = 0;
+		item = string_list_insert(map, s);
+		map->strdup_strings = 1;
+		if (!item->util) {
+			item->util = mi;
+			mi->refcount++;
+			(*hashed_seen)++;
+		}
+	}
+	strbuf_release(&buf);
+	return item;
+}
+
 int map_user(struct mailmap *map,
 	     const char **email, size_t *emaillen,
 	     const char **name, size_t *namelen)
@@ -350,7 +425,7 @@ int map_user(struct mailmap *map,
 		 (int)*namelen, debug_str(*name),
 		 (int)*emaillen, debug_str(*email));
 
-	item = lookup_prefix(map->mailmap, searchable_email, *emaillen);
+	item = lookup_one(map->mailmap, searchable_email, *emaillen, map->hashed_count, &map->hashed_seen);
 	free(searchable_email);
 	if (item != NULL) {
 		me = (struct mailmap_entry *)item->util;
@@ -361,7 +436,7 @@ int map_user(struct mailmap *map,
 			 * simple entry.
 			 */
 			struct string_list_item *subitem;
-			subitem = lookup_prefix(&me->namemap, *name, *namelen);
+			subitem = lookup_one(&me->namemap, *name, *namelen, me->hashed_count, &me->hashed_seen);
 			if (subitem)
 				item = subitem;
 		}
diff --git a/mailmap.h b/mailmap.h
index 4cdce3b064..69f8be5705 100644
--- a/mailmap.h
+++ b/mailmap.h
@@ -5,6 +5,8 @@
 
 struct mailmap {
 	struct string_list *mailmap;
+	unsigned hashed_count;
+	unsigned hashed_seen;
 };
 
 int read_mailmap(struct mailmap *map, char **repo_abbrev);
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index df4a0e03cc..004b4a3d40 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -62,6 +62,41 @@ test_expect_success 'check-mailmap --stdin arguments' '
 	test_cmp expect actual
 '
 
+test_expect_success 'hashed mailmap' '
+	test_config mailmap.file ./hashed &&
+	hashed_author_name="@sha256:$(printf "$GIT_AUTHOR_NAME" | test-tool sha256)" &&
+	hashed_author_email="@sha256:$(printf "$GIT_AUTHOR_EMAIL" | test-tool sha256)" &&
+	cat >expect <<-EOF &&
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>
+	EOF
+
+	cat >hashed <<-EOF &&
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $hashed_author_name <$GIT_AUTHOR_EMAIL>
+	EOF
+	git check-mailmap "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual &&
+	test_cmp expect actual &&
+
+	cat >hashed <<-EOF &&
+	Wrong $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $hashed_author_name <$GIT_AUTHOR_EMAIL>
+	EOF
+	# Check that we prefer literal matches over hashed names.
+	git check-mailmap "$hashed_author_name <$GIT_AUTHOR_EMAIL>" >actual &&
+	test_cmp expect actual &&
+
+	cat >hashed <<-EOF &&
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $hashed_author_name <$hashed_author_email>
+	EOF
+	git check-mailmap "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual &&
+	test_cmp expect actual &&
+
+	cat >hashed <<-EOF &&
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> <$hashed_author_email>
+	EOF
+	git check-mailmap "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'check-mailmap bogus contact' '
 	test_must_fail git check-mailmap bogus
 '
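
For readers who want to try the hashed form described in the
documentation hunk above, here is a minimal shell sketch that is not
part of this patch.  The helper names hashed_form and
hashed_mailmap_line, and the example identities, are hypothetical.  It
assumes either sha256sum or shasum is installed, and it hashes its
argument verbatim with no trailing newline, as the documentation text
describes.

```shell
#!/bin/sh

# Print the "@sha256:<hex>" form of a literal name or email.
# printf '%s' is used so that no trailing newline is hashed.
hashed_form () {
	if command -v sha256sum >/dev/null 2>&1
	then
		hex=$(printf '%s' "$1" | sha256sum)
	else
		hex=$(printf '%s' "$1" | shasum -a 256)
	fi &&
	# Both tools print "<hex>  -"; keep only the hex digest.
	printf '@sha256:%s\n' "${hex%% *}"
}

# Build a mailmap line mapping a hashed old identity to a new one.
# Arguments: new name, new email, old name, old email.
hashed_mailmap_line () {
	printf '%s <%s> %s <%s>\n' "$1" "$2" \
		"$(hashed_form "$3")" "$(hashed_form "$4")"
}

# Example with made-up identities.
hashed_mailmap_line "New Name" "new@example.com" "Old Name" "old@example.com"
```

The output line has the same shape as the third example in the
documentation hunk (hashed name followed by hashed email) and could be
appended to a .mailmap file by hand.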