diff mbox series

[v2,3/5] t4203: add failing test for case-sensitive local-parts and names

Message ID 20210103211849.2691287-4-sandals@crustytoothpaste.net (mailing list archive)
State New
Headers show
Series Hashed mailmap | expand

Commit Message

brian m. carlson Jan. 3, 2021, 9:18 p.m. UTC
Currently, Git always looks up entries in the mailmap in a
case-insensitive way, both for names and addresses, which is, as
explained below, suboptimal.

First, for email addresses, RFC 5321 is clear that only domains are case
insensitive; local-parts (the portion before the at sign) are not.  It
states this:

  The local-part of a mailbox MUST BE treated as case sensitive.
  Therefore, SMTP implementations MUST take care to preserve the case of
  mailbox local-parts.

There exist systems today where local-parts remain case sensitive (and
this author has one), and as such, it's incorrect for us to case fold
them in any way.  Let's add a failing test that indicates this is a
problem, while still keeping the test for case-insensitive domains.

Note that it's also incorrect for us to case-fold names because we don't
guarantee that we're using the locale of the author, and it's impossible
to case-fold names in a locale-insensitive way.  Turkish and Azeri
contain both a dotted and dotless I, and the uppercase ASCII I folds not
to the lowercase ASCII I, but to a dotless version, and vice versa with
the lowercase I.  There are many words in Turkish which differ only in
the dottedness of the I, so it is likely that there are also personal
names which differ in the same way.

That would be a problem even if our implementation were perfect, which
it is not.  We currently fold only ASCII characters, so this feature has
never worked correctly for the vast majority of the users on the planet,
regardless of the locale.  For example, on Linux, even in a Spanish
locale, we don't handle "Simón" properly.  Even if we did handle that,
we'd probably also want to implement Unicode normalization, which we

In general, case-folding text is extremely language- and locale-specific
and requires intimacy with the spelling and grammar of the language in
question and careful attention to the Unicode details in order to
produce a result that is meaningful to humans and conforms with
linguistic and societal norms.

Because we do not have any of the required context with a plain personal
name, we cannot hope to possibly case-fold personal names correctly.  We
should stop trying to do so and just treat them as a series of bytes, so
let's add a test that we don't case-fold personal names as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
 t/t4203-mailmap.sh | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)
diff mbox series


diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 586c3a86b1..32e849504c 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -170,10 +170,35 @@  Repo Guy (1):
-test_expect_success 'name entry after email entry, case-insensitive' '
+test_expect_success 'name entry after email entry, case-insensitive domain' '
 	mkdir -p internal_mailmap &&
 	echo "<bugs@company.xy> <bugs@company.xx>" >internal_mailmap/.mailmap &&
-	echo "Internal Guy <BUGS@Company.xx>" >>internal_mailmap/.mailmap &&
+	echo "Internal Guy <bugs@Company.xx>" >>internal_mailmap/.mailmap &&
+	git shortlog HEAD >actual &&
+	test_cmp expect actual
+cat >expect <<\EOF
+Repo Guy (1):
+      initial
+nick1 (1):
+      second
+test_expect_failure 'name entry after email entry, case-sensitive local-part' '
+	mkdir -p internal_mailmap &&
+	echo "<bugs@company.xy> <bugs@company.xx>" >internal_mailmap/.mailmap &&
+	echo "Internal Guy <BUGS@company.xx>" >>internal_mailmap/.mailmap &&
+	git shortlog HEAD >actual &&
+	test_cmp expect actual
+test_expect_failure 'name entry after email entry, case-sensitive personal name' '
+	mkdir -p internal_mailmap &&
+	echo "<bugs@company.xy> <bugs@company.xx>" >internal_mailmap/.mailmap &&
+	echo "Nick1 <bugs@company.xx> NICK1 <bugs@company.xx>" >internal_mailmap/.mailmap &&
 	git shortlog HEAD >actual &&
 	test_cmp expect actual