diff mbox series

[v2,1/3] check-mailmap: accept "user@host" contacts

Message ID 20240819-jk-send-email-mailmap-support-v2-1-d212c3f9e505@gmail.com (mailing list archive)
State Superseded
Headers show
Series send-email: add --mailmap support | expand

Commit Message

Keller, Jacob E Aug. 20, 2024, 12:07 a.m. UTC
From: Jacob Keller <jacob.keller@gmail.com>

git check-mailmap splits each provided contact using split_ident_line.
This function requires that the contact either be of the form "Name
<user@host>" or of the form "<user@host>". In particular, if the mail
portion of the contact is not surrounded by angle brackets,
split_ident_line will reject it.

This results in git check-mailmap rejecting attempts to translate simple
email addresses:

  $ git check-mailmap user@host
  fatal: unable to parse contact: user@host

This limits the usability of check-mailmap as it requires placing angle
brackets around plain email addresses.

In particular, attempting to use git check-mailmap to support mapping
addresses in git send-email is not straight forward. The sanitization
and validation functions in git send-email strip angle brackets from
plain email addresses. It is not trivial to add brackets prior to
invoking git check-mailmap.

Instead, modify check_mailmap() to allow such strings as contacts. In
particular, treat any line which cannot be split by split_ident_line as
a simple email address.

No attempt is made to actually parse the address line to validate that
it is actually an address. Doing so is non-trivial, and provides little
value. Either the provided input will correctly map via the map_user
call, or it will fail and be printed out, surrounded by angle brackets:

  $ git check-mailmap user@host
  <user@host>

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 builtin/check-mailmap.c             | 18 +++++++++++-------
 Documentation/git-check-mailmap.txt |  8 ++++----
 t/t4203-mailmap.sh                  | 33 +++++++++++++++++++++++++++++++--
 3 files changed, 46 insertions(+), 13 deletions(-)

Comments

Josh Steadmon Aug. 21, 2024, 5:50 p.m. UTC | #1
On 2024.08.19 17:07, Jacob Keller wrote:
> From: Jacob Keller <jacob.keller@gmail.com>
> 
> git check-mailmap splits each provided contact using split_ident_line.
> This function requires that the contact either be of the form "Name
> <user@host>" or of the form "<user@host>". In particular, if the mail
> portion of the contact is not surrounded by angle brackets,
> split_ident_line will reject it.
> 
> This results in git check-mailmap rejecting attempts to translate simple
> email addresses:
> 
>   $ git check-mailmap user@host
>   fatal: unable to parse contact: user@host
> 
> This limits the usability of check-mailmap as it requires placing angle
> brackets around plain email addresses.
> 
> In particular, attempting to use git check-mailmap to support mapping
> addresses in git send-email is not straight forward. The sanitization
> and validation functions in git send-email strip angle brackets from
> plain email addresses. It is not trivial to add brackets prior to
> invoking git check-mailmap.
> 
> Instead, modify check_mailmap() to allow such strings as contacts. In
> particular, treat any line which cannot be split by split_ident_line as
> a simple email address.
> 
> No attempt is made to actually parse the address line to validate that
> it is actually an address. Doing so is non-trivial, and provides little
> value. Either the provided input will correctly map via the map_user
> call, or it will fail and be printed out, surrounded by angle brackets:
> 
>   $ git check-mailmap user@host
>   <user@host>
> 
> Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
> ---
>  builtin/check-mailmap.c             | 18 +++++++++++-------
>  Documentation/git-check-mailmap.txt |  8 ++++----
>  t/t4203-mailmap.sh                  | 33 +++++++++++++++++++++++++++++++--
>  3 files changed, 46 insertions(+), 13 deletions(-)
> 
> diff --git a/builtin/check-mailmap.c b/builtin/check-mailmap.c
> index b8a05b8e07b5..6b7fb53494f0 100644
> --- a/builtin/check-mailmap.c
> +++ b/builtin/check-mailmap.c
> @@ -25,13 +25,17 @@ static void check_mailmap(struct string_list *mailmap, const char *contact)
>  	size_t namelen, maillen;
>  	struct ident_split ident;
>  
> -	if (split_ident_line(&ident, contact, strlen(contact)))
> -		die(_("unable to parse contact: %s"), contact);
> -
> -	name = ident.name_begin;
> -	namelen = ident.name_end - ident.name_begin;
> -	mail = ident.mail_begin;
> -	maillen = ident.mail_end - ident.mail_begin;
> +	if (!split_ident_line(&ident, contact, strlen(contact))) {
> +		name = ident.name_begin;
> +		namelen = ident.name_end - ident.name_begin;
> +		mail = ident.mail_begin;
> +		maillen = ident.mail_end - ident.mail_begin;
> +	} else {
> +		name = NULL;
> +		namelen = 0;
> +		mail = contact;
> +		maillen = strlen(contact);
> +	}
>  
>  	map_user(mailmap, &mail, &maillen, &name, &namelen);
>  
> diff --git a/Documentation/git-check-mailmap.txt b/Documentation/git-check-mailmap.txt
> index 02f441832321..7747e38e25e3 100644
> --- a/Documentation/git-check-mailmap.txt
> +++ b/Documentation/git-check-mailmap.txt
> @@ -15,10 +15,10 @@ SYNOPSIS
>  DESCRIPTION
>  -----------
>  
> -For each ``Name $$<user@host>$$'' or ``$$<user@host>$$'' from the command-line
> -or standard input (when using `--stdin`), look up the person's canonical name
> -and email address (see "Mapping Authors" below). If found, print them;
> -otherwise print the input as-is.
> +For each ``Name $$<user@host>$$'', ``$$<user@host>$$'', or ``$$user@host$$''
> +from the command-line or standard input (when using `--stdin`), look up the
> +person's canonical name and email address (see "Mapping Authors" below). If
> +found, print them; otherwise print the input as-is.
>  
>  
>  OPTIONS
> diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
> index 79e5f42760d9..0c1efe0b2e17 100755
> --- a/t/t4203-mailmap.sh
> +++ b/t/t4203-mailmap.sh
> @@ -73,11 +73,40 @@ test_expect_success 'check-mailmap --stdin arguments: mapping' '
>  '
>  
>  test_expect_success 'check-mailmap bogus contact' '
> -	test_must_fail git check-mailmap bogus
> +	cat >expect <<-EOF &&
> +	<bogus>
> +	EOF
> +	git check-mailmap bogus >actual &&
> +	test_cmp expect actual
>  '

I think I'd just remove this test case altogether, IIUC it' doesn't
provide any additional value vs. the "check-mailmap simple address: no
mapping" test below.


>  test_expect_success 'check-mailmap bogus contact --stdin' '
> -	test_must_fail git check-mailmap --stdin bogus </dev/null
> +	cat >expect <<-EOF &&
> +	<bogus>
> +	EOF
> +	cat >stdin <<-EOF &&
> +	bogus
> +	EOF
> +	git check-mailmap --stdin <stdin >actual &&
> +	test_cmp expect actual
> +'

Similarly, I might change this to use a real address instead of "bogus",
as we're no longer checking for invalid input.


> +test_expect_success 'check-mailmap simple address: mapping' '
> +	test_when_finished "rm .mailmap" &&
> +	cat >.mailmap <<-EOF &&
> +	New Name <$GIT_AUTHOR_EMAIL>
> +	EOF
> +	cat .mailmap >expect &&
> +	git check-mailmap "$GIT_AUTHOR_EMAIL" >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'check-mailmap simple address: no mapping' '
> +	cat >expect <<-EOF &&
> +	<bugs@company.xx>
> +	EOF
> +	git check-mailmap "bugs@company.xx" >actual &&
> +	test_cmp expect actual
>  '
>  
>  test_expect_success 'No mailmap' '
> 
> -- 
> 2.46.0.124.g2dc1a81c8933
>
Junio C Hamano Aug. 21, 2024, 6:26 p.m. UTC | #2
Josh Steadmon <steadmon@google.com> writes:

>>  test_expect_success 'check-mailmap bogus contact' '
>> -	test_must_fail git check-mailmap bogus
>> +	cat >expect <<-EOF &&
>> +	<bogus>
>> +	EOF
>> +	git check-mailmap bogus >actual &&
>> +	test_cmp expect actual
>>  '
>
> I think I'd just remove this test case altogether, IIUC it' doesn't
> provide any additional value vs. the "check-mailmap simple address: no
> mapping" test below.

Sorry, but I do not follow.  The other one is <bogus@company.xx>
that looks more globally routable address than a local-only <bogus>
mailbox.  Isn't it worth ensuring that we will keep treating them
the same way?

Having said that ...

>> -For each ``Name $$<user@host>$$'' or ``$$<user@host>$$'' from the command-line
>> -or standard input (when using `--stdin`), look up the person's canonical name
>> -and email address (see "Mapping Authors" below). If found, print them;
>> -otherwise print the input as-is.
>> +For each ``Name $$<user@host>$$'', ``$$<user@host>$$'', or ``$$user@host$$''
>> +from the command-line or standard input (when using `--stdin`), look up the
>> +person's canonical name and email address (see "Mapping Authors" below). If
>> +found, print them; otherwise print the input as-is.

... it seems that <user> without <@host> is a supported format.
Should we update the document, too? 

If the @host-less name is meant to trigger a random unspecified
behaviour, whatever the code happens to do, that is perfectly fine,
but then we probably should not be etching it in the stone by
writing a test for it.  So because of a  reason that is completely
different from yours, I'd support removal of the "bogus" test, if
that is the case.

Thanks.
Eric Sunshine Aug. 21, 2024, 6:27 p.m. UTC | #3
On Wed, Aug 21, 2024 at 1:50 PM Josh Steadmon <steadmon@google.com> wrote:
> On 2024.08.19 17:07, Jacob Keller wrote:
> > diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
> > @@ -73,11 +73,40 @@ test_expect_success 'check-mailmap --stdin arguments: mapping' '
> >  test_expect_success 'check-mailmap bogus contact' '
> > -     test_must_fail git check-mailmap bogus
> > +     cat >expect <<-EOF &&
> > +     <bogus>
> > +     EOF
> > +     git check-mailmap bogus >actual &&
> > +     test_cmp expect actual
> >  '
>
> I think I'd just remove this test case altogether, IIUC it' doesn't
> provide any additional value vs. the "check-mailmap simple address: no
> mapping" test below.

I had the same thought upon reading this.

> >  test_expect_success 'check-mailmap bogus contact --stdin' '
> > -     test_must_fail git check-mailmap --stdin bogus </dev/null
> > +     cat >expect <<-EOF &&
> > +     <bogus>
> > +     EOF
> > +     cat >stdin <<-EOF &&
> > +     bogus
> > +     EOF
> > +     git check-mailmap --stdin <stdin >actual &&
> > +     test_cmp expect actual
> > +'
>
> Similarly, I might change this to use a real address instead of "bogus",
> as we're no longer checking for invalid input.

Ditto for this change, but even more so because this is a fairly
significant semantic change. In particular, the documented and
intended behavior of the command when --stdin is specified is that it
will consume email addresses from *both* the command-line and from
standard input, and I think the point of the original test was to
verify that it still correctly recognized a bogus email address
specified as an argument even when --stdin is requested. Given that
understanding (assuming it's correct), then the original test was
already perhaps somewhat iffy anyhow, but after this change, it is
even less meaningful.
Keller, Jacob E Aug. 21, 2024, 7:07 p.m. UTC | #4
On 8/21/2024 11:26 AM, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
>>>  test_expect_success 'check-mailmap bogus contact' '
>>> -	test_must_fail git check-mailmap bogus
>>> +	cat >expect <<-EOF &&
>>> +	<bogus>
>>> +	EOF
>>> +	git check-mailmap bogus >actual &&
>>> +	test_cmp expect actual
>>>  '
>>
>> I think I'd just remove this test case altogether, IIUC it' doesn't
>> provide any additional value vs. the "check-mailmap simple address: no
>> mapping" test below.
> 
> Sorry, but I do not follow.  The other one is <bogus@company.xx>
> that looks more globally routable address than a local-only <bogus>
> mailbox.  Isn't it worth ensuring that we will keep treating them
> the same way?
> 
> Having said that ...
> 
>>> -For each ``Name $$<user@host>$$'' or ``$$<user@host>$$'' from the command-line
>>> -or standard input (when using `--stdin`), look up the person's canonical name
>>> -and email address (see "Mapping Authors" below). If found, print them;
>>> -otherwise print the input as-is.
>>> +For each ``Name $$<user@host>$$'', ``$$<user@host>$$'', or ``$$user@host$$''
>>> +from the command-line or standard input (when using `--stdin`), look up the
>>> +person's canonical name and email address (see "Mapping Authors" below). If
>>> +found, print them; otherwise print the input as-is.
> 
> ... it seems that <user> without <@host> is a supported format.
> Should we update the document, too? 
> 

Its supported by happenstance, but i didn't want to encourage it.

> If the @host-less name is meant to trigger a random unspecified
> behaviour, whatever the code happens to do, that is perfectly fine,
> but then we probably should not be etching it in the stone by
> writing a test for it.  So because of a  reason that is completely
> different from yours, I'd support removal of the "bogus" test, if
> that is the case.
> 

I prefer removing the test. In an ideal world, I think we would probably
only accept actual <user@host> or user@host, but I did not think I would
create sufficiently correct parsing for addresses, so I left it out.

I did try looking up what the rules for addresses are, but it looks like
it got pretty complicated. I guess we could do very basic "it must have
an @ symbol, but anything else goes?


> Thanks.
>
Keller, Jacob E Aug. 21, 2024, 7:09 p.m. UTC | #5
On 8/21/2024 11:27 AM, Eric Sunshine wrote:
> On Wed, Aug 21, 2024 at 1:50 PM Josh Steadmon <steadmon@google.com> wrote:
>> On 2024.08.19 17:07, Jacob Keller wrote:
>>> diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
>>> @@ -73,11 +73,40 @@ test_expect_success 'check-mailmap --stdin arguments: mapping' '
>>>  test_expect_success 'check-mailmap bogus contact' '
>>> -     test_must_fail git check-mailmap bogus
>>> +     cat >expect <<-EOF &&
>>> +     <bogus>
>>> +     EOF
>>> +     git check-mailmap bogus >actual &&
>>> +     test_cmp expect actual
>>>  '
>>
>> I think I'd just remove this test case altogether, IIUC it' doesn't
>> provide any additional value vs. the "check-mailmap simple address: no
>> mapping" test below.
> 
> I had the same thought upon reading this.
> 

Yea, I think I've been convinced. I'll remove these tests.

>>>  test_expect_success 'check-mailmap bogus contact --stdin' '
>>> -     test_must_fail git check-mailmap --stdin bogus </dev/null
>>> +     cat >expect <<-EOF &&
>>> +     <bogus>
>>> +     EOF
>>> +     cat >stdin <<-EOF &&
>>> +     bogus
>>> +     EOF
>>> +     git check-mailmap --stdin <stdin >actual &&
>>> +     test_cmp expect actual
>>> +'
>>
>> Similarly, I might change this to use a real address instead of "bogus",
>> as we're no longer checking for invalid input.>
> Ditto for this change, but even more so because this is a fairly
> significant semantic change. In particular, the documented and
> intended behavior of the command when --stdin is specified is that it
> will consume email addresses from *both* the command-line and from
> standard input, and I think the point of the original test was to
> verify that it still correctly recognized a bogus email address
> specified as an argument even when --stdin is requested. Given that
> understanding (assuming it's correct), then the original test was
> already perhaps somewhat iffy anyhow, but after this change, it is
> even less meaningful.
> 

I tried to ensure we have test cases covering both --stdin and a
combination. I'll double check this in a v3 and ensure test cases
covering the behavior.

Thanks for the feedback!
diff mbox series

Patch

diff --git a/builtin/check-mailmap.c b/builtin/check-mailmap.c
index b8a05b8e07b5..6b7fb53494f0 100644
--- a/builtin/check-mailmap.c
+++ b/builtin/check-mailmap.c
@@ -25,13 +25,17 @@  static void check_mailmap(struct string_list *mailmap, const char *contact)
 	size_t namelen, maillen;
 	struct ident_split ident;
 
-	if (split_ident_line(&ident, contact, strlen(contact)))
-		die(_("unable to parse contact: %s"), contact);
-
-	name = ident.name_begin;
-	namelen = ident.name_end - ident.name_begin;
-	mail = ident.mail_begin;
-	maillen = ident.mail_end - ident.mail_begin;
+	if (!split_ident_line(&ident, contact, strlen(contact))) {
+		name = ident.name_begin;
+		namelen = ident.name_end - ident.name_begin;
+		mail = ident.mail_begin;
+		maillen = ident.mail_end - ident.mail_begin;
+	} else {
+		name = NULL;
+		namelen = 0;
+		mail = contact;
+		maillen = strlen(contact);
+	}
 
 	map_user(mailmap, &mail, &maillen, &name, &namelen);
 
diff --git a/Documentation/git-check-mailmap.txt b/Documentation/git-check-mailmap.txt
index 02f441832321..7747e38e25e3 100644
--- a/Documentation/git-check-mailmap.txt
+++ b/Documentation/git-check-mailmap.txt
@@ -15,10 +15,10 @@  SYNOPSIS
 DESCRIPTION
 -----------
 
-For each ``Name $$<user@host>$$'' or ``$$<user@host>$$'' from the command-line
-or standard input (when using `--stdin`), look up the person's canonical name
-and email address (see "Mapping Authors" below). If found, print them;
-otherwise print the input as-is.
+For each ``Name $$<user@host>$$'', ``$$<user@host>$$'', or ``$$user@host$$''
+from the command-line or standard input (when using `--stdin`), look up the
+person's canonical name and email address (see "Mapping Authors" below). If
+found, print them; otherwise print the input as-is.
 
 
 OPTIONS
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 79e5f42760d9..0c1efe0b2e17 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -73,11 +73,40 @@  test_expect_success 'check-mailmap --stdin arguments: mapping' '
 '
 
 test_expect_success 'check-mailmap bogus contact' '
-	test_must_fail git check-mailmap bogus
+	cat >expect <<-EOF &&
+	<bogus>
+	EOF
+	git check-mailmap bogus >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'check-mailmap bogus contact --stdin' '
-	test_must_fail git check-mailmap --stdin bogus </dev/null
+	cat >expect <<-EOF &&
+	<bogus>
+	EOF
+	cat >stdin <<-EOF &&
+	bogus
+	EOF
+	git check-mailmap --stdin <stdin >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'check-mailmap simple address: mapping' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	New Name <$GIT_AUTHOR_EMAIL>
+	EOF
+	cat .mailmap >expect &&
+	git check-mailmap "$GIT_AUTHOR_EMAIL" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'check-mailmap simple address: no mapping' '
+	cat >expect <<-EOF &&
+	<bugs@company.xx>
+	EOF
+	git check-mailmap "bugs@company.xx" >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'No mailmap' '