diff mbox series

[v2] git-send-email: Use sanitized address when reading mbox body

Message ID 20240626132440.3762363-2-csokas.bence@prolan.hu (mailing list archive)
State Superseded
Headers show
Series [v2] git-send-email: Use sanitized address when reading mbox body | expand

Commit Message

Csókás, Bence June 26, 2024, 1:24 p.m. UTC
Commas and other punctuation marks in 'Cc: ', 'Signed-off-by: '
etc. lines mess with git-send-email. In parsing the mbox headers,
this is handled by calling `sanitize_address()`. This function
is called when parsing the message body as well, but was only
used for comparing it to $author. Now we add it to @cc too.

Note that sanitization is only done for the message body, as
`git format-patch` already RFC 2047-encodes mbox headers, so
those are generally trusted to be sane. Also note that
`sanitize_address()` does not process the mailbox addresses,
so it is up to `sendmail` to handle special characters there
(e.g. there are mailboxes in rwgular use with '+'-es in them).

Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu>
---

Notes:
    Changes in v2:
    * added testcase to t9001
    * added rationale behind trusting mbox headers and the address-parts

 git-send-email.perl   |  4 ++--
 t/t9001-send-email.sh | 30 ++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 2 deletions(-)

Comments

Junio C Hamano June 26, 2024, 5:28 p.m. UTC | #1
"Csókás, Bence" <csokas.bence@prolan.hu> writes:

> Commas and other punctuation marks in 'Cc: ', 'Signed-off-by: '
> etc. lines mess with git-send-email. In parsing the mbox headers,
> this is handled by calling `sanitize_address()`. This function
> is called when parsing the message body as well, but was only
> used for comparing it to $author. Now we add it to @cc too.

The above is misleading, though.  We do use sanitize_address on
addresses we find on e-mail headers, but the result is not used
in @to or @cc, the addresses we use to actually send the message
out.

Perhaps phrase it more like ...

    When we check addresses found on the mbox headers to see if we
    want to add them to Cc:, we use sanitize_address() function to
    normalize the addresses before passing them to the suppression
    mechanism, but we use the original addresses for the purpose of
    sending the message out.

    We use the same logic on the address-looking strings found on
    trailer lines that appear in the message body.  Sanitized
    addresses are used for Cc-suppression purposes, but the original
    addresses as written by the end-user are used as the mail
    destination.

    There are certain quoting rules for e-mail addresses, and unlike
    addresses on e-mail headers that are generated by format-patch,
    hand-written addresses on the trailer lines are more likely to
    violate them by mistake.

    When adding the address found on a trailer line in the message
    body, use the sanitized address for both sending the message
    out, as well as checking with Cc-suppression mechanism, to
    reduce the risk of malformed hand-written addresses to get the
    message rejected (but keep using the original addresses found on
    the e-mail headers in the input message for e-mail destination).

... or something like that?

> diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
> index 58699f8e4e..7e0b8ae57c 100755
> --- a/t/t9001-send-email.sh
> +++ b/t/t9001-send-email.sh

This test uses mixture of test_grep and grep, so use of "grep" below
is fine (but in the longer term we should make sure the tests use
the former for better debuggability).

As I always say, we should resist the temptation to write our tests
just to demonstrate our shiny new toy.  In this case, the test is
too focused to show that the system will give a best-effort output
when fed invalid and/or malformed addresses, but it does not see
what happens to a well formed addresses (ideally they are passed
intact, but is that what happens with the new code?).  Perhaps add
one or two trailer lines with valid addresses (and non-address, like
"BugId: 143421", that should not appear at all in the output) on
them?

Thanks.

> +test_expect_success $PREREQ 'cc list is sanitized' '
> +	clean_fake_sendmail &&
> +	test_commit weird_cc_body &&
> +	test_when_finished "git reset --hard HEAD^" &&
> +	git commit --amend -F - <<-EOF &&
> +	Test Cc: sanitization.
> +
> +	Cc: Person, One <one@example.com>
> +	Reviewed-by: Füñný Nâmé <odd_?=mail@example.com>
> +	Signed-off-by: A. U. Thor <thor.au@example.com>
> +	EOF
> +	git send-email -1 --to=recipient@example.com \
> +		--smtp-server="$(pwd)/fake.sendmail" >actual-show-all-headers &&
> +	test_cmp expected-cc commandline1 &&
> +	grep "^(body) Adding cc: \"Person, One\" <one@example.com>" actual-show-all-headers &&
> +	grep "^(body) Adding cc: =?UTF-8?q?F=C3=BC=C3=B1n=C3=BD=20N=C3=A2m=C3=A9?="\
> +" <odd_?=mail@example.com>" actual-show-all-headers &&
> +	grep "^(body) Adding cc: \"A. U. Thor\" <thor.au@example.com>" actual-show-all-headers
> +'

>  test_expect_success $PREREQ 'sendemail.composeencoding works' '
>  	clean_fake_sendmail &&
>  	git config sendemail.composeencoding iso-8859-1 &&
Csókás, Bence June 27, 2024, 8:37 a.m. UTC | #2
On 6/26/24 19:28, Junio C Hamano wrote:
>> diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
>> index 58699f8e4e..7e0b8ae57c 100755
>> --- a/t/t9001-send-email.sh
>> +++ b/t/t9001-send-email.sh
> 
> This test uses mixture of test_grep and grep, so use of "grep" below
> is fine (but in the longer term we should make sure the tests use
> the former for better debuggability).

Ok, I'll use test_grep everywhere.

> As I always say, we should resist the temptation to write our tests
> just to demonstrate our shiny new toy.  In this case, the test is
> too focused to show that the system will give a best-effort output
> when fed invalid and/or malformed addresses, but it does not see
> what happens to a well formed addresses (ideally they are passed
> intact, but is that what happens with the new code?).  Perhaps add
> one or two trailer lines with valid addresses (and non-address, like
> "BugId: 143421", that should not appear at all in the output) on
> them?

Valid addresses are already tested by former tests. I don't immediately 
see any tests that would cover non-Cc, non-*-by and non-email-address 
tags, so I might add them; should they be a separate testcase or part of 
this one? Or maybe even a separate patch?

Bence
Junio C Hamano June 27, 2024, 3:09 p.m. UTC | #3
Csókás Bence <csokas.bence@prolan.hu> writes:

> Valid addresses are already tested by former tests. I don't
> immediately see any tests that would cover non-Cc, non-*-by and
> non-email-address tags, so I might add them; should they be a separate
> testcase or part of this one? Or maybe even a separate patch?

The tests you added has only malformed addresses on trailers, and
adding a few trailers with valid addresses *and* string that does
not look like an address at all, would be a minimal fix that makes
the test complete.

It makes it clear that the point of the new test is about strings
found on trailers (that are dealt differently from addresses taken
from other places) are handled appropriately, checked how they are
shown to the MTA (with the comparison !address!  output from
fake.sendmail script) and how their additions are logged in the
verbose output (with the pattern match with "^(body) Adding cc:
...").  Checking that totally irrelevant ones (e.g. bugId) are not
even used as addresses, strings meant as addresses but malformed are
used after correction, and well-formed addresses are used as-is,
would form a complete set of observable behaviour from this feature.

Thanks.
diff mbox series

Patch

diff --git a/git-send-email.perl b/git-send-email.perl
index f0be4b4560..72044e5ef3 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1847,9 +1847,9 @@  sub pre_process_file {
 					$what, $_) unless $quiet;
 				next;
 			}
-			push @cc, $c;
+			push @cc, $sc;
 			printf(__("(body) Adding cc: %s from line '%s'\n"),
-				$c, $_) unless $quiet;
+				$sc, $_) unless $quiet;
 		}
 	}
 	close $fh;
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 58699f8e4e..7e0b8ae57c 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -1299,6 +1299,36 @@  test_expect_success $PREREQ 'utf8 sender is not duplicated' '
 	test_line_count = 1 msgfrom
 '
 
+test_expect_success $PREREQ 'setup expect for cc list' "
+cat >expected-cc <<\EOF
+!recipient@example.com!
+!author@example.com!
+!one@example.com!
+!odd_?=mail@example.com!
+!thor.au@example.com!
+EOF
+"
+
+test_expect_success $PREREQ 'cc list is sanitized' '
+	clean_fake_sendmail &&
+	test_commit weird_cc_body &&
+	test_when_finished "git reset --hard HEAD^" &&
+	git commit --amend -F - <<-EOF &&
+	Test Cc: sanitization.
+
+	Cc: Person, One <one@example.com>
+	Reviewed-by: Füñný Nâmé <odd_?=mail@example.com>
+	Signed-off-by: A. U. Thor <thor.au@example.com>
+	EOF
+	git send-email -1 --to=recipient@example.com \
+		--smtp-server="$(pwd)/fake.sendmail" >actual-show-all-headers &&
+	test_cmp expected-cc commandline1 &&
+	grep "^(body) Adding cc: \"Person, One\" <one@example.com>" actual-show-all-headers &&
+	grep "^(body) Adding cc: =?UTF-8?q?F=C3=BC=C3=B1n=C3=BD=20N=C3=A2m=C3=A9?="\
+" <odd_?=mail@example.com>" actual-show-all-headers &&
+	grep "^(body) Adding cc: \"A. U. Thor\" <thor.au@example.com>" actual-show-all-headers
+'
+
 test_expect_success $PREREQ 'sendemail.composeencoding works' '
 	clean_fake_sendmail &&
 	git config sendemail.composeencoding iso-8859-1 &&