diff mbox series

[v3] git-send-email: Use sanitized address when reading mbox body

Message ID 20240628085018.65076-2-csokas.bence@prolan.hu (mailing list archive)
State Superseded
Headers show
Series [v3] git-send-email: Use sanitized address when reading mbox body | expand

Commit Message

Csókás Bence June 28, 2024, 8:50 a.m. UTC
Commas and other punctuation marks in 'Cc: ', 'Signed-off-by: '
etc. lines mess with git-send-email. This is handled by calling
`sanitize_address()` before adding addresses to @cc. This function
was already being called, but was only used for comparing it to
$author for suppression purposes.

Note that sanitization is only done for the message body, as
`git format-patch` already RFC 2047-encodes mbox headers, so
those are generally trusted to be sane. Also note that
`sanitize_address()` does not process the mailbox addresses,
so it is up to `sendmail` to handle special characters there
(e.g. there are mailboxes in regular use with '+'-es in them).

Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu>
---

[ sorry, I forgot the --notes: ]

Notes:
    Changes in v2:
    * added testcase to t9001
    * added rationale behind trusting mbox headers and the address-parts
    Changes in v3:
    * more testcases
    * clarified wording in message

 git-send-email.perl   |  4 ++--
 t/t9001-send-email.sh | 51 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 2 deletions(-)

Comments

Junio C Hamano June 28, 2024, 7:30 p.m. UTC | #1
"Csókás, Bence" <csokas.bence@prolan.hu> writes:

> Commas and other punctuation marks in 'Cc: ', 'Signed-off-by: '
> etc. lines mess with git-send-email.

Calling these "trailers" would make it more obvious that you are
talking about parts of the log message ("S-o-b" would not be
mistaken as an e-mail header, but you cannot complain if a reader
mistook the mention of "Cc" as a reference to an e-mail header).

More importantly, "mess with" is true when potentially
address-looking strings from the trailer lines are added to @cc
list.  So perhaps

    Addresses that are mentioned on the trailers in the commit log
    messages (e.g., "Reviewed-by") are added to the "Cc:" list
    (unless suppressed) by "git send-email".  These hand-written
    addresses may be malformed (e.g., having unquoted "." and other
    punctutation marks in the display-name part).

> This is handled by calling
> `sanitize_address()` before adding addresses to @cc. This function
> was already being called, but was only used for comparing it to
> $author for suppression purposes.

So this is not telling the truth.  This is *NOT* handled by calling
the helper function, as the result is *NOT* added to @cc.  I would
suggest striking this paragraph (and the rest of the log message)
out and rewrite, perhaps like so:

    The code does use sanitize_address on these address-looking
    strings to make them into valid addresses, but the result was
    used only to see if the address should be suppressed.  The
    original string taken from the message was added to the @cc list.

    Because the addresses on trailer lines are hand-written and more
    likely to contain malformed addresses, when adding to the @cc
    list, use the result from sanitize_address, not the original.

    Note that we do not modify the behaviour for addresses taken
    from the e-mail headers, as they are more likely to be machine
    generated and well-formed.

or something.

> diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
> index 58699f8e4e..8bbbf20855 100755
> --- a/t/t9001-send-email.sh
> +++ b/t/t9001-send-email.sh
> @@ -1299,6 +1299,57 @@ test_expect_success $PREREQ 'utf8 sender is not duplicated' '
>  	test_line_count = 1 msgfrom
>  '
>  
> +test_expect_success $PREREQ 'setup expect for cc list' "
> +cat >expected-cc <<\EOF
> +!recipient@example.com!
> +!author@example.com!
> +!one@example.com!
> +!odd_?=mail@example.com!
> +!doug@example.com!
> +!thor.au@example.com!
> +EOF
  > +"
> +
> +test_expect_success $PREREQ 'cc list is sanitized' '
> +	clean_fake_sendmail &&
> +	test_commit weird_cc_body &&
> +	test_when_finished "git reset --hard HEAD^" &&
> +	git commit --amend -F - <<-EOF &&
> +	Test Cc: sanitization.
> +
> +	Cc: Person, One <one@example.com>
> +	Reviewed-by: Füñný Nâmé <odd_?=mail@example.com>
> +	Reported-by: bugger on Jira
> +	Reported-by: Douglas Reporter <doug@example.com> [from Jira profile]
> +	BugID: 12345
> +	Signed-off-by: A. U. Thor <thor.au@example.com>
> +	EOF
> +	git send-email -1 --to=recipient@example.com \
> +		--smtp-server="$(pwd)/fake.sendmail" >actual-show-all-headers &&
> +	test_cmp expected-cc commandline1 &&
> +	test_grep "^(body) Adding cc: \"Person, One\" <one@example.com>" actual-show-all-headers &&
> +	test_grep "^(body) Adding cc: =?UTF-8?q?F=C3=BC=C3=B1n=C3=BD=20N=C3=A2m=C3=A9?="\
> +" <odd_?=mail@example.com>" actual-show-all-headers &&
> +	test_grep "^(body) Adding cc: Douglas Reporter <doug@example.com>" actual-show-all-headers &&
> +	test_grep "^(body) Adding cc: \"A. U. Thor\" <thor.au@example.com>" actual-show-all-headers
> +'

The test does not check that "bugger on Jira" and "12345" are not
used as recipients, but it should.  Otherwise your effort to add the
extra trailers to the test goes wasted.

It is a good choice to use Reported-by and BugID as two extra ones,
by the way.  The mechanisms that causes them omitted from the
resulting message are different.

> +
> +test_expect_success $PREREQ 'quotes are sanitized in cc list' "
> +	clean_fake_sendmail &&
> +	test_commit quote_in_cc_body &&
> +	test_when_finished \"git reset --hard HEAD^\" &&
> +	git commit --amend -F - <<-EOF &&
> +	Quotation marks sanitization in Cc:.
> +
> +	Cc: P'erson, One <one@example.com>
> +	Reported-by: \"Douglas 'Bug' Reporter\" <doug@example.com>
> +	EOF

Quoting the test body in a pair of double quotes is error prone.  Do
not do it (and you do have a bug here that was caused by it).

Rather, write ${SQ} where you would write a single quote in your
test body, and enclose the test body in a pair of single quotes,
like everybody else does.

> +	git send-email -1 --to=recipient@example.com \
> +		--smtp-server=\"$(pwd)/fake.sendmail\" >actual-show-all-headers &&

Here is the bug.  This $(pwd) command substition is done *before*
the test runs---it is expanded while the shell computes the command
line to run test_expect_success (as part of the test body string
that is quoted in a pair of double-quotes).  So if your cwd has a
character that needs to be quoted inside double-quotes to be a valid
value for the "--smtp-server" option, you end up breaking the shell
quoting rules.  In other words: what happens if your $(pwd) has a
double-quote in it?

> +	test_grep \"^(body) Adding cc: \\\"P'erson, One\\\" <one@example.com>\" actual-show-all-headers &&
> +	test_grep \"^(body) Adding cc: \\\"Douglas 'Bug' Reporter\\\" <doug@example.com>\" actual-show-all-headers
> +"
> +

So, I do not see a strong reason why we want to have this as a
separate test.  We can write and test addresses with single quotes
in it as part of the previous test.

A squashable change on top of your version is attached at the end.

 * Added a name with apostrophe in it on the Cc: line, and check
   that the address appears on the recipient list and also in the
   log.

 * Added checks for "Reported-by: bugger on Jira" that explicitly
   gets ignored in the log, and "BugId:" that is not even considered
   for Cc: (in other words, not appearing in the log at all).

Thanks.

 t/t9001-send-email.sh | 21 +++++----------------
 1 file changed, 5 insertions(+), 16 deletions(-)

diff --git c/t/t9001-send-email.sh w/t/t9001-send-email.sh
index 658f3a72ae..64c56677cd 100755
--- c/t/t9001-send-email.sh
+++ w/t/t9001-send-email.sh
@@ -1304,6 +1304,7 @@ cat >expected-cc <<\EOF
 !recipient@example.com!
 !author@example.com!
 !one@example.com!
+!os@example.com!
 !odd_?=mail@example.com!
 !doug@example.com!
 !thor.au@example.com!
@@ -1318,6 +1319,7 @@ test_expect_success $PREREQ 'cc list is sanitized' '
 	Test Cc: sanitization.
 
 	Cc: Person, One <one@example.com>
+	Cc: Ronnie O${SQ}Sullivan <os@example.com>
 	Reviewed-by: Füñný Nâmé <odd_?=mail@example.com>
 	Reported-by: bugger on Jira
 	Reported-by: Douglas Reporter <doug@example.com> [from Jira profile]
@@ -1328,28 +1330,15 @@ test_expect_success $PREREQ 'cc list is sanitized' '
 		--smtp-server="$(pwd)/fake.sendmail" >actual-show-all-headers &&
 	test_cmp expected-cc commandline1 &&
 	test_grep "^(body) Adding cc: \"Person, One\" <one@example.com>" actual-show-all-headers &&
+	test_grep "^(body) Adding cc: Ronnie O${SQ}Sullivan <os@example.com>" actual-show-all-headers &&
 	test_grep "^(body) Adding cc: =?UTF-8?q?F=C3=BC=C3=B1n=C3=BD=20N=C3=A2m=C3=A9?="\
 " <odd_?=mail@example.com>" actual-show-all-headers &&
+	test_grep "^(body) Ignoring Reported-by .* bugger on Jira" actual-show-all-headers &&
 	test_grep "^(body) Adding cc: Douglas Reporter <doug@example.com>" actual-show-all-headers &&
+	test_grep ! "12345" actual-show-all-headers &&
 	test_grep "^(body) Adding cc: \"A. U. Thor\" <thor.au@example.com>" actual-show-all-headers
 '
 
-test_expect_success $PREREQ 'quotes are sanitized in cc list' "
-	clean_fake_sendmail &&
-	test_commit quote_in_cc_body &&
-	test_when_finished \"git reset --hard HEAD^\" &&
-	git commit --amend -F - <<-EOF &&
-	Quotation marks sanitization in Cc:.
-
-	Cc: P'erson, One <one@example.com>
-	Reported-by: \"Douglas 'Bug' Reporter\" <doug@example.com>
-	EOF
-	git send-email -1 --to=recipient@example.com \
-		--smtp-server=\"$(pwd)/fake.sendmail\" >actual-show-all-headers &&
-	test_grep \"^(body) Adding cc: \\\"P'erson, One\\\" <one@example.com>\" actual-show-all-headers &&
-	test_grep \"^(body) Adding cc: \\\"Douglas 'Bug' Reporter\\\" <doug@example.com>\" actual-show-all-headers
-"
-
 test_expect_success $PREREQ 'sendemail.composeencoding works' '
 	clean_fake_sendmail &&
 	git config sendemail.composeencoding iso-8859-1 &&
diff mbox series

Patch

diff --git a/git-send-email.perl b/git-send-email.perl
index f0be4b4560..72044e5ef3 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1847,9 +1847,9 @@  sub pre_process_file {
 					$what, $_) unless $quiet;
 				next;
 			}
-			push @cc, $c;
+			push @cc, $sc;
 			printf(__("(body) Adding cc: %s from line '%s'\n"),
-				$c, $_) unless $quiet;
+				$sc, $_) unless $quiet;
 		}
 	}
 	close $fh;
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 58699f8e4e..8bbbf20855 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -1299,6 +1299,57 @@  test_expect_success $PREREQ 'utf8 sender is not duplicated' '
 	test_line_count = 1 msgfrom
 '
 
+test_expect_success $PREREQ 'setup expect for cc list' "
+cat >expected-cc <<\EOF
+!recipient@example.com!
+!author@example.com!
+!one@example.com!
+!odd_?=mail@example.com!
+!doug@example.com!
+!thor.au@example.com!
+EOF
+"
+
+test_expect_success $PREREQ 'cc list is sanitized' '
+	clean_fake_sendmail &&
+	test_commit weird_cc_body &&
+	test_when_finished "git reset --hard HEAD^" &&
+	git commit --amend -F - <<-EOF &&
+	Test Cc: sanitization.
+
+	Cc: Person, One <one@example.com>
+	Reviewed-by: Füñný Nâmé <odd_?=mail@example.com>
+	Reported-by: bugger on Jira
+	Reported-by: Douglas Reporter <doug@example.com> [from Jira profile]
+	BugID: 12345
+	Signed-off-by: A. U. Thor <thor.au@example.com>
+	EOF
+	git send-email -1 --to=recipient@example.com \
+		--smtp-server="$(pwd)/fake.sendmail" >actual-show-all-headers &&
+	test_cmp expected-cc commandline1 &&
+	test_grep "^(body) Adding cc: \"Person, One\" <one@example.com>" actual-show-all-headers &&
+	test_grep "^(body) Adding cc: =?UTF-8?q?F=C3=BC=C3=B1n=C3=BD=20N=C3=A2m=C3=A9?="\
+" <odd_?=mail@example.com>" actual-show-all-headers &&
+	test_grep "^(body) Adding cc: Douglas Reporter <doug@example.com>" actual-show-all-headers &&
+	test_grep "^(body) Adding cc: \"A. U. Thor\" <thor.au@example.com>" actual-show-all-headers
+'
+
+test_expect_success $PREREQ 'quotes are sanitized in cc list' "
+	clean_fake_sendmail &&
+	test_commit quote_in_cc_body &&
+	test_when_finished \"git reset --hard HEAD^\" &&
+	git commit --amend -F - <<-EOF &&
+	Quotation marks sanitization in Cc:.
+
+	Cc: P'erson, One <one@example.com>
+	Reported-by: \"Douglas 'Bug' Reporter\" <doug@example.com>
+	EOF
+	git send-email -1 --to=recipient@example.com \
+		--smtp-server=\"$(pwd)/fake.sendmail\" >actual-show-all-headers &&
+	test_grep \"^(body) Adding cc: \\\"P'erson, One\\\" <one@example.com>\" actual-show-all-headers &&
+	test_grep \"^(body) Adding cc: \\\"Douglas 'Bug' Reporter\\\" <doug@example.com>\" actual-show-all-headers
+"
+
 test_expect_success $PREREQ 'sendemail.composeencoding works' '
 	clean_fake_sendmail &&
 	git config sendemail.composeencoding iso-8859-1 &&