Message ID | 20240626132440.3762363-2-csokas.bence@prolan.hu (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] git-send-email: Use sanitized address when reading mbox body | expand |
"Csókás, Bence" <csokas.bence@prolan.hu> writes: > Commas and other punctuation marks in 'Cc: ', 'Signed-off-by: ' > etc. lines mess with git-send-email. In parsing the mbox headers, > this is handled by calling `sanitize_address()`. This function > is called when parsing the message body as well, but was only > used for comparing it to $author. Now we add it to @cc too. The above is misleading, though. We do use sanitize_address on addresses we find on e-mail headers, but the result is not used in @to or @cc, the addresses we use to actually send the message out. Perhaps phrase it more like ... When we check addresses found on the mbox headers to see if we want to add them to Cc:, we use sanitize_address() function to normalize the addresses before passing them to the suppression mechanism, but we use the original addresses for the purpose of sending the message out. We use the same logic on the address-looking strings found on trailer lines that appear in the message body. Sanitized addresses are used for Cc-suppression purposes, but the original addresses as written by the end-user are used as the mail destination. There are certain quoting rules for e-mail addresses, and unlike addresses on e-mail headers that are generated by format-patch, hand-written addresses on the trailer lines are more likely to violate them by mistake. When adding the address found on a trailer line in the message body, use the sanitized address for both sending the message out, as well as checking with Cc-suppression mechanism, to reduce the risk of malformed hand-written addresses to get the message rejected (but keep using the original addresses found on the e-mail headers in the input message for e-mail destination). ... or something like that? > diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh > index 58699f8e4e..7e0b8ae57c 100755 > --- a/t/t9001-send-email.sh > +++ b/t/t9001-send-email.sh This test uses mixture of test_grep and grep, so use of "grep" below is fine (but in the longer term we should make sure the tests use the former for better debuggability). As I always say, we should resist the temptation to write our tests just to demonstrate our shiny new toy. In this case, the test is too focused to show that the system will give a best-effort output when fed invalid and/or malformed addresses, but it does not see what happens to a well formed addresses (ideally they are passed intact, but is that what happens with the new code?). Perhaps add one or two trailer lines with valid addresses (and non-address, like "BugId: 143421", that should not appear at all in the output) on them? Thanks. > +test_expect_success $PREREQ 'cc list is sanitized' ' > + clean_fake_sendmail && > + test_commit weird_cc_body && > + test_when_finished "git reset --hard HEAD^" && > + git commit --amend -F - <<-EOF && > + Test Cc: sanitization. > + > + Cc: Person, One <one@example.com> > + Reviewed-by: Füñný Nâmé <odd_?=mail@example.com> > + Signed-off-by: A. U. Thor <thor.au@example.com> > + EOF > + git send-email -1 --to=recipient@example.com \ > + --smtp-server="$(pwd)/fake.sendmail" >actual-show-all-headers && > + test_cmp expected-cc commandline1 && > + grep "^(body) Adding cc: \"Person, One\" <one@example.com>" actual-show-all-headers && > + grep "^(body) Adding cc: =?UTF-8?q?F=C3=BC=C3=B1n=C3=BD=20N=C3=A2m=C3=A9?="\ > +" <odd_?=mail@example.com>" actual-show-all-headers && > + grep "^(body) Adding cc: \"A. U. Thor\" <thor.au@example.com>" actual-show-all-headers > +' > test_expect_success $PREREQ 'sendemail.composeencoding works' ' > clean_fake_sendmail && > git config sendemail.composeencoding iso-8859-1 &&
On 6/26/24 19:28, Junio C Hamano wrote: >> diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh >> index 58699f8e4e..7e0b8ae57c 100755 >> --- a/t/t9001-send-email.sh >> +++ b/t/t9001-send-email.sh > > This test uses mixture of test_grep and grep, so use of "grep" below > is fine (but in the longer term we should make sure the tests use > the former for better debuggability). Ok, I'll use test_grep everywhere. > As I always say, we should resist the temptation to write our tests > just to demonstrate our shiny new toy. In this case, the test is > too focused to show that the system will give a best-effort output > when fed invalid and/or malformed addresses, but it does not see > what happens to a well formed addresses (ideally they are passed > intact, but is that what happens with the new code?). Perhaps add > one or two trailer lines with valid addresses (and non-address, like > "BugId: 143421", that should not appear at all in the output) on > them? Valid addresses are already tested by former tests. I don't immediately see any tests that would cover non-Cc, non-*-by and non-email-address tags, so I might add them; should they be a separate testcase or part of this one? Or maybe even a separate patch? Bence
Csókás Bence <csokas.bence@prolan.hu> writes: > Valid addresses are already tested by former tests. I don't > immediately see any tests that would cover non-Cc, non-*-by and > non-email-address tags, so I might add them; should they be a separate > testcase or part of this one? Or maybe even a separate patch? The tests you added has only malformed addresses on trailers, and adding a few trailers with valid addresses *and* string that does not look like an address at all, would be a minimal fix that makes the test complete. It makes it clear that the point of the new test is about strings found on trailers (that are dealt differently from addresses taken from other places) are handled appropriately, checked how they are shown to the MTA (with the comparison !address! output from fake.sendmail script) and how their additions are logged in the verbose output (with the pattern match with "^(body) Adding cc: ..."). Checking that totally irrelevant ones (e.g. bugId) are not even used as addresses, strings meant as addresses but malformed are used after correction, and well-formed addresses are used as-is, would form a complete set of observable behaviour from this feature. Thanks.
diff --git a/git-send-email.perl b/git-send-email.perl index f0be4b4560..72044e5ef3 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -1847,9 +1847,9 @@ sub pre_process_file { $what, $_) unless $quiet; next; } - push @cc, $c; + push @cc, $sc; printf(__("(body) Adding cc: %s from line '%s'\n"), - $c, $_) unless $quiet; + $sc, $_) unless $quiet; } } close $fh; diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh index 58699f8e4e..7e0b8ae57c 100755 --- a/t/t9001-send-email.sh +++ b/t/t9001-send-email.sh @@ -1299,6 +1299,36 @@ test_expect_success $PREREQ 'utf8 sender is not duplicated' ' test_line_count = 1 msgfrom ' +test_expect_success $PREREQ 'setup expect for cc list' " +cat >expected-cc <<\EOF +!recipient@example.com! +!author@example.com! +!one@example.com! +!odd_?=mail@example.com! +!thor.au@example.com! +EOF +" + +test_expect_success $PREREQ 'cc list is sanitized' ' + clean_fake_sendmail && + test_commit weird_cc_body && + test_when_finished "git reset --hard HEAD^" && + git commit --amend -F - <<-EOF && + Test Cc: sanitization. + + Cc: Person, One <one@example.com> + Reviewed-by: Füñný Nâmé <odd_?=mail@example.com> + Signed-off-by: A. U. Thor <thor.au@example.com> + EOF + git send-email -1 --to=recipient@example.com \ + --smtp-server="$(pwd)/fake.sendmail" >actual-show-all-headers && + test_cmp expected-cc commandline1 && + grep "^(body) Adding cc: \"Person, One\" <one@example.com>" actual-show-all-headers && + grep "^(body) Adding cc: =?UTF-8?q?F=C3=BC=C3=B1n=C3=BD=20N=C3=A2m=C3=A9?="\ +" <odd_?=mail@example.com>" actual-show-all-headers && + grep "^(body) Adding cc: \"A. U. Thor\" <thor.au@example.com>" actual-show-all-headers +' + test_expect_success $PREREQ 'sendemail.composeencoding works' ' clean_fake_sendmail && git config sendemail.composeencoding iso-8859-1 &&
Commas and other punctuation marks in 'Cc: ', 'Signed-off-by: ' etc. lines mess with git-send-email. In parsing the mbox headers, this is handled by calling `sanitize_address()`. This function is called when parsing the message body as well, but was only used for comparing it to $author. Now we add it to @cc too. Note that sanitization is only done for the message body, as `git format-patch` already RFC 2047-encodes mbox headers, so those are generally trusted to be sane. Also note that `sanitize_address()` does not process the mailbox addresses, so it is up to `sendmail` to handle special characters there (e.g. there are mailboxes in rwgular use with '+'-es in them). Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu> --- Notes: Changes in v2: * added testcase to t9001 * added rationale behind trusting mbox headers and the address-parts git-send-email.perl | 4 ++-- t/t9001-send-email.sh | 30 ++++++++++++++++++++++++++++++ 2 files changed, 32 insertions(+), 2 deletions(-)