[2/3] send-email: only consider lines containing @ or <> for automatic Cc'ing

Message ID 20181010111351.5045-3-rv@rasmusvillemoes.dk (mailing list archive)
State New, archived
Series send-email: Also pick up cc addresses from -by trailers

Commit Message

Rasmus Villemoes Oct. 10, 2018, 11:13 a.m. UTC
While the address sanitization routines do accept local addresses, that
is almost never what is meant in a Cc or Signed-off-by trailer.

Looking through all the Signed-off-by lines in the Linux kernel tree
without a @, there are mostly two patterns: either just a full name, or
a full name followed by <user at domain.com> (i.e., with the word "at"
instead of a @), plus minor variations. For Cc lines, the same patterns
appear, along with lots of "cc stable" variations that do not actually
name stable@vger.kernel.org:

  Cc: stable # introduced pre-git times
  cc: stable.kernel.org

In the <user at domain.com> cases, one gets a chance to interactively
fix it. But when there is no <> pair, it seems we end up just using the
first word as a (local) address.

As the number of cases where a local address really was meant is
likely (and anecdotally) quite small compared to the number of cases
where we end up cc'ing a garbage address, insist on at least a @ or a <>
pair being present.

This is also preparation for the next patch, where we are likely to
encounter even more non-addresses in -by lines, such as

  Reported-by: Coverity
  Patch-generated-by: Coccinelle

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
---
 git-send-email.perl | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Ævar Arnfjörð Bjarmason Oct. 10, 2018, 12:57 p.m. UTC | #1
On Wed, Oct 10 2018, Rasmus Villemoes wrote:

> +			if ($c !~ /.+@.+|<.+>/) {
> +				printf("(body) Ignoring %s from line '%s'\n",
> +					$what, $_) unless $quiet;
> +				next;
> +			}
>  			push @cc, $c;
>  			printf(__("(body) Adding cc: %s from line '%s'\n"),
>  				$c, $_) unless $quiet;

There's an extract_valid_address() function in git-send-email already,
shouldn't this be:

    if (!extract_valid_address($c)) {
    [...]

Or is there a good reason not to use that function in this case?

Rasmus Villemoes Oct. 10, 2018, 1:29 p.m. UTC | #2
On 2018-10-10 14:57, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Oct 10 2018, Rasmus Villemoes wrote:
> 
>> +			if ($c !~ /.+@.+|<.+>/) {
>> +				printf("(body) Ignoring %s from line '%s'\n",
>> +					$what, $_) unless $quiet;
>> +				next;
>> +			}
>>  			push @cc, $c;
>>  			printf(__("(body) Adding cc: %s from line '%s'\n"),
>>  				$c, $_) unless $quiet;
> 
> There's an extract_valid_address() function in git-send-email already,
> shouldn't this be:
> 
>     if (!extract_valid_address($c)) {
>     [...]
> 
> Or is there a good reason not to use that function in this case?
> 

I considered that (and also had a version where I simply insisted on a @
being present), but that means the user no longer would get prompted
about the cases where the address was just slightly obfuscated, e.g. the

Cc: John Doe <john at doe.com>

cases, which would be a regression, I guess. So I do want to pass such
cases through, and have them be dealt with when process_address_list
gets called.

So this is just a rather minimal and simple heuristic, which should
still be able to handle the vast majority of cases correctly, and at
least almost never exclude anything that might have a chance of becoming
a real address.

Rasmus

Junio C Hamano Oct. 11, 2018, 6:06 a.m. UTC | #3
Rasmus Villemoes <rv@rasmusvillemoes.dk> writes:

> I considered that (and also had a version where I simply insisted on a @
> being present), but that means the user no longer would get prompted
> about the cases where the address was just slightly obfuscated, e.g. the
>
> Cc: John Doe <john at doe.com>
>
> cases, which would be a regression, I guess. So I do want to pass such
> cases through, and have them be dealt with when process_address_list
> gets called.

We are only tightening with this patch, and we were passing any
random things through with the original code anyway, so without
[PATCH 3/3], this step must be making it only better, but I have to
wonder one thing.

You keep saying "get prompted" but are we sure we always stop and
ask (and preferably---fail and abort when the end user is not
available at the terminal to interact) when we have such a
questionable address?

Rasmus Villemoes Oct. 11, 2018, 7:06 a.m. UTC | #4
On 2018-10-11 08:06, Junio C Hamano wrote:
> Rasmus Villemoes <rv@rasmusvillemoes.dk> writes:
> 
>> I considered that (and also had a version where I simply insisted on a @
>> being present), but that means the user no longer would get prompted
>> about the cases where the address was just slightly obfuscated, e.g. the
>>
>> Cc: John Doe <john at doe.com>
>>
>> cases, which would be a regression, I guess. So I do want to pass such
>> cases through, and have them be dealt with when process_address_list
>> gets called.
> 
> We are only tightening with this patch, and we were passing any
> random things through with the original code anyway, so without
> [PATCH 3/3], this step must be making it only better, but I have to
> wonder one thing.
> 
> You keep saying "get prompted" but are we sure we always stop and
> ask (and preferably---fail and abort when the end user is not
> available at the terminal to interact) when we have such a
> questionable address?
> 

I dunno. I guess I've never considered non-interactive use of
send-email. But the ask() in validate_address does have default q[uit],
which I suppose gets used if stdin is /dev/null? I did do an experiment
adding a bunch of the random odd patterns found in kernel commit
messages to see how send-email reacted before/after this, and the only
things that got filtered away (i.e., no longer prompted about) were
things where the user probably couldn't easily fix it anyway. In the
cases where there was a "Cc: stable" that might be fixed to the proper
stable@vger.kernel.org, the logic in extract_valid_address simply saw
that as a local address, so we didn't use to be prompted, but simply
sent to stable@localhost. Now we simply don't pass that through. So, for
non-interactive use, I guess the effect of this patch is to allow more
cases to complete successfully, since we filter away (some) cases where
extract_valid_address would cause us to prompt (and thus quit).

So, it seems you're ok with this tightening, but some comment on the
non-interactive use case should be made in the commit log? Or am I
misunderstanding?

Thanks,
Rasmus

Junio C Hamano Oct. 11, 2018, 8:22 a.m. UTC | #5
Rasmus Villemoes <rv@rasmusvillemoes.dk> writes:

> So, it seems you're ok with this tightening, but some comment on the
> non-interactive use case should be made in the commit log? Or am I
> misunderstanding?

I do not think we need any immediate action on this step.  I was
just wondering if we want two classes of "I am not running you
interactively, so assume I said 'yes' when you need to ask me any
confirmation on X and Y" and "I am not running you interactively,
so assume I said 'no' for safety when you need to ask me any
confirmation on Z" supported in the future.  Lines with both @ and
<> fall into the first class, while lines with only <> fall into the
second camp, I would guess.

Patch

diff --git a/git-send-email.perl b/git-send-email.perl
index 2be5dac337..1916159d2a 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1694,6 +1694,11 @@  sub process_file {
 				next if $suppress_cc{'sob'} and $what =~ /Signed-off-by/i;
 				next if $suppress_cc{'bodycc'} and $what =~ /Cc/i;
 			}
+			if ($c !~ /.+@.+|<.+>/) {
+				printf("(body) Ignoring %s from line '%s'\n",
+					$what, $_) unless $quiet;
+				next;
+			}
 			push @cc, $c;
 			printf(__("(body) Adding cc: %s from line '%s'\n"),
 				$c, $_) unless $quiet;
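
To see how the heuristic in the hunk above sorts the trailer values
mentioned in this thread, here is a minimal standalone sketch. It reuses
only the regular expression from the patch. The helper name
passes_filter and the sample list are hypothetical and are not part of
git-send-email.perl, and the samples assume that $c holds the text
after the trailer key.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as in the patch: either "something@something" or a
# non-empty <...> pair somewhere in the candidate string.
my $looks_like_address = qr/.+@.+|<.+>/;

# Hypothetical helper for illustration only; returns 1 if the candidate
# would be passed on to @cc, 0 if it would be ignored.
sub passes_filter {
	my ($candidate) = @_;
	return $candidate =~ $looks_like_address ? 1 : 0;
}

# Sample values taken from the commit message and the discussion.
my @samples = (
	'Rasmus Villemoes <rv@rasmusvillemoes.dk>',
	'John Doe <john at doe.com>',
	'stable@vger.kernel.org',
	'stable # introduced pre-git times',
	'stable.kernel.org',
	'Coverity',
	'Coccinelle',
);

for my $c (@samples) {
	printf("%-8s %s\n", passes_filter($c) ? 'keep' : 'ignore', $c);
}
```

The first three samples are kept, including the obfuscated
<john at doe.com> form, so the later address validation still gets a
chance to prompt about it. The remaining four contain neither a @ nor a
<> pair and are ignored.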