diff mbox series

send-email: move process_address_list earlier to avoid, uninitialized address error

Message ID 7e2c92ff-b42c-4b3f-a509-9d0785448262@amd.com (mailing list archive)
State New, archived
Headers show
Series send-email: move process_address_list earlier to avoid, uninitialized address error | expand

Commit Message

Michael Strawbridge Oct. 11, 2023, 8:22 p.m. UTC
Move processing of email address lists before the sendemail-validate
hook code.  This fixes email address validation errors when the optional
perl module Email::Valid is installed and multiple addresses are passed
in on a single to/cc argument like --to=foo@example.com,bar@example.com.
---
 git-send-email.perl | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Michael Strawbridge Oct. 11, 2023, 8:25 p.m. UTC | #1
Hi Bagas,

If possible, please try the patch I just sent out and let me know if it works for your situation.

Thanks,

Michael

On 10/11/23 16:22, Michael Strawbridge wrote:
> Move processing of email address lists before the sendemail-validate
> hook code.  This fixes email address validation errors when the optional
> perl module Email::Valid is installed and multiple addresses are passed
> in on a single to/cc argument like --to=foo@example.com,bar@example.com.
> ---
>  git-send-email.perl | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/git-send-email.perl b/git-send-email.perl
> index 288ea1ae80..cfd80c9d8b 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -799,6 +799,10 @@ sub is_format_patch_arg {
>  
>  $time = time - scalar $#files;
>  
> +@initial_to = process_address_list(@initial_to);
> +@initial_cc = process_address_list(@initial_cc);
> +@initial_bcc = process_address_list(@initial_bcc);
> +
>  if ($validate) {
>         # FIFOs can only be read once, exclude them from validation.
>         my @real_files = ();
> @@ -1099,10 +1103,6 @@ sub expand_one_alias {
>         return $aliases{$alias} ? expand_aliases(@{$aliases{$alias}}) : $alias;
>  }
>  
> -@initial_to = process_address_list(@initial_to);
> -@initial_cc = process_address_list(@initial_cc);
> -@initial_bcc = process_address_list(@initial_bcc);
> -
>  if ($thread && !defined $initial_in_reply_to && $prompting) {
>         $initial_in_reply_to = ask(
>                 __("Message-ID to be used as In-Reply-To for the first email (if any)? "),
Junio C Hamano Oct. 11, 2023, 9:27 p.m. UTC | #2
Michael Strawbridge <michael.strawbridge@amd.com> writes:

> @@ -799,6 +799,10 @@ sub is_format_patch_arg {
>  
>  $time = time - scalar $#files;
>  
> +@initial_to = process_address_list(@initial_to);
> +@initial_cc = process_address_list(@initial_cc);
> +@initial_bcc = process_address_list(@initial_bcc);
> +

This does not look OK.  If we trace how @initial_to gets its value,

 - it first gets its value from @getopt_to and @config_to

 - if that is empty, and there is no $to_cmd, the end-user is
   interactively asked.

 - then process_address_list() is applied.

But this patch just swapped the second one and the third one, so
process_address_list() does not process what the end-user gave
interactively, no?

>  if ($validate) {
>         # FIFOs can only be read once, exclude them from validation.
>         my @real_files = ();

It almost feels like what need to move is not the setting of these
address lists, but the code that calls int validation callchain that
needs access to these address lists---the block that begins with the
above "if ($validate) {" needs to move below ...

> @@ -1099,10 +1103,6 @@ sub expand_one_alias {
>         return $aliases{$alias} ? expand_aliases(@{$aliases{$alias}}) : $alias;
>  }
>  
> -@initial_to = process_address_list(@initial_to);
> -@initial_cc = process_address_list(@initial_cc);
> -@initial_bcc = process_address_list(@initial_bcc);
> -

... this point, or something, perhaps?

>  if ($thread && !defined $initial_in_reply_to && $prompting) {
>         $initial_in_reply_to = ask(
>                 __("Message-ID to be used as In-Reply-To for the first email (if any)? "),
Jeff King Oct. 11, 2023, 10:18 p.m. UTC | #3
On Wed, Oct 11, 2023 at 02:27:49PM -0700, Junio C Hamano wrote:

> Michael Strawbridge <michael.strawbridge@amd.com> writes:
> 
> > @@ -799,6 +799,10 @@ sub is_format_patch_arg {
> >  
> >  $time = time - scalar $#files;
> >  
> > +@initial_to = process_address_list(@initial_to);
> > +@initial_cc = process_address_list(@initial_cc);
> > +@initial_bcc = process_address_list(@initial_bcc);
> > +
> 
> This does not look OK.  If we trace how @initial_to gets its value,
> 
>  - it first gets its value from @getopt_to and @config_to
> 
>  - if that is empty, and there is no $to_cmd, the end-user is
>    interactively asked.
> 
>  - then process_address_list() is applied.
> 
> But this patch just swapped the second one and the third one, so
> process_address_list() does not process what the end-user gave
> interactively, no?

Yep, that is the issue I found when I dug into this earlier.

> >  if ($validate) {
> >         # FIFOs can only be read once, exclude them from validation.
> >         my @real_files = ();
> 
> It almost feels like what need to move is not the setting of these
> address lists, but the code that calls int validation callchain that
> needs access to these address lists---the block that begins with the
> above "if ($validate) {" needs to move below ...

Yes, though then we have the problem that we've asked the user some
interactive questions before validating if the input files are bogus or
not. Which would be annoying if they aren't valid, because when we barf
they've wasted time typing.

Which of course implies that we're not (and cannot) validate what
they're typing at this step, but I think that's OK because we feed it
through extract_valid_address_or_die(). IOW, I think there are actually
two distinct validation steps hidden here:

  1. We want to validate that the patch files we were fed are OK.

  2. We want to validate that the addresses, etc, fed by the user are
     OK.

And after Michael's original patch, we are accidentally hitting some of
that validation code for (2) while doing (1).

This is actually a weird split if you think about it. We are feeding to
the validate hook in (1), so surely it would want to see the full set of
inputs from the user, too? Which argues for pushing the "if ($validate)"
down as you suggest. And then either:

  a. We accept that the user experience is a little worse if validation
     fails after the user typed.

  b. We split (1) into "early" validation that just checks if the files
     are OK, but doesn't call the hook. And then later on we do the full
     validation.

I don't have a strong opinion myself (I don't even use send-email
myself, and if I did, I'd probably mostly be feeding it with "--to" etc
on the command line, rather than interactively).

-Peff
Junio C Hamano Oct. 11, 2023, 10:37 p.m. UTC | #4
Jeff King <peff@peff.net> writes:

> Which of course implies that we're not (and cannot) validate what
> they're typing at this step, but I think that's OK because we feed it
> through extract_valid_address_or_die().

OK, let's queue it then.

> IOW, I think there are actually two distinct validation steps
> hidden here:
>
>   1. We want to validate that the patch files we were fed are OK.
>
>   2. We want to validate that the addresses, etc, fed by the user are
>      OK.
>
> And after Michael's original patch, we are accidentally hitting some of
> that validation code for (2) while doing (1).

> This is actually a weird split if you think about it. We are feeding to
> the validate hook in (1), so surely it would want to see the full set of
> inputs from the user, too? Which argues for pushing the "if ($validate)"
> down as you suggest. And then either:
>
>   a. We accept that the user experience is a little worse if validation
>      fails after the user typed.
>
>   b. We split (1) into "early" validation that just checks if the files
>      are OK, but doesn't call the hook. And then later on we do the full
>      validation.
>
> I don't have a strong opinion myself (I don't even use send-email
> myself, and if I did, I'd probably mostly be feeding it with "--to" etc
> on the command line, rather than interactively).

I am not affected, either, and do not have a strong opinion either
way.  As long as the end-user input is validated separately, it
would be OK, but if the end-user supplied validation hook cares
about what addresses the messages are going to be sent to, not
knowing the set of recipients mean the validation hook is not
getting the whole picture, which does smell bad.

On the other hand, I am not sure what is wrong with "after the user
typed", actually.  As you said, anybody sane would be using --to (or
an equivalent configuration variable in the repository) to send
their patches to the project address instead of typing, and to them
it is not a problem.  After getting the recipient address from the
end user, the validation may fail due to a wrong address, in which
case it is a good thing.  If the validation failed due to wrong
contents of the patch (perhaps it included a change to the file with
trade secret that appeared in the context lines), as long as the
reason why the validation hook rejected the patches is clear enough
(e.g., "it's the patches, not the recipients"), such "a rejection
after typing" would be only once per a patch series, so it does not
sound too bad, either.

But perhaps I am not seeing the reason why "fail after the user typed"
is so disliked and being unnecessarily unsympathetic.  I dunno.
Jeff King Oct. 11, 2023, 10:47 p.m. UTC | #5
On Wed, Oct 11, 2023 at 03:37:39PM -0700, Junio C Hamano wrote:

> On the other hand, I am not sure what is wrong with "after the user
> typed", actually.  As you said, anybody sane would be using --to (or
> an equivalent configuration variable in the repository) to send
> their patches to the project address instead of typing, and to them
> it is not a problem.  After getting the recipient address from the
> end user, the validation may fail due to a wrong address, in which
> case it is a good thing.  If the validation failed due to wrong
> contents of the patch (perhaps it included a change to the file with
> trade secret that appeared in the context lines), as long as the
> reason why the validation hook rejected the patches is clear enough
> (e.g., "it's the patches, not the recipients"), such "a rejection
> after typing" would be only once per a patch series, so it does not
> sound too bad, either.
> 
> But perhaps I am not seeing the reason why "fail after the user typed"
> is so disliked and being unnecessarily unsympathetic.  I dunno.

I did not look carefully at the flow of send-email, so this may or may
not be an issue. But what I think would be _really_ annoying is if you
asked to write a cover letter, went through the trouble of writing it,
and then send-email bailed due to some validation failure that could
have been checked earlier.

There is probably a way to recover your work (presumably we leave it in
a temporary file somewhere), but it may not be entirely trivial,
especially for users who are not comfortable with advanced usage of
their editor. ;)

I seem to remember we had one or two such problems in the early days
with "git commit", where you would go to the trouble to type a commit
message only to bail on some condition which _could_ have been checked
earlier. You can recover the message from .git/COMMIT_EDITMSG, but you
need to remember to do so before re-invoking "git commit", otherwise it
gets obliterated.

Now for send-email, if your flow is to generate the patches with
"format-patch", then edit the cover letter separately, and then finally
ship it all out with "send-email", that might not be an issue. But some
workflows use the --compose option instead.

-Peff
Michael Strawbridge Oct. 13, 2023, 8:25 p.m. UTC | #6
On 10/11/23 18:47, Jeff King wrote:
> On Wed, Oct 11, 2023 at 03:37:39PM -0700, Junio C Hamano wrote:
>
>> On the other hand, I am not sure what is wrong with "after the user
>> typed", actually.  As you said, anybody sane would be using --to (or
>> an equivalent configuration variable in the repository) to send
>> their patches to the project address instead of typing, and to them
>> it is not a problem.  After getting the recipient address from the
>> end user, the validation may fail due to a wrong address, in which
>> case it is a good thing.  If the validation failed due to wrong
>> contents of the patch (perhaps it included a change to the file with
>> trade secret that appeared in the context lines), as long as the
>> reason why the validation hook rejected the patches is clear enough
>> (e.g., "it's the patches, not the recipients"), such "a rejection
>> after typing" would be only once per a patch series, so it does not
>> sound too bad, either.
>>
>> But perhaps I am not seeing the reason why "fail after the user typed"
>> is so disliked and being unnecessarily unsympathetic.  I dunno.
> I did not look carefully at the flow of send-email, so this may or may
> not be an issue. But what I think would be _really_ annoying is if you
> asked to write a cover letter, went through the trouble of writing it,
> and then send-email bailed due to some validation failure that could
> have been checked earlier.
>
> There is probably a way to recover your work (presumably we leave it in
> a temporary file somewhere), but it may not be entirely trivial,
> especially for users who are not comfortable with advanced usage of
> their editor. ;)
As I was looking at covering the case of interactive input (--compose) to the fix I noticed that this seems to be at least partly handled by the $compose_filename code.  There is a nice output message telling you exactly where the intermediate version of the email you are composing is located if there are errors.  I took a quick look inside and can verify that any lost work should be minimal as long as someone knows how to edit files with their editor of choice.
>
> I seem to remember we had one or two such problems in the early days
> with "git commit", where you would go to the trouble to type a commit
> message only to bail on some condition which _could_ have been checked
> earlier. You can recover the message from .git/COMMIT_EDITMSG, but you
> need to remember to do so before re-invoking "git commit", otherwise it
> gets obliterated.
>
> Now for send-email, if your flow is to generate the patches with
> "format-patch", then edit the cover letter separately, and then finally
> ship it all out with "send-email", that might not be an issue. But some
> workflows use the --compose option instead.
>
> -Peff
I have been looking into handling the interactive input cases while solving this issue, but have yet to make a breakthrough.  Simply moving the validation code below the original process_address_list code results in a a scenario where I get the email address being seen as something like "ARRAY (0x55ddb951d768)" rather than the email address I wrote in the compose buffer.
Bagas Sanjaya Oct. 20, 2023, 2:50 a.m. UTC | #7
On Wed, Oct 11, 2023 at 04:22:18PM -0400, Michael Strawbridge wrote:
> Move processing of email address lists before the sendemail-validate
> hook code.  This fixes email address validation errors when the optional
> perl module Email::Valid is installed and multiple addresses are passed
> in on a single to/cc argument like --to=foo@example.com,bar@example.com.
> ---
>  git-send-email.perl | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/git-send-email.perl b/git-send-email.perl
> index 288ea1ae80..cfd80c9d8b 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -799,6 +799,10 @@ sub is_format_patch_arg {
>  
>  $time = time - scalar $#files;
>  
> +@initial_to = process_address_list(@initial_to);
> +@initial_cc = process_address_list(@initial_cc);
> +@initial_bcc = process_address_list(@initial_bcc);
> +
>  if ($validate) {
>         # FIFOs can only be read once, exclude them from validation.
>         my @real_files = ();
> @@ -1099,10 +1103,6 @@ sub expand_one_alias {
>         return $aliases{$alias} ? expand_aliases(@{$aliases{$alias}}) : $alias;
>  }
>  
> -@initial_to = process_address_list(@initial_to);
> -@initial_cc = process_address_list(@initial_cc);
> -@initial_bcc = process_address_list(@initial_bcc);
> -
>  if ($thread && !defined $initial_in_reply_to && $prompting) {
>         $initial_in_reply_to = ask(
>                 __("Message-ID to be used as In-Reply-To for the first email (if any)? "),

Thanks for the fixup! The patch itself is whitespace-damaged, though, so
I have to manually apply it. Regardless,

Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Jeff King Oct. 20, 2023, 6:45 a.m. UTC | #8
On Fri, Oct 13, 2023 at 04:25:57PM -0400, Michael Strawbridge wrote:

> > I did not look carefully at the flow of send-email, so this may or may
> > not be an issue. But what I think would be _really_ annoying is if you
> > asked to write a cover letter, went through the trouble of writing it,
> > and then send-email bailed due to some validation failure that could
> > have been checked earlier.
> >
> > There is probably a way to recover your work (presumably we leave it in
> > a temporary file somewhere), but it may not be entirely trivial,
> > especially for users who are not comfortable with advanced usage of
> > their editor. ;)
>
> As I was looking at covering the case of interactive input (--compose)
> to the fix I noticed that this seems to be at least partly handled by
> the $compose_filename code.  There is a nice output message telling
> you exactly where the intermediate version of the email you are
> composing is located if there are errors.  I took a quick look inside
> and can verify that any lost work should be minimal as long as someone
> knows how to edit files with their editor of choice.

OK, that makes me feel better about just moving the validation code. A
more complicated solution could be do to do _some_ basic checks up
front, and then more complete validation later. But even if we wanted to
do that, moving the bulk of the validation (as discussed in this thread)
would probably be the first step anyway (and if nobody complains, maybe
we can avoid doing the extra work).

I do wonder if we might find other interesting corner cases where
the validation code (or somebody's hook) isn't happy with seeing the
more "full" picture (i.e., with the extra addresses from interactive and
command-line input). But arguably any such case would be indicative of a
bug, and smoking it out would be a good thing.

> I have been looking into handling the interactive input cases while
> solving this issue, but have yet to make a breakthrough.  Simply
> moving the validation code below the original process_address_list
> code results in a a scenario where I get the email address being seen
> as something like "ARRAY (0x55ddb951d768)" rather than the email
> address I wrote in the compose buffer.

Sounds like something is making a perl ref that shouldn't (or something
that should be dereferencing it not doing so). If you post your patch
and a reproduction command, I might be able to help debug.

But just blindly moving the validation code down, like:

diff --git a/git-send-email.perl b/git-send-email.perl
index 288ea1ae80..76589c7827 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -799,30 +799,6 @@ sub is_format_patch_arg {
 
 $time = time - scalar $#files;
 
-if ($validate) {
-	# FIFOs can only be read once, exclude them from validation.
-	my @real_files = ();
-	foreach my $f (@files) {
-		unless (-p $f) {
-			push(@real_files, $f);
-		}
-	}
-
-	# Run the loop once again to avoid gaps in the counter due to FIFO
-	# arguments provided by the user.
-	my $num = 1;
-	my $num_files = scalar @real_files;
-	$ENV{GIT_SENDEMAIL_FILE_TOTAL} = "$num_files";
-	foreach my $r (@real_files) {
-		$ENV{GIT_SENDEMAIL_FILE_COUNTER} = "$num";
-		pre_process_file($r, 1);
-		validate_patch($r, $target_xfer_encoding);
-		$num += 1;
-	}
-	delete $ENV{GIT_SENDEMAIL_FILE_COUNTER};
-	delete $ENV{GIT_SENDEMAIL_FILE_TOTAL};
-}
-
 @files = handle_backup_files(@files);
 
 if (@files) {
@@ -1121,6 +1097,30 @@ sub expand_one_alias {
 	$reply_to = sanitize_address($reply_to);
 }
 
+if ($validate) {
+	# FIFOs can only be read once, exclude them from validation.
+	my @real_files = ();
+	foreach my $f (@files) {
+		unless (-p $f) {
+			push(@real_files, $f);
+		}
+	}
+
+	# Run the loop once again to avoid gaps in the counter due to FIFO
+	# arguments provided by the user.
+	my $num = 1;
+	my $num_files = scalar @real_files;
+	$ENV{GIT_SENDEMAIL_FILE_TOTAL} = "$num_files";
+	foreach my $r (@real_files) {
+		$ENV{GIT_SENDEMAIL_FILE_COUNTER} = "$num";
+		pre_process_file($r, 1);
+		validate_patch($r, $target_xfer_encoding);
+		$num += 1;
+	}
+	delete $ENV{GIT_SENDEMAIL_FILE_COUNTER};
+	delete $ENV{GIT_SENDEMAIL_FILE_TOTAL};
+}
+
 if (!defined $sendmail_cmd && !defined $smtp_server) {
 	my @sendmail_paths = qw( /usr/sbin/sendmail /usr/lib/sendmail );
 	push @sendmail_paths, map {"$_/sendmail"} split /:/, $ENV{PATH};

seems to fix the problem from this thread and passes the existing tests.
Manually inspecting the result (and what's fed to the validation hook) I
don't see anything odd (like "ARRAY (...)").

-Peff
Jeff King Oct. 20, 2023, 7:14 a.m. UTC | #9
On Fri, Oct 20, 2023 at 02:45:25AM -0400, Jeff King wrote:

> > I have been looking into handling the interactive input cases while
> > solving this issue, but have yet to make a breakthrough.  Simply
> > moving the validation code below the original process_address_list
> > code results in a a scenario where I get the email address being seen
> > as something like "ARRAY (0x55ddb951d768)" rather than the email
> > address I wrote in the compose buffer.
> 
> Sounds like something is making a perl ref that shouldn't (or something
> that should be dereferencing it not doing so). If you post your patch
> and a reproduction command, I might be able to help debug.

Ah, your "address I wrote in the compose buffer" was the clue I needed.

I think this is actually an existing bug. If I use --compose and write:

  To: foo@example.com

in the editor, we read that back in and handle it in parse_header_line()
like:

        my $addr_pat = join "|", qw(To Cc Bcc);

        foreach (split(/\n/, $lines)) {
                if (/^($addr_pat):\s*(.+)$/i) {
                        $parsed_line->{$1} = [ parse_address_line($2) ];
                } elsif (/^([^:]*):\s*(.+)\s*$/i) {
                        $parsed_line->{$1} = $2;
                }
        }

and there's your perl array ref (from the square brackets, which are
necessary because we're sticking it in a hash value). But even before
your patch, this seems to end up as garbage. The code which reads
$parsed_line does not dereference the array.

The patch to fix it is only a few lines (well, more than that with some
light editorializing in the comments):

diff --git a/git-send-email.perl b/git-send-email.perl
index 76589c7827..46a30088c9 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -918,7 +918,28 @@ sub get_patch_subject {
 	# Preserve unknown headers
 	foreach my $key (keys %parsed_email) {
 		next if $key eq 'body';
-		print $c2 "$key: $parsed_email{$key}";
+
+		# it seems like it would be easier to just look for
+		# $parsed_email{'To'} and so on. But we actually match
+		# these case-insenstively and preserve the user's spelling, so
+		# we might see $parsed_email{'to'}. Of course, the same bug
+		# exists for Subject, etc, above. Anyway, a "/i" regex here
+		# handles all cases.
+		#
+		# It kind of feels like all of this code would be much simpler
+		# if we just handled all of the headers while reading back the
+		# input, rather than stuffing them all into $parsed_email and
+		# then picking them out of it.
+		#
+		# It also really feels like these to/cc/bcc lines should be
+		# added to the regular ones? It is silly to make a cover letter
+		# that goes to some addresses, and then not send the patches to
+		# them, too.
+		if ($key =~ /^(To|Cc|Bcc)$/i) {
+			print $c2 "$key: ", join(', ', @{$parsed_email{$key}});
+		} else {
+			print $c2 "$key: $parsed_email{$key}";
+		}
 	}
 
 	if ($parsed_email{'body'}) {

I don't really think your patch makes things worse here. But it is
probably worth fixing it while we are here.

-Peff
diff mbox series

Patch

diff --git a/git-send-email.perl b/git-send-email.perl
index 288ea1ae80..cfd80c9d8b 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -799,6 +799,10 @@  sub is_format_patch_arg {
 
 $time = time - scalar $#files;
 
+@initial_to = process_address_list(@initial_to);
+@initial_cc = process_address_list(@initial_cc);
+@initial_bcc = process_address_list(@initial_bcc);
+
 if ($validate) {
        # FIFOs can only be read once, exclude them from validation.
        my @real_files = ();
@@ -1099,10 +1103,6 @@  sub expand_one_alias {
        return $aliases{$alias} ? expand_aliases(@{$aliases{$alias}}) : $alias;
 }
 
-@initial_to = process_address_list(@initial_to);
-@initial_cc = process_address_list(@initial_cc);
-@initial_bcc = process_address_list(@initial_bcc);
-
 if ($thread && !defined $initial_in_reply_to && $prompting) {
        $initial_in_reply_to = ask(
                __("Message-ID to be used as In-Reply-To for the first email (if any)? "),