diff mbox series

Regression: git send-email Message-Id: numbering doesn't start at 1 any more

Message ID 20231106153214.s5abourejkuiwk64@pengutronix.de (mailing list archive)
State New, archived
Headers show
Series Regression: git send-email Message-Id: numbering doesn't start at 1 any more | expand

Commit Message

Uwe Kleine-König Nov. 6, 2023, 3:32 p.m. UTC
Hello,

Since commit 3ece9bf0f9e24909b090cf348d89e8920bd4f82f I experience that
the generated Message-Ids don't start at ....-1-... any more. I have:

$ git send-email w/*
...
Subject: [PATCH 0/5] watchdog: Drop platform_driver_probe() and convert to platform remove callback returning void (part II)
Date: Mon,  6 Nov 2023 16:10:04 +0100
Message-ID: <20231106151003.3844134-7-u.kleine-koenig@pengutronix.de>
...

So the cover letter is sent with Message-Id: ...-7-...

Before above mentioned commit I had:

...
Message-ID: <20231106151003.3844134-1-u.kleine-koenig@pengutronix.de>
...

Similar to my earlier regression report this also only happens in the
presence of a sendemail-validate hook. Passing --no-validate works
around this issue.

While this isn't an issue for git itself, it breaks one of my scripts
that knows how to determine the number of patches in a series from the
last Message-Id:.

The following patch works for me:


But I guess this could be done prettier by someone who is fluent in
Perl.

Best regards
Uwe

Comments

Junio C Hamano Nov. 6, 2023, 11:06 p.m. UTC | #1
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:

> Hello,
>
> Since commit 3ece9bf0f9e24909b090cf348d89e8920bd4f82f I experience that
> the generated Message-Ids don't start at ....-1-... any more. I have:
>
> $ git send-email w/*
> ...
> Subject: [PATCH 0/5] watchdog: Drop platform_driver_probe() and convert to platform remove callback returning void (part II)
> Date: Mon,  6 Nov 2023 16:10:04 +0100
> Message-ID: <20231106151003.3844134-7-u.kleine-koenig@pengutronix.de>
> ...
>
> So the cover letter is sent with Message-Id: ...-7-...

The above is consistent with the fact that a 5-patch series with a
cover letter consists of 6 messages.  Dry-run uses message numbers
1-6 and forgets to reset the counter, so the next message becomes 7.
As you identified, the fix in 3ece9bf0 (send-email: clear the
$message_id after validation, 2023-05-17) for the fallout from an
even earlier change to process each message twice still had left an
observable side effect subject to the Hyrum's law, it seems.

> +my ($message_id_stamp, $message_id_serial);
>  if ($validate) {
>  	# FIFOs can only be read once, exclude them from validation.
>  	my @real_files = ();
> @@ -821,6 +822,7 @@ sub is_format_patch_arg {
>  	}
>  	delete $ENV{GIT_SENDEMAIL_FILE_COUNTER};
>  	delete $ENV{GIT_SENDEMAIL_FILE_TOTAL};
> +	$message_id_serial = 0;
>  }

This fix looks quite logical to me, but even with this, the side
effects of the earlier "read message twice" persists in end-user
observable form, don't they?  IIRC, when sending out an N message
series, we start from the timestamp as of N seconds ago and give
each message the Date: header that increments by 1 second, which
would mean the validator will see Date: that is different from what
will actually be sent out, and more importantly, the messages sent
out for real will have timestamps from the future, negating the
point of starting from N seconds ago in the first place.  Your
script may not have been paying attention to it and only noticed the
difference in id_serial, but somebody else would complain the
difference coming from calling gen_header more than once for each
messages since a8022c5f (send-email: expose header information to
git-send-email's sendemail-validate hook, 2023-04-19).

So, I dunno.  Michael, what do you think?  It appears to me that a
more fundamental fix to the fallout from a8022c5f might be needed
(e.g., we still let gen_header run while validating, but once
validation is done, save the headers that validator saw and use them
without calling gen_header again when we send the messages out, or
something), if we truly want to be regression free.

By the way, out of curiosity, earlier you said your script looks at
the Message-IDs and counts the number of messages.  How does it do
that?  Does it read the output of send-email and pass the messages
to MTA for sending out for real?

Thanks.

>  @files = handle_backup_files(@files);
> @@ -1181,7 +1183,6 @@ sub validate_address_list {
>  
>  # We'll setup a template for the message id, using the "from" address:
>  
> -my ($message_id_stamp, $message_id_serial);
>  sub make_message_id {
>  	my $uniq;
>  	if (!defined $message_id_stamp) {
>
> But I guess this could be done prettier by someone who is fluent in
> Perl.
>
> Best regards
> Uwe
Uwe Kleine-König Nov. 7, 2023, 7:06 a.m. UTC | #2
Hello Junio,

On Tue, Nov 07, 2023 at 08:06:22AM +0900, Junio C Hamano wrote:
> Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:
> 
> > Hello,
> >
> > Since commit 3ece9bf0f9e24909b090cf348d89e8920bd4f82f I experience that
> > the generated Message-Ids don't start at ....-1-... any more. I have:
> >
> > $ git send-email w/*
> > ...
> > Subject: [PATCH 0/5] watchdog: Drop platform_driver_probe() and convert to platform remove callback returning void (part II)
> > Date: Mon,  6 Nov 2023 16:10:04 +0100
> > Message-ID: <20231106151003.3844134-7-u.kleine-koenig@pengutronix.de>
> > ...
> >
> > So the cover letter is sent with Message-Id: ...-7-...
> 
> The above is consistent with the fact that a 5-patch series with a
> cover letter consists of 6 messages.  Dry-run uses message numbers
> 1-6 and forgets to reset the counter, so the next message becomes 7.
> As you identified, the fix in 3ece9bf0 (send-email: clear the
> $message_id after validation, 2023-05-17) for the fallout from an
> even earlier change to process each message twice still had left an
> observable side effect subject to the Hyrum's law, it seems.
> 
> > +my ($message_id_stamp, $message_id_serial);
> >  if ($validate) {
> >  	# FIFOs can only be read once, exclude them from validation.
> >  	my @real_files = ();
> > @@ -821,6 +822,7 @@ sub is_format_patch_arg {
> >  	}
> >  	delete $ENV{GIT_SENDEMAIL_FILE_COUNTER};
> >  	delete $ENV{GIT_SENDEMAIL_FILE_TOTAL};
> > +	$message_id_serial = 0;
> >  }
> 
> This fix looks quite logical to me, but even with this, the side
> effects of the earlier "read message twice" persists in end-user
> observable form, don't they?  IIRC, when sending out an N message
> series, we start from the timestamp as of N seconds ago and give
> each message the Date: header that increments by 1 second, which
> would mean the validator will see Date: that is different from what
> will actually be sent out, and more importantly, the messages sent
> out for real will have timestamps from the future, negating the
> point of starting from N seconds ago in the first place.

The series I used as an example here was finally sent out with
git-send-email patched with my suggested change.

The Message-Ids involved are:

	20231106154807.3866712-1-u.kleine-koenig@pengutronix.de
	20231106154807.3866712-2-u.kleine-koenig@pengutronix.de
	20231106154807.3866712-3-u.kleine-koenig@pengutronix.de
	20231106154807.3866712-4-u.kleine-koenig@pengutronix.de
	20231106154807.3866712-5-u.kleine-koenig@pengutronix.de
	20231106154807.3866712-6-u.kleine-koenig@pengutronix.de

So the Ids are are identical apart from the number this report is about.

> Your script may not have been paying attention to it and only noticed
> the difference in id_serial, but somebody else would complain the
> difference coming from calling gen_header more than once for each
> messages since a8022c5f (send-email: expose header information to
> git-send-email's sendemail-validate hook, 2023-04-19).
> 
> So, I dunno.  Michael, what do you think?  It appears to me that a
> more fundamental fix to the fallout from a8022c5f might be needed
> (e.g., we still let gen_header run while validating, but once
> validation is done, save the headers that validator saw and use them
> without calling gen_header again when we send the messages out, or
> something), if we truly want to be regression free.

That sounds sane.

> By the way, out of curiosity, earlier you said your script looks at
> the Message-IDs and counts the number of messages.  How does it do
> that?  Does it read the output of send-email and pass the messages
> to MTA for sending out for real?

The output of git send-email dumps the messages it sends out and then I
pick the message-id of the last mail by cut-n-paste and call my script
with that as a parameter. It then adds notes to the $commitcount topmost
commits such that I have something like this on the sent out commits:

	Notes:
	    Forwarded: id:20231106154807.3866712-2-u.kleine-koenig@pengutronix.de (v1)

This is very convenient to find the thread to ping if a patch doesn't
make it into the next release.

(By the way, one difficulty in my script is that depending on the series
having a cover letter or not I have to apply the id:....-1-... marker or
not. Would be great if git send-email started with ...-0-... for a
series with a cover letter. Detecting that reliably isn't trivial I
guess.)

Best regards
Uwe
Junio C Hamano Nov. 8, 2023, 1:34 a.m. UTC | #3
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:

> The output of git send-email dumps the messages it sends out and then I
> pick the message-id of the last mail by cut-n-paste and call my script
> with that as a parameter.

Yikes.  I was hoping that the whole "dumps the messages" output is
read by the script, so that it does not have to assume anything
about the Message-ID format (like "it has some unchanging parts,
the changing parts begin with 1, and it increments by 1 for each
message").

You could give Message-ID externally to the output of format-patch
before feeding send-email, which would just use them, and that way
you would have more control over the entire process, I guess.
Michael Strawbridge Nov. 8, 2023, 8:39 p.m. UTC | #4
On 11/7/23 02:06, Uwe Kleine-König wrote:
> Hello Junio,
> 
> On Tue, Nov 07, 2023 at 08:06:22AM +0900, Junio C Hamano wrote:
>> Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:
>>
>>> Hello,
>>>
>>> Since commit 3ece9bf0f9e24909b090cf348d89e8920bd4f82f I experience that
>>> the generated Message-Ids don't start at ....-1-... any more. I have:
>>>
>>> $ git send-email w/*
>>> ...
>>> Subject: [PATCH 0/5] watchdog: Drop platform_driver_probe() and convert to platform remove callback returning void (part II)
>>> Date: Mon,  6 Nov 2023 16:10:04 +0100
>>> Message-ID: <20231106151003.3844134-7-u.kleine-koenig@pengutronix.de>
>>> ...
>>>
>>> So the cover letter is sent with Message-Id: ...-7-...
>>
>> The above is consistent with the fact that a 5-patch series with a
>> cover letter consists of 6 messages.  Dry-run uses message numbers
>> 1-6 and forgets to reset the counter, so the next message becomes 7.
>> As you identified, the fix in 3ece9bf0 (send-email: clear the
>> $message_id after validation, 2023-05-17) for the fallout from an
>> even earlier change to process each message twice still had left an
>> observable side effect subject to the Hyrum's law, it seems.
>>
>>> +my ($message_id_stamp, $message_id_serial);
>>>  if ($validate) {
>>>  	# FIFOs can only be read once, exclude them from validation.
>>>  	my @real_files = ();
>>> @@ -821,6 +822,7 @@ sub is_format_patch_arg {
>>>  	}
>>>  	delete $ENV{GIT_SENDEMAIL_FILE_COUNTER};
>>>  	delete $ENV{GIT_SENDEMAIL_FILE_TOTAL};
>>> +	$message_id_serial = 0;
>>>  }
>>
>> This fix looks quite logical to me, but even with this, the side
>> effects of the earlier "read message twice" persists in end-user
>> observable form, don't they?  IIRC, when sending out an N message
>> series, we start from the timestamp as of N seconds ago and give
>> each message the Date: header that increments by 1 second, which
>> would mean the validator will see Date: that is different from what
>> will actually be sent out, and more importantly, the messages sent
>> out for real will have timestamps from the future, negating the
>> point of starting from N seconds ago in the first place.
> 
> The series I used as an example here was finally sent out with
> git-send-email patched with my suggested change.
> 
> The Message-Ids involved are:
> 
> 	20231106154807.3866712-1-u.kleine-koenig@pengutronix.de
> 	20231106154807.3866712-2-u.kleine-koenig@pengutronix.de
> 	20231106154807.3866712-3-u.kleine-koenig@pengutronix.de
> 	20231106154807.3866712-4-u.kleine-koenig@pengutronix.de
> 	20231106154807.3866712-5-u.kleine-koenig@pengutronix.de
> 	20231106154807.3866712-6-u.kleine-koenig@pengutronix.de
> 
> So the Ids are are identical apart from the number this report is about.
> 
>> Your script may not have been paying attention to it and only noticed
>> the difference in id_serial, but somebody else would complain the
>> difference coming from calling gen_header more than once for each
>> messages since a8022c5f (send-email: expose header information to
>> git-send-email's sendemail-validate hook, 2023-04-19).
>>
>> So, I dunno.  Michael, what do you think?  It appears to me that a
>> more fundamental fix to the fallout from a8022c5f might be needed
>> (e.g., we still let gen_header run while validating, but once
>> validation is done, save the headers that validator saw and use them
>> without calling gen_header again when we send the messages out, or
>> something), if we truly want to be regression free.
> 
> That sounds sane.

I agree that a more fundamental fix would be optimal.  Saving the header
information after generating it and using that in both validation and
sending would be great.  That would probably need to be paired with
cleaning up all the random code in between the function declarations
so that we can get a good look at all the variables.  Right now the
control flow is hard to follow.  I can't guarantee any specific timeline
on those patches.  It would likely be fairly large changes.  If anyone
else wants to take a crack at it, feel free.  Otherwise I may tinker
and post when I have a chance.
> 
>> By the way, out of curiosity, earlier you said your script looks at
>> the Message-IDs and counts the number of messages.  How does it do
>> that?  Does it read the output of send-email and pass the messages
>> to MTA for sending out for real?
> 
> The output of git send-email dumps the messages it sends out and then I
> pick the message-id of the last mail by cut-n-paste and call my script
> with that as a parameter. It then adds notes to the $commitcount topmost
> commits such that I have something like this on the sent out commits:
> 
> 	Notes:
> 	    Forwarded: id:20231106154807.3866712-2-u.kleine-koenig@pengutronix.de (v1)
> 
> This is very convenient to find the thread to ping if a patch doesn't
> make it into the next release.
> 
> (By the way, one difficulty in my script is that depending on the series
> having a cover letter or not I have to apply the id:....-1-... marker or
> not. Would be great if git send-email started with ...-0-... for a
> series with a cover letter. Detecting that reliably isn't trivial I
> guess.)
> 
> Best regards
> Uwe
>
diff mbox series

Patch

diff --git a/git-send-email.perl b/git-send-email.perl
index 9e21b0b3f43a..095a3d9dd720 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -799,6 +799,7 @@  sub is_format_patch_arg {
 
 $time = time - scalar $#files;
 
+my ($message_id_stamp, $message_id_serial);
 if ($validate) {
 	# FIFOs can only be read once, exclude them from validation.
 	my @real_files = ();
@@ -821,6 +822,7 @@  sub is_format_patch_arg {
 	}
 	delete $ENV{GIT_SENDEMAIL_FILE_COUNTER};
 	delete $ENV{GIT_SENDEMAIL_FILE_TOTAL};
+	$message_id_serial = 0;
 }
 
 @files = handle_backup_files(@files);
@@ -1181,7 +1183,6 @@  sub validate_address_list {
 
 # We'll setup a template for the message id, using the "from" address:
 
-my ($message_id_stamp, $message_id_serial);
 sub make_message_id {
 	my $uniq;
 	if (!defined $message_id_stamp) {