[RESEND] hooks: add sendemail-validate-series

Message ID	20230402185635.302653-1-robin@jarry.cc (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@vger.kernel.org> sender: robin@jarry.cc) by mail.gandi.net (Postfix) with ESMTPSA id 01F7A100005; Sun, 2 Apr 2023 18:56:42 +0000 (UTC) From: Robin Jarry <robin@jarry.cc> To: git@vger.kernel.org Cc: Tim Culverhouse <tim@timculverhouse.com>, Nicolas Dichtel <nicolas.dichtel@6wind.com>, Bagas Sanjaya <bagasdotme@gmail.com>, Junio C Hamano <gitster@pobox.com>, Robin Jarry <robin@jarry.cc> Subject: [PATCH RESEND] hooks: add sendemail-validate-series Date: Sun, 2 Apr 2023 20:56:35 +0200 Message-Id: <20230402185635.302653-1-robin@jarry.cc> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[RESEND] hooks: add sendemail-validate-series \| expand [RESEND] hooks: add sendemail-validate-series

Robin Jarry April 2, 2023, 6:56 p.m. UTC

When sending patch series (with a cover-letter or not)
sendemail-validate is called with every email/patch file independently
from the others. When one of the patches depends on a previous one, it
may not be possible to use this hook in a meaningful way.

Changing sendemail-validate to take all patches as arguments would break
backward compatibility.

Add a new hook to allow validating patch series instead of patch by
patch. The patch files are provided in the hook script standard input.

git hook run cannot be used since it closes the hook standard input. Run
the hook directly.

Signed-off-by: Robin Jarry <robin@jarry.cc>
---
Rebased on 140b9478dad5 ("The sixth batch")

 Documentation/git-send-email.txt |  1 +
 Documentation/githooks.txt       | 17 +++++++++++++
 git-send-email.perl              | 42 ++++++++++++++++++++++++++++++++
 3 files changed, 60 insertions(+)

Eric Sunshine April 3, 2023, 12:17 a.m. UTC | #1

On Sun, Apr 2, 2023 at 3:10 PM Robin Jarry <robin@jarry.cc> wrote:
> When sending patch series (with a cover-letter or not)
> sendemail-validate is called with every email/patch file independently
> from the others. When one of the patches depends on a previous one, it
> may not be possible to use this hook in a meaningful way.
>
> Changing sendemail-validate to take all patches as arguments would break
> backward compatibility.
>
> Add a new hook to allow validating patch series instead of patch by
> patch. The patch files are provided in the hook script standard input.

It's not clear from this description whether the pathnames of the
patches are fed to the hook on stdin or if the patch contents are fed
on stdin.

> git hook run cannot be used since it closes the hook standard input. Run
> the hook directly.
>
> Signed-off-by: Robin Jarry <robin@jarry.cc>
> ---
> diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
> @@ -600,6 +600,23 @@ the name of the file that holds the e-mail to be sent.  Exiting with a
> +sendemail-validate-series
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +This hook is invoked by linkgit:git-send-email[1].  It allows performing
> +validation on a complete patch series at once, instead of patch by patch with
> +`sendemail-validate`.
> +
> +`sendemail-validate-series` takes no arguments, but for each e-mail to be sent
> +it receives on standard input a line of the format:
> +
> +  <patch-file> LF
> +
> +where `<patch-file>` is a name of a file that holds an e-mail to be sent,

This does a better job than the commit message of explaining that
stdin receives the names of the patches rather than the content of the
patches themselves. It's a nit, but it might be even clearer to say
that <patch-file> is the _pathname_ of the file rather than merely the
_name_.

> +If the hook exits with non-zero status, `git send-email` will abort before
> +sending any e-mails.

It was a bit startling to see this spelled "e-mail" rather than
"email", the latter of which is used far more frequently in the
documentation. However, "e-mail" does indeed appear in githooks.txt
more frequently than "email", so the use of "e-mail" here is probably
fine.

I doubt that any of the above comments warrant a reroll.

Phillip Wood April 3, 2023, 2:09 p.m. UTC | #2

Hi Robin

On 02/04/2023 19:56, Robin Jarry wrote:
> When sending patch series (with a cover-letter or not)
> sendemail-validate is called with every email/patch file independently
> from the others. When one of the patches depends on a previous one, it
> may not be possible to use this hook in a meaningful way.
> 
> Changing sendemail-validate to take all patches as arguments would break
> backward compatibility.
> 
> Add a new hook to allow validating patch series instead of patch by
> patch. The patch files are provided in the hook script standard input.
> 
> git hook run cannot be used since it closes the hook standard input. Run
> the hook directly.

I've left some comments about this lower down as "git hook run" now has 
a --to-stdin option.

> Signed-off-by: Robin Jarry <robin@jarry.cc>
> ---

> +sendemail-validate-series
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +This hook is invoked by linkgit:git-send-email[1].  It allows performing
> +validation on a complete patch series at once, instead of patch by patch with
> +`sendemail-validate`.
> +
> +`sendemail-validate-series` takes no arguments, but for each e-mail to be sent
> +it receives on standard input a line of the format:
> +
> +  <patch-file> LF

Usually git commands that produce or consume paths either use quoted 
paths terminated by LF or unquoted paths terminated by NUL. That way 
there is no ambiguity when a path contains LF.

> +where `<patch-file>` is a name of a file that holds an e-mail to be sent,
> +
> +If the hook exits with non-zero status, `git send-email` will abort before
> +sending any e-mails.
> +
>   fsmonitor-watchman
>   ~~~~~~~~~~~~~~~~~~
>   
> diff --git a/git-send-email.perl b/git-send-email.perl
> index 07f2a0cbeaad..bec4d0f4ab47 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -800,6 +800,7 @@ sub is_format_patch_arg {
>   			validate_patch($f, $target_xfer_encoding);
>   		}
>   	}
> +	validate_patch_series(@files)

This happens fairly early, before the user has had a chance to edit the 
patches and before we have added all the recipient and in-reply-to 
headers to the patch files. Would it be more useful to validate what 
will actually be sent?

>   }
>   
>   if (@files) {
> @@ -2125,6 +2126,47 @@ sub validate_patch {
>   	return;
>   }
>   
> +sub validate_patch_series {
> +	my @files = @_;
> +
> +	unless ($repo) {
> +		return;
> +	}
> +
> +	my $hook_name = 'sendemail-validate-series';
> +	my $hooks_path = $repo->command_oneline('rev-parse', '--git-path', 'hooks');

$hooks_path maybe a relative path, this is problematic because we change 
directory before executing the hook (using "git hook run" would avoid this).

> +	require File::Spec;
> +	my $validate_hook = File::Spec->catfile($hooks_path, $hook_name);
> +	my $hook_error;
> +	unless (-x $validate_hook) {
> +		return;
> +	}
> +
> +	# The hook needs a correct cwd and GIT_DIR.
> +	require Cwd;
> +	my $cwd_save = Cwd::getcwd();
> +	chdir($repo->wc_path() or $repo->repo_path()) or die("chdir: $!");
> +	local $ENV{"GIT_DIR"} = $repo->repo_path();

This looks like it is copied from the existing code but why do we need 
to do this? I'm struggling to come up with a scenario where "git 
send-email" can find the repository but the hook cannot.

> +	# cannot use git hook run, it closes stdin before forking the hook
> +	open(my $stdin, "|-", $validate_hook) or die("fork: $!");

This passes $validate_hook to the shell to execute which is not what we 
want as it will split the hook path on whitespace etc. I think it would 
be better to use "git hook run --to-stdin" (see 0414b3891c (hook: 
support a --to-stdin=<path> option, 2023-02-08))

Best Wishes

Phillip

> +	chdir($cwd_save) or die("chdir: $!");
> +	for my $fn (@files) {
> +		unless (-p $fn) {
> +			$fn = Cwd::abs_path($fn);
> +			$stdin->print("$fn\n");
> +		}
> +	}
> +	close($stdin); # calls waitpid
> +	if ($? & 0x7f) {
> +		my $sig = $? & 0x7f;
> +		die("fatal: hook $hook_name killed by signal $sig");
> +	} elsif ($? >> 8) {
> +		my $err = $? >> 8;
> +		die("fatal: hook $hook_name rejected patch series (exit code $err)");
> +	}
> +	return;
> +}
> +
>   sub handle_backup {
>   	my ($last, $lastlen, $file, $known_suffix) = @_;
>   	my ($suffix, $skip);

Robin Jarry April 3, 2023, 2:32 p.m. UTC | #3

Hi Phillip,

Phillip Wood, Apr 03, 2023 at 16:09:
> > +  <patch-file> LF
>
> Usually git commands that produce or consume paths either use quoted 
> paths terminated by LF or unquoted paths terminated by NUL. That way 
> there is no ambiguity when a path contains LF.

I had never imagined that some twisted mind would insert LF in path
names but since nothing will forbid it, I agree that it is
a possibility.

I'm not sure what you mean by quoted paths, you mean adding literal
double quotes before printing them to the hook stdin? That means the
hook needs to handle de-quoting after reading, right?

> > diff --git a/git-send-email.perl b/git-send-email.perl
> > index 07f2a0cbeaad..bec4d0f4ab47 100755
> > --- a/git-send-email.perl
> > +++ b/git-send-email.perl
> > @@ -800,6 +800,7 @@ sub is_format_patch_arg {
> >   			validate_patch($f, $target_xfer_encoding);
> >   		}
> >   	}
> > +	validate_patch_series(@files)
>
> This happens fairly early, before the user has had a chance to edit the 
> patches and before we have added all the recipient and in-reply-to 
> headers to the patch files. Would it be more useful to validate what 
> will actually be sent?

I agree that it would be better. I added the check here to be in line
with the existing sendemail-validate hook. I could move it after edition
and header finalization but then we would need to move
sendemail-validate as well for consistency. What do you think?

> > +	# The hook needs a correct cwd and GIT_DIR.
> > +	require Cwd;
> > +	my $cwd_save = Cwd::getcwd();
> > +	chdir($repo->wc_path() or $repo->repo_path()) or die("chdir: $!");
> > +	local $ENV{"GIT_DIR"} = $repo->repo_path();
>
> This looks like it is copied from the existing code but why do we need 
> to do this? I'm struggling to come up with a scenario where "git 
> send-email" can find the repository but the hook cannot.

Again, for consistency I assumed it would be best to keep the code
similar in both hooks. If you think it is safe to skip that check, I'll
remove it gladly.

> > +	# cannot use git hook run, it closes stdin before forking the hook
> > +	open(my $stdin, "|-", $validate_hook) or die("fork: $!");
>
> This passes $validate_hook to the shell to execute which is not what we 
> want as it will split the hook path on whitespace etc. I think it would 
> be better to use "git hook run --to-stdin" (see 0414b3891c (hook: 
> support a --to-stdin=<path> option, 2023-02-08))

Ah that's a nice addition. I'll add that in v2.

Thanks for reviewing!

Phillip Wood April 3, 2023, 3:20 p.m. UTC | #4

Hi Robin

On 03/04/2023 15:32, Robin Jarry wrote:
> Hi Phillip,
> 
> Phillip Wood, Apr 03, 2023 at 16:09:
>>> +  <patch-file> LF
>>
>> Usually git commands that produce or consume paths either use quoted
>> paths terminated by LF or unquoted paths terminated by NUL. That way
>> there is no ambiguity when a path contains LF.
> 
> I had never imagined that some twisted mind would insert LF in path
> names but since nothing will forbid it, I agree that it is
> a possibility.
> 
> I'm not sure what you mean by quoted paths, you mean adding literal
> double quotes before printing them to the hook stdin? That means the
> hook needs to handle de-quoting after reading, right?

I meant quoted in the same way that diff and ls-files etc quote paths 
that contain control characters - see quote_c_style() in quote.c if 
you're interested in the details. It looks like Git.pm can unquote paths 
but has no code to quote them. It is probably easiest to use NUL 
termination here - bash and zsh can read NUL terminated lines and so can 
any scripting language.

>>> diff --git a/git-send-email.perl b/git-send-email.perl
>>> index 07f2a0cbeaad..bec4d0f4ab47 100755
>>> --- a/git-send-email.perl
>>> +++ b/git-send-email.perl
>>> @@ -800,6 +800,7 @@ sub is_format_patch_arg {
>>>    			validate_patch($f, $target_xfer_encoding);
>>>    		}
>>>    	}
>>> +	validate_patch_series(@files)
>>
>> This happens fairly early, before the user has had a chance to edit the
>> patches and before we have added all the recipient and in-reply-to
>> headers to the patch files. Would it be more useful to validate what
>> will actually be sent?
> 
> I agree that it would be better. I added the check here to be in line
> with the existing sendemail-validate hook. I could move it after edition
> and header finalization but then we would need to move
> sendemail-validate as well for consistency. What do you think?

That would be my inclination but I'm only an occasional send-email user. 
The downside is that the user may edit a patch only for it to be 
rejected but we could offer for them to edit it again rather than just 
throwing their work away.

>>> +	# The hook needs a correct cwd and GIT_DIR.
>>> +	require Cwd;
>>> +	my $cwd_save = Cwd::getcwd();
>>> +	chdir($repo->wc_path() or $repo->repo_path()) or die("chdir: $!");
>>> +	local $ENV{"GIT_DIR"} = $repo->repo_path();
>>
>> This looks like it is copied from the existing code but why do we need
>> to do this? I'm struggling to come up with a scenario where "git
>> send-email" can find the repository but the hook cannot.
> 
> Again, for consistency I assumed it would be best to keep the code
> similar in both hooks. If you think it is safe to skip that check, I'll
> remove it gladly.

I suspect it is safe, hopefully someone will speak up if I'm mistaken. A 
while ago rebase stopped setting these variables when running an "exec" 
command as they were causing problems (see 434e0636db (sequencer: do not 
export GIT_DIR and GIT_WORK_TREE for 'exec', 2021-12-04))

>>> +	# cannot use git hook run, it closes stdin before forking the hook
>>> +	open(my $stdin, "|-", $validate_hook) or die("fork: $!");
>>
>> This passes $validate_hook to the shell to execute which is not what we
>> want as it will split the hook path on whitespace etc. I think it would
>> be better to use "git hook run --to-stdin" (see 0414b3891c (hook:
>> support a --to-stdin=<path> option, 2023-02-08))
> 
> Ah that's a nice addition. I'll add that in v2.

That's great, Best Wishes

Phillip

> Thanks for reviewing!

Junio C Hamano April 3, 2023, 3:42 p.m. UTC | #5

Phillip Wood <phillip.wood123@gmail.com> writes:

>>   diff --git a/git-send-email.perl b/git-send-email.perl
>> index 07f2a0cbeaad..bec4d0f4ab47 100755
>> --- a/git-send-email.perl
>> +++ b/git-send-email.perl
>> @@ -800,6 +800,7 @@ sub is_format_patch_arg {
>>   			validate_patch($f, $target_xfer_encoding);
>>   		}
>>   	}
>> +	validate_patch_series(@files)
>
> This happens fairly early, before the user has had a chance to edit
> the patches and before we have added all the recipient and in-reply-to
> headers to the patch files. Would it be more useful to validate what
> will actually be sent?

I actually think the original intent was to catch errors in the part
of the file that can mechanically be created before letting the user
spend time on editing, without realizing that a later stage will be
rejected due to the auto-generated (e.g. came from a commit object)
stuff.  I do not know why we need another hook to do pretty much the
same thing as the existing one (which could be taught to spool and
then the last round to validate, in addition to each step rejecting
incoming one as needed), but at least calling it there would be very
much in line with the existing one, I would say.

Thanks for a careful review.

Robin Jarry April 3, 2023, 5:25 p.m. UTC | #6

Junio C Hamano, Apr 03, 2023 at 17:42:
> I do not know why we need another hook to do pretty much the same
> thing as the existing one (which could be taught to spool and then the
> last round to validate, in addition to each step rejecting incoming
> one as needed), but at least calling it there would be very much in
> line with the existing one, I would say.

If for example the validation would require trying to apply patches on
top of another branch in a temp repository, you would need to know the
number of patches and be able to determine whether you need to reset the
branch (patch 1/N) before applying. For that you would need to parse the
contents of the patches. This is not the end of the world but I assumed
that it would be easier to handle with a hook that fires once with all
patch files.

Another option would be to change sendemail-validate to be called only
once with all patches. That would be the ideal solution since the
existing hook is not always usable with series. But that would be
a breaking change. I personally don't mind a small breakage like this
but I don't know what is the project's policy.

Robin Jarry April 3, 2023, 10:29 p.m. UTC | #7

Phillip Wood, Apr 03, 2023 at 16:09:
> Usually git commands that produce or consume paths either use quoted 
> paths terminated by LF or unquoted paths terminated by NUL. That way 
> there is no ambiguity when a path contains LF.

Thinking again about that. The probability that a file path name
generated by git-format-patch would contain LF is close to zero.
However, reading per line is more natural and more in line with other
hooks that read from stdin. Having that single hook separating stuff in
stdin with NUL bytes is weird from a user point of view. Don't you think
it would be an acceptable limitation?

Junio C Hamano April 3, 2023, 10:52 p.m. UTC | #8

"Robin Jarry" <robin@jarry.cc> writes:

> Thinking again about that. The probability that a file path name
> generated by git-format-patch would contain LF is close to zero.

Close to zero is very different from absolutely zero, and in the
case of format-patch generated patches, I think it is absolutely
zero.  At least, that was the case back when I designed and
implemented it, and I do not think I accepted a patch to break it
over the years.

But "git send-email" can be fed a list of files and even a directory
(and enumerate files in it).  The filenames are under end-users'
control in this case, so "close to zero" has absolutely no relevance.
If the end user means to feed you such a file, they can do so 100%
of the time.

If we support such a file is a different issue.  A good rule of
thumb to decide if it is reasonable is to see if the main command
already works with such filenames, e.g.

    $ git format-patch -2
    0001-foo.txt
    0002-bar.txt
    $ mv 0001-foo.txt '0001-fo
    > o.txt'
    $ mkdir dir
    $ mv 000[12]*.txt dir/.

may prepare two patch files that can be sent via send-email.  One
file (the first one) is deliberately given a filename with LF in
it.  Does send-email work on it correctly if you did e.g.

    $ git send-email dir/000[12]*.txt

or something silly like

    $ git send-email dir

or does it already choke on the first file because of the filename?

Robin Jarry April 3, 2023, 10:59 p.m. UTC | #9

Junio C Hamano, Apr 04, 2023 at 00:52:
> Close to zero is very different from absolutely zero, and in the
> case of format-patch generated patches, I think it is absolutely
> zero.  At least, that was the case back when I designed and
> implemented it, and I do not think I accepted a patch to break it
> over the years.
>
> But "git send-email" can be fed a list of files and even a directory
> (and enumerate files in it).  The filenames are under end-users'
> control in this case, so "close to zero" has absolutely no relevance.
> If the end user means to feed you such a file, they can do so 100%
> of the time.

Ok that's a fair point. Even though I am having a hard time believing
someone would do such a thing :D

> If we support such a file is a different issue.  A good rule of
> thumb to decide if it is reasonable is to see if the main command
> already works with such filenames, e.g.
>
>     $ git format-patch -2
>     0001-foo.txt
>     0002-bar.txt
>     $ mv 0001-foo.txt '0001-fo
>     > o.txt'
>     $ mkdir dir
>     $ mv 000[12]*.txt dir/.
>
> may prepare two patch files that can be sent via send-email.  One
> file (the first one) is deliberately given a filename with LF in
> it.  Does send-email work on it correctly if you did e.g.
>
>     $ git send-email dir/000[12]*.txt
>
> or something silly like
>
>     $ git send-email dir
>
> or does it already choke on the first file because of the filename?

It seems to work with both. I guess, NUL bytes separation it is then...

Junio C Hamano April 4, 2023, 8:14 p.m. UTC | #10

"Robin Jarry" <robin@jarry.cc> writes:

>> it.  Does send-email work on it correctly if you did e.g.
>>
>>     $ git send-email dir/000[12]*.txt
>>
>> or something silly like
>>
>>     $ git send-email dir
>>
>> or does it already choke on the first file because of the filename?
>
> It seems to work with both. I guess, NUL bytes separation it is then...

Feeding the filenames as the command line arguments would have been
much simpler X-<, but either NUL termination or c-quoting the
filenames would be needed _if_ we want to support crazy folks who
feed us such garbage filenames.  Letting hook scripts understand NUL
termination is a chore and it still is debatable if it is reasonable
to support, though.  I'd say it would be sufficient to just declare
"files whose name has LF in it is not given to the hook, ever" and
users would avoid such a filename if they care.

Thanks.

Robin Jarry April 5, 2023, 8:31 a.m. UTC | #11

Junio C Hamano, Apr 04, 2023 at 22:14:
> Feeding the filenames as the command line arguments would have been
> much simpler X-<,

Just a thought. Instead of adding another hook, wouldn't it be better to
add an option --validate-series (git config sendemail.validateSeries)
that would change the behaviour of the sendemail-validate hook? Calling
it with all patch files as command line arguments only once instead of
once per file.

I know I was concerned with the max size of the command line args but is
there really a chance that we hit that maximum? On my system, it is
2097152 bytes. Even with a 1000 patches series with 1000 bytes
filenames, we wouldn't hit the limit.

This way, we support any crap filename that the user may send and we
don't add a new hook which basically does the same thing than the
existing one.

Thoughts?

Junio C Hamano April 5, 2023, 9:49 p.m. UTC | #12

"Robin Jarry" <robin@jarry.cc> writes:

> Just a thought. Instead of adding another hook, wouldn't it be better to
> add an option --validate-series (git config sendemail.validateSeries)
> that would change the behaviour of the sendemail-validate hook? Calling
> it with all patch files as command line arguments only once instead of
> once per file.

I do not see much upside in doing so.  Instead of having to write a
new validate-series hook script to store it under a new name, users
can update an existing validate-patch hook script to take a batch of
patches in one go.  Then the user now has to set a new configuration
variable.  Because the expected way to feed the hook script is *not*
per invocation but depends solely on how the script is written, a
command line option would not make much sense.  Not having to rename
the updated validate-patch script to validate-series might be a
small win, but I do not think it is a compelling reason to take that
approach.

> I know I was concerned with the max size of the command line args but is
> there really a chance that we hit that maximum? On my system, it is
> 2097152 bytes. Even with a 1000 patches series with 1000 bytes
> filenames, we wouldn't hit the limit.
>
> This way, we support any crap filename that the user may send and we
> don't add a new hook which basically does the same thing than the
> existing one.

Between "we may exceed command line argument limit" (which by the
way is way lower on certain systems than what you expect, IIRC) and
"the user may throw us a file with LF in its name", I'd find it
simpler to punt on the latter and tell them "if it hurts, don't do
it".  As long as the limitation is clearly documented, I'd say it is
OK.

Thanks.

[RESEND] hooks: add sendemail-validate-series

Commit Message

Comments

Patch