Message ID | 20230402185635.302653-1-robin@jarry.cc (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [RESEND] hooks: add sendemail-validate-series | expand |
On Sun, Apr 2, 2023 at 3:10 PM Robin Jarry <robin@jarry.cc> wrote: > When sending patch series (with a cover-letter or not) > sendemail-validate is called with every email/patch file independently > from the others. When one of the patches depends on a previous one, it > may not be possible to use this hook in a meaningful way. > > Changing sendemail-validate to take all patches as arguments would break > backward compatibility. > > Add a new hook to allow validating patch series instead of patch by > patch. The patch files are provided in the hook script standard input. It's not clear from this description whether the pathnames of the patches are fed to the hook on stdin or if the patch contents are fed on stdin. > git hook run cannot be used since it closes the hook standard input. Run > the hook directly. > > Signed-off-by: Robin Jarry <robin@jarry.cc> > --- > diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt > @@ -600,6 +600,23 @@ the name of the file that holds the e-mail to be sent. Exiting with a > +sendemail-validate-series > +~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +This hook is invoked by linkgit:git-send-email[1]. It allows performing > +validation on a complete patch series at once, instead of patch by patch with > +`sendemail-validate`. > + > +`sendemail-validate-series` takes no arguments, but for each e-mail to be sent > +it receives on standard input a line of the format: > + > + <patch-file> LF > + > +where `<patch-file>` is a name of a file that holds an e-mail to be sent, This does a better job than the commit message of explaining that stdin receives the names of the patches rather than the content of the patches themselves. It's a nit, but it might be even clearer to say that <patch-file> is the _pathname_ of the file rather than merely the _name_. > +If the hook exits with non-zero status, `git send-email` will abort before > +sending any e-mails. It was a bit startling to see this spelled "e-mail" rather than "email", the latter of which is used far more frequently in the documentation. However, "e-mail" does indeed appear in githooks.txt more frequently than "email", so the use of "e-mail" here is probably fine. I doubt that any of the above comments warrant a reroll.
Hi Robin On 02/04/2023 19:56, Robin Jarry wrote: > When sending patch series (with a cover-letter or not) > sendemail-validate is called with every email/patch file independently > from the others. When one of the patches depends on a previous one, it > may not be possible to use this hook in a meaningful way. > > Changing sendemail-validate to take all patches as arguments would break > backward compatibility. > > Add a new hook to allow validating patch series instead of patch by > patch. The patch files are provided in the hook script standard input. > > git hook run cannot be used since it closes the hook standard input. Run > the hook directly. I've left some comments about this lower down as "git hook run" now has a --to-stdin option. > Signed-off-by: Robin Jarry <robin@jarry.cc> > --- > +sendemail-validate-series > +~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +This hook is invoked by linkgit:git-send-email[1]. It allows performing > +validation on a complete patch series at once, instead of patch by patch with > +`sendemail-validate`. > + > +`sendemail-validate-series` takes no arguments, but for each e-mail to be sent > +it receives on standard input a line of the format: > + > + <patch-file> LF Usually git commands that produce or consume paths either use quoted paths terminated by LF or unquoted paths terminated by NUL. That way there is no ambiguity when a path contains LF. > +where `<patch-file>` is a name of a file that holds an e-mail to be sent, > + > +If the hook exits with non-zero status, `git send-email` will abort before > +sending any e-mails. > + > fsmonitor-watchman > ~~~~~~~~~~~~~~~~~~ > > diff --git a/git-send-email.perl b/git-send-email.perl > index 07f2a0cbeaad..bec4d0f4ab47 100755 > --- a/git-send-email.perl > +++ b/git-send-email.perl > @@ -800,6 +800,7 @@ sub is_format_patch_arg { > validate_patch($f, $target_xfer_encoding); > } > } > + validate_patch_series(@files) This happens fairly early, before the user has had a chance to edit the patches and before we have added all the recipient and in-reply-to headers to the patch files. Would it be more useful to validate what will actually be sent? > } > > if (@files) { > @@ -2125,6 +2126,47 @@ sub validate_patch { > return; > } > > +sub validate_patch_series { > + my @files = @_; > + > + unless ($repo) { > + return; > + } > + > + my $hook_name = 'sendemail-validate-series'; > + my $hooks_path = $repo->command_oneline('rev-parse', '--git-path', 'hooks'); $hooks_path maybe a relative path, this is problematic because we change directory before executing the hook (using "git hook run" would avoid this). > + require File::Spec; > + my $validate_hook = File::Spec->catfile($hooks_path, $hook_name); > + my $hook_error; > + unless (-x $validate_hook) { > + return; > + } > + > + # The hook needs a correct cwd and GIT_DIR. > + require Cwd; > + my $cwd_save = Cwd::getcwd(); > + chdir($repo->wc_path() or $repo->repo_path()) or die("chdir: $!"); > + local $ENV{"GIT_DIR"} = $repo->repo_path(); This looks like it is copied from the existing code but why do we need to do this? I'm struggling to come up with a scenario where "git send-email" can find the repository but the hook cannot. > + # cannot use git hook run, it closes stdin before forking the hook > + open(my $stdin, "|-", $validate_hook) or die("fork: $!"); This passes $validate_hook to the shell to execute which is not what we want as it will split the hook path on whitespace etc. I think it would be better to use "git hook run --to-stdin" (see 0414b3891c (hook: support a --to-stdin=<path> option, 2023-02-08)) Best Wishes Phillip > + chdir($cwd_save) or die("chdir: $!"); > + for my $fn (@files) { > + unless (-p $fn) { > + $fn = Cwd::abs_path($fn); > + $stdin->print("$fn\n"); > + } > + } > + close($stdin); # calls waitpid > + if ($? & 0x7f) { > + my $sig = $? & 0x7f; > + die("fatal: hook $hook_name killed by signal $sig"); > + } elsif ($? >> 8) { > + my $err = $? >> 8; > + die("fatal: hook $hook_name rejected patch series (exit code $err)"); > + } > + return; > +} > + > sub handle_backup { > my ($last, $lastlen, $file, $known_suffix) = @_; > my ($suffix, $skip);
Hi Phillip, Phillip Wood, Apr 03, 2023 at 16:09: > > + <patch-file> LF > > Usually git commands that produce or consume paths either use quoted > paths terminated by LF or unquoted paths terminated by NUL. That way > there is no ambiguity when a path contains LF. I had never imagined that some twisted mind would insert LF in path names but since nothing will forbid it, I agree that it is a possibility. I'm not sure what you mean by quoted paths, you mean adding literal double quotes before printing them to the hook stdin? That means the hook needs to handle de-quoting after reading, right? > > diff --git a/git-send-email.perl b/git-send-email.perl > > index 07f2a0cbeaad..bec4d0f4ab47 100755 > > --- a/git-send-email.perl > > +++ b/git-send-email.perl > > @@ -800,6 +800,7 @@ sub is_format_patch_arg { > > validate_patch($f, $target_xfer_encoding); > > } > > } > > + validate_patch_series(@files) > > This happens fairly early, before the user has had a chance to edit the > patches and before we have added all the recipient and in-reply-to > headers to the patch files. Would it be more useful to validate what > will actually be sent? I agree that it would be better. I added the check here to be in line with the existing sendemail-validate hook. I could move it after edition and header finalization but then we would need to move sendemail-validate as well for consistency. What do you think? > > + # The hook needs a correct cwd and GIT_DIR. > > + require Cwd; > > + my $cwd_save = Cwd::getcwd(); > > + chdir($repo->wc_path() or $repo->repo_path()) or die("chdir: $!"); > > + local $ENV{"GIT_DIR"} = $repo->repo_path(); > > This looks like it is copied from the existing code but why do we need > to do this? I'm struggling to come up with a scenario where "git > send-email" can find the repository but the hook cannot. Again, for consistency I assumed it would be best to keep the code similar in both hooks. If you think it is safe to skip that check, I'll remove it gladly. > > + # cannot use git hook run, it closes stdin before forking the hook > > + open(my $stdin, "|-", $validate_hook) or die("fork: $!"); > > This passes $validate_hook to the shell to execute which is not what we > want as it will split the hook path on whitespace etc. I think it would > be better to use "git hook run --to-stdin" (see 0414b3891c (hook: > support a --to-stdin=<path> option, 2023-02-08)) Ah that's a nice addition. I'll add that in v2. Thanks for reviewing!
Hi Robin On 03/04/2023 15:32, Robin Jarry wrote: > Hi Phillip, > > Phillip Wood, Apr 03, 2023 at 16:09: >>> + <patch-file> LF >> >> Usually git commands that produce or consume paths either use quoted >> paths terminated by LF or unquoted paths terminated by NUL. That way >> there is no ambiguity when a path contains LF. > > I had never imagined that some twisted mind would insert LF in path > names but since nothing will forbid it, I agree that it is > a possibility. > > I'm not sure what you mean by quoted paths, you mean adding literal > double quotes before printing them to the hook stdin? That means the > hook needs to handle de-quoting after reading, right? I meant quoted in the same way that diff and ls-files etc quote paths that contain control characters - see quote_c_style() in quote.c if you're interested in the details. It looks like Git.pm can unquote paths but has no code to quote them. It is probably easiest to use NUL termination here - bash and zsh can read NUL terminated lines and so can any scripting language. >>> diff --git a/git-send-email.perl b/git-send-email.perl >>> index 07f2a0cbeaad..bec4d0f4ab47 100755 >>> --- a/git-send-email.perl >>> +++ b/git-send-email.perl >>> @@ -800,6 +800,7 @@ sub is_format_patch_arg { >>> validate_patch($f, $target_xfer_encoding); >>> } >>> } >>> + validate_patch_series(@files) >> >> This happens fairly early, before the user has had a chance to edit the >> patches and before we have added all the recipient and in-reply-to >> headers to the patch files. Would it be more useful to validate what >> will actually be sent? > > I agree that it would be better. I added the check here to be in line > with the existing sendemail-validate hook. I could move it after edition > and header finalization but then we would need to move > sendemail-validate as well for consistency. What do you think? That would be my inclination but I'm only an occasional send-email user. The downside is that the user may edit a patch only for it to be rejected but we could offer for them to edit it again rather than just throwing their work away. >>> + # The hook needs a correct cwd and GIT_DIR. >>> + require Cwd; >>> + my $cwd_save = Cwd::getcwd(); >>> + chdir($repo->wc_path() or $repo->repo_path()) or die("chdir: $!"); >>> + local $ENV{"GIT_DIR"} = $repo->repo_path(); >> >> This looks like it is copied from the existing code but why do we need >> to do this? I'm struggling to come up with a scenario where "git >> send-email" can find the repository but the hook cannot. > > Again, for consistency I assumed it would be best to keep the code > similar in both hooks. If you think it is safe to skip that check, I'll > remove it gladly. I suspect it is safe, hopefully someone will speak up if I'm mistaken. A while ago rebase stopped setting these variables when running an "exec" command as they were causing problems (see 434e0636db (sequencer: do not export GIT_DIR and GIT_WORK_TREE for 'exec', 2021-12-04)) >>> + # cannot use git hook run, it closes stdin before forking the hook >>> + open(my $stdin, "|-", $validate_hook) or die("fork: $!"); >> >> This passes $validate_hook to the shell to execute which is not what we >> want as it will split the hook path on whitespace etc. I think it would >> be better to use "git hook run --to-stdin" (see 0414b3891c (hook: >> support a --to-stdin=<path> option, 2023-02-08)) > > Ah that's a nice addition. I'll add that in v2. That's great, Best Wishes Phillip > Thanks for reviewing!
Phillip Wood <phillip.wood123@gmail.com> writes: >> diff --git a/git-send-email.perl b/git-send-email.perl >> index 07f2a0cbeaad..bec4d0f4ab47 100755 >> --- a/git-send-email.perl >> +++ b/git-send-email.perl >> @@ -800,6 +800,7 @@ sub is_format_patch_arg { >> validate_patch($f, $target_xfer_encoding); >> } >> } >> + validate_patch_series(@files) > > This happens fairly early, before the user has had a chance to edit > the patches and before we have added all the recipient and in-reply-to > headers to the patch files. Would it be more useful to validate what > will actually be sent? I actually think the original intent was to catch errors in the part of the file that can mechanically be created before letting the user spend time on editing, without realizing that a later stage will be rejected due to the auto-generated (e.g. came from a commit object) stuff. I do not know why we need another hook to do pretty much the same thing as the existing one (which could be taught to spool and then the last round to validate, in addition to each step rejecting incoming one as needed), but at least calling it there would be very much in line with the existing one, I would say. Thanks for a careful review.
Junio C Hamano, Apr 03, 2023 at 17:42: > I do not know why we need another hook to do pretty much the same > thing as the existing one (which could be taught to spool and then the > last round to validate, in addition to each step rejecting incoming > one as needed), but at least calling it there would be very much in > line with the existing one, I would say. If for example the validation would require trying to apply patches on top of another branch in a temp repository, you would need to know the number of patches and be able to determine whether you need to reset the branch (patch 1/N) before applying. For that you would need to parse the contents of the patches. This is not the end of the world but I assumed that it would be easier to handle with a hook that fires once with all patch files. Another option would be to change sendemail-validate to be called only once with all patches. That would be the ideal solution since the existing hook is not always usable with series. But that would be a breaking change. I personally don't mind a small breakage like this but I don't know what is the project's policy.
Phillip Wood, Apr 03, 2023 at 16:09: > Usually git commands that produce or consume paths either use quoted > paths terminated by LF or unquoted paths terminated by NUL. That way > there is no ambiguity when a path contains LF. Thinking again about that. The probability that a file path name generated by git-format-patch would contain LF is close to zero. However, reading per line is more natural and more in line with other hooks that read from stdin. Having that single hook separating stuff in stdin with NUL bytes is weird from a user point of view. Don't you think it would be an acceptable limitation?
"Robin Jarry" <robin@jarry.cc> writes: > Thinking again about that. The probability that a file path name > generated by git-format-patch would contain LF is close to zero. Close to zero is very different from absolutely zero, and in the case of format-patch generated patches, I think it is absolutely zero. At least, that was the case back when I designed and implemented it, and I do not think I accepted a patch to break it over the years. But "git send-email" can be fed a list of files and even a directory (and enumerate files in it). The filenames are under end-users' control in this case, so "close to zero" has absolutely no relevance. If the end user means to feed you such a file, they can do so 100% of the time. If we support such a file is a different issue. A good rule of thumb to decide if it is reasonable is to see if the main command already works with such filenames, e.g. $ git format-patch -2 0001-foo.txt 0002-bar.txt $ mv 0001-foo.txt '0001-fo > o.txt' $ mkdir dir $ mv 000[12]*.txt dir/. may prepare two patch files that can be sent via send-email. One file (the first one) is deliberately given a filename with LF in it. Does send-email work on it correctly if you did e.g. $ git send-email dir/000[12]*.txt or something silly like $ git send-email dir or does it already choke on the first file because of the filename?
Junio C Hamano, Apr 04, 2023 at 00:52: > Close to zero is very different from absolutely zero, and in the > case of format-patch generated patches, I think it is absolutely > zero. At least, that was the case back when I designed and > implemented it, and I do not think I accepted a patch to break it > over the years. > > But "git send-email" can be fed a list of files and even a directory > (and enumerate files in it). The filenames are under end-users' > control in this case, so "close to zero" has absolutely no relevance. > If the end user means to feed you such a file, they can do so 100% > of the time. Ok that's a fair point. Even though I am having a hard time believing someone would do such a thing :D > If we support such a file is a different issue. A good rule of > thumb to decide if it is reasonable is to see if the main command > already works with such filenames, e.g. > > $ git format-patch -2 > 0001-foo.txt > 0002-bar.txt > $ mv 0001-foo.txt '0001-fo > > o.txt' > $ mkdir dir > $ mv 000[12]*.txt dir/. > > may prepare two patch files that can be sent via send-email. One > file (the first one) is deliberately given a filename with LF in > it. Does send-email work on it correctly if you did e.g. > > $ git send-email dir/000[12]*.txt > > or something silly like > > $ git send-email dir > > or does it already choke on the first file because of the filename? It seems to work with both. I guess, NUL bytes separation it is then...
"Robin Jarry" <robin@jarry.cc> writes: >> it. Does send-email work on it correctly if you did e.g. >> >> $ git send-email dir/000[12]*.txt >> >> or something silly like >> >> $ git send-email dir >> >> or does it already choke on the first file because of the filename? > > It seems to work with both. I guess, NUL bytes separation it is then... Feeding the filenames as the command line arguments would have been much simpler X-<, but either NUL termination or c-quoting the filenames would be needed _if_ we want to support crazy folks who feed us such garbage filenames. Letting hook scripts understand NUL termination is a chore and it still is debatable if it is reasonable to support, though. I'd say it would be sufficient to just declare "files whose name has LF in it is not given to the hook, ever" and users would avoid such a filename if they care. Thanks.
Junio C Hamano, Apr 04, 2023 at 22:14: > Feeding the filenames as the command line arguments would have been > much simpler X-<, Just a thought. Instead of adding another hook, wouldn't it be better to add an option --validate-series (git config sendemail.validateSeries) that would change the behaviour of the sendemail-validate hook? Calling it with all patch files as command line arguments only once instead of once per file. I know I was concerned with the max size of the command line args but is there really a chance that we hit that maximum? On my system, it is 2097152 bytes. Even with a 1000 patches series with 1000 bytes filenames, we wouldn't hit the limit. This way, we support any crap filename that the user may send and we don't add a new hook which basically does the same thing than the existing one. Thoughts?
"Robin Jarry" <robin@jarry.cc> writes: > Just a thought. Instead of adding another hook, wouldn't it be better to > add an option --validate-series (git config sendemail.validateSeries) > that would change the behaviour of the sendemail-validate hook? Calling > it with all patch files as command line arguments only once instead of > once per file. I do not see much upside in doing so. Instead of having to write a new validate-series hook script to store it under a new name, users can update an existing validate-patch hook script to take a batch of patches in one go. Then the user now has to set a new configuration variable. Because the expected way to feed the hook script is *not* per invocation but depends solely on how the script is written, a command line option would not make much sense. Not having to rename the updated validate-patch script to validate-series might be a small win, but I do not think it is a compelling reason to take that approach. > I know I was concerned with the max size of the command line args but is > there really a chance that we hit that maximum? On my system, it is > 2097152 bytes. Even with a 1000 patches series with 1000 bytes > filenames, we wouldn't hit the limit. > > This way, we support any crap filename that the user may send and we > don't add a new hook which basically does the same thing than the > existing one. Between "we may exceed command line argument limit" (which by the way is way lower on certain systems than what you expect, IIRC) and "the user may throw us a file with LF in its name", I'd find it simpler to punt on the latter and tell them "if it hurts, don't do it". As long as the limitation is clearly documented, I'd say it is OK. Thanks.
diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt index 765b2df8530d..45113b928593 100644 --- a/Documentation/git-send-email.txt +++ b/Documentation/git-send-email.txt @@ -438,6 +438,7 @@ have been specified, in which case default to 'compose'. + -- * Invoke the sendemail-validate hook if present (see linkgit:githooks[5]). + * Invoke the sendemail-validate-series hook if present (see linkgit:githooks[5]). * Warn of patches that contain lines longer than 998 characters unless a suitable transfer encoding ('auto', 'base64', or 'quoted-printable') is used; diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt index 62908602e7be..b81783235111 100644 --- a/Documentation/githooks.txt +++ b/Documentation/githooks.txt @@ -600,6 +600,23 @@ the name of the file that holds the e-mail to be sent. Exiting with a non-zero status causes `git send-email` to abort before sending any e-mails. +sendemail-validate-series +~~~~~~~~~~~~~~~~~~~~~~~~~ + +This hook is invoked by linkgit:git-send-email[1]. It allows performing +validation on a complete patch series at once, instead of patch by patch with +`sendemail-validate`. + +`sendemail-validate-series` takes no arguments, but for each e-mail to be sent +it receives on standard input a line of the format: + + <patch-file> LF + +where `<patch-file>` is a name of a file that holds an e-mail to be sent, + +If the hook exits with non-zero status, `git send-email` will abort before +sending any e-mails. + fsmonitor-watchman ~~~~~~~~~~~~~~~~~~ diff --git a/git-send-email.perl b/git-send-email.perl index 07f2a0cbeaad..bec4d0f4ab47 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -800,6 +800,7 @@ sub is_format_patch_arg { validate_patch($f, $target_xfer_encoding); } } + validate_patch_series(@files) } if (@files) { @@ -2125,6 +2126,47 @@ sub validate_patch { return; } +sub validate_patch_series { + my @files = @_; + + unless ($repo) { + return; + } + + my $hook_name = 'sendemail-validate-series'; + my $hooks_path = $repo->command_oneline('rev-parse', '--git-path', 'hooks'); + require File::Spec; + my $validate_hook = File::Spec->catfile($hooks_path, $hook_name); + my $hook_error; + unless (-x $validate_hook) { + return; + } + + # The hook needs a correct cwd and GIT_DIR. + require Cwd; + my $cwd_save = Cwd::getcwd(); + chdir($repo->wc_path() or $repo->repo_path()) or die("chdir: $!"); + local $ENV{"GIT_DIR"} = $repo->repo_path(); + # cannot use git hook run, it closes stdin before forking the hook + open(my $stdin, "|-", $validate_hook) or die("fork: $!"); + chdir($cwd_save) or die("chdir: $!"); + for my $fn (@files) { + unless (-p $fn) { + $fn = Cwd::abs_path($fn); + $stdin->print("$fn\n"); + } + } + close($stdin); # calls waitpid + if ($? & 0x7f) { + my $sig = $? & 0x7f; + die("fatal: hook $hook_name killed by signal $sig"); + } elsif ($? >> 8) { + my $err = $? >> 8; + die("fatal: hook $hook_name rejected patch series (exit code $err)"); + } + return; +} + sub handle_backup { my ($last, $lastlen, $file, $known_suffix) = @_; my ($suffix, $skip);
When sending patch series (with a cover-letter or not) sendemail-validate is called with every email/patch file independently from the others. When one of the patches depends on a previous one, it may not be possible to use this hook in a meaningful way. Changing sendemail-validate to take all patches as arguments would break backward compatibility. Add a new hook to allow validating patch series instead of patch by patch. The patch files are provided in the hook script standard input. git hook run cannot be used since it closes the hook standard input. Run the hook directly. Signed-off-by: Robin Jarry <robin@jarry.cc> --- Rebased on 140b9478dad5 ("The sixth batch") Documentation/git-send-email.txt | 1 + Documentation/githooks.txt | 17 +++++++++++++ git-send-email.perl | 42 ++++++++++++++++++++++++++++++++ 3 files changed, 60 insertions(+)