diff mbox series

[v2] send-email: export patch counters in validate environment

Message ID 20230412095434.140754-1-robin@jarry.cc (mailing list archive)
State Superseded
Headers show
Series [v2] send-email: export patch counters in validate environment | expand

Commit Message

Robin Jarry April 12, 2023, 9:54 a.m. UTC
When sending patch series (with a cover-letter or not)
sendemail-validate is called with every email/patch file independently
from the others. When one of the patches depends on a previous one, it
may not be possible to use this hook in a meaningful way. A hook that
wants to check some property of the whole series needs to know which
patch is the final one.

Expose the current and total number of patches to the hook via the
GIT_SENDEMAIL_PATCH_COUNTER and GIT_SENDEMAIL_PATCH_TOTAL environment
variables so that both incremental and global validation is possible.

Sharing any other state between successive invocations of the validate
hook must be done via external means. For example, by storing it in
a git config sendemail.validateWorkdir entry.

Add a sample script with placeholder validations.

Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Robin Jarry <robin@jarry.cc>
---

Notes:
    v1 -> v2:
    
    * Added more details in documentation.
    * Exclude FIFOs from COUNT/TOTAL
    * Only set TOTAL once.
    * Only unset COUNT/TOTAL once.
    * Add sample hook script.

 Documentation/githooks.txt                 | 22 ++++++
 git-send-email.perl                        | 17 ++++-
 templates/hooks--sendemail-validate.sample | 84 ++++++++++++++++++++++
 3 files changed, 122 insertions(+), 1 deletion(-)
 create mode 100755 templates/hooks--sendemail-validate.sample

Comments

Junio C Hamano April 12, 2023, 5:53 p.m. UTC | #1
Robin Jarry <robin@jarry.cc> writes:

>  if ($validate) {
> +	# FIFOs can only be read once, exclude them from validation.

It is very good to see this comment here, as it may not be obvious
to everybody why we exclude them.

> +validate_cover_letter()
> +{

See Documentation/CodingGuidelines, look for "For shell scripts
specifically" and follow what is in the section.  There may be style
violations of other kinds in the file.

> +validate_patch()
> +{
> +	file="$1"
> +	# Ensure that the patch applies without conflicts to the latest
> +	# upstream version.

That comment is true only for the first one.  The second patch needs
to apply to the upstream plus the first patch, and so on.

> +	git am -3 "$file" || die "failed to apply patch on upstream repo"
> +	# XXX: Add appropriate checks here (e.g. checkpatch.pl).
> +}
> +
> +validate_series()
> +{
> +	# XXX: Add appropriate checks here (e.g. quick build, etc.).
> +}
> +
> +die()
> +{
> +	echo "sendemail-validate: error: $*" >&2
> +	exit 1
> +}
> +
> +get_work_dir()
> +{
> +	git config --get sendemail.validateWorkdir || {
> +		# Initialize it to a temp dir, if unset.
> +		git config --add sendemail.validateWorkdir "$(mktemp -d)"
> +		git config --get sendemail.validateWorkdir
> +	}
> +}
> +
> +get_upstream_url()
> +{
> +	git config --get remote.origin.url ||
> +		die "cannot get remote.origin.url"
> +}
> +
> +clone_upstream()
> +{
> +	workdir="$1"
> +	url="$(get_upstream_url)"
> +	rm -rf -- "$workdir"
> +	git clone --depth=1 "$url" "$workdir" ||
> +		die "failed to clone upstream repository"
> +}

Style-wise, it is better to get rid of get_upstream_url and write
the above more like

	workdir=$1 &&
	url=$(git config remote.originurl) &&
	rm -r -- "$workdir" &&
	git clone ... ||
	die "failed to ..."

and that would be less error prone (e.g. you will catch failure from
"rm" yourself, instead of relying on "git clone" to catch it for
you).

In any case, I would avoid network traffic and extra disk usage if I
were showing an example for readers to follow, and would not
recommend you to use "clone" here, even if it were a shallow one.

It would make much more sense to create a secondary worktree based
on this repository, with its HEAD detached at the copy of the target
branch (e.g. refs/remotes/origin/HEAD), and use that secondary
worktree, as the necessary objects for "am -3" to fall back on are
more likely to be found in such a setting, compared to a shallow
clone that only can have the blobs at the tip.

> +
> +# main -------------------------------------------------------------------------

> +workdir=$(get_work_dir)
> +if [ "$GIT_SENDEMAIL_FILE_COUNTER" = 1 ]; then
> +	clone_upstream "$workdir"
> +fi
> +cd "$workdir"
> +export GIT_DIR="$workdir/.git"

It is a good discipline to always set GIT_DIR and GIT_WORK_TREE as a
pair.  Working in a subdirectory of a working tree becomes awkward,
because the presence of the former without the latter signals that
your $(cwd) is at the top of the working tree.

But that is more or less moot, because I am suggesting not to use
"git clone" to prepare the playground and instead use a secondary
worktree that is attached to the same current repository, so GIT_DIR
would be the same as the current one.

And because you are "cd"ing there anyway, it probably is much
simpler to just 

    unset GIT_DIR GIT_WORK_TREE

to let the repository discovery mechanism take care of it.

> +if grep -q "^diff --git " "$1"; then
> +	validate_patch "$1"
> +else
> +	validate_cover_letter "$1"
> +fi
> +
> +if [ "$GIT_SENDEMAIL_FILE_COUNTER" = "$GIT_SENDEMAIL_FILE_TOTAL" ]; then
> +	validate_series || die "patch series was rejected"
> +fi

It is uneven that validate_patch and validate_cover_letter are
responsible for dying when problem is found, but validate_series is
not and the caller is made responsible for that.

I would make the caller responsible for dying with message for all
three by removing the calls to "die" or "exit" from the former two,
if I were showing an example for readers to follow.

Overall, a very well crafted patch, even though little details and
some design choices can be improved.

Thanks.
Robin Jarry April 12, 2023, 6:33 p.m. UTC | #2
Hi Junio,

Junio C Hamano, Apr 12, 2023 at 19:53:
> See Documentation/CodingGuidelines, look for "For shell scripts
> specifically" and follow what is in the section.  There may be style
> violations of other kinds in the file.

I had missed that one. I'll have a look.

> > +	# Ensure that the patch applies without conflicts to the latest
> > +	# upstream version.
>
> That comment is true only for the first one.  The second patch needs
> to apply to the upstream plus the first patch, and so on.

Will adjust this as well.

> Style-wise, it is better to get rid of get_upstream_url and write
> the above more like
>
> 	workdir=$1 &&
> 	url=$(git config remote.originurl) &&
> 	rm -r -- "$workdir" &&
> 	git clone ... ||
> 	die "failed to ..."
>
> and that would be less error prone (e.g. you will catch failure from
> "rm" yourself, instead of relying on "git clone" to catch it for
> you).

I have added set -e at the beginning of the script, specifically to
avoid the chained && commands which make the code hard to read. If any
command returns/exits with a non-zero status which is not handled by an
if or by a ||, the shell script will exit.

I can probably get rid of the explicit die statements because of this.

> In any case, I would avoid network traffic and extra disk usage if I
> were showing an example for readers to follow, and would not
> recommend you to use "clone" here, even if it were a shallow one.
>
> It would make much more sense to create a secondary worktree based
> on this repository, with its HEAD detached at the copy of the target
> branch (e.g. refs/remotes/origin/HEAD), and use that secondary
> worktree, as the necessary objects for "am -3" to fall back on are
> more likely to be found in such a setting, compared to a shallow
> clone that only can have the blobs at the tip.

I have never used secondary worktrees. My original thinking was that the
local repository may not be up to date compared to the upstream and
running git fetch on the local repo seemed like a bad idea. Would there
be a proper way to do this with secondary worktree?

There may not be an elegant generic solution here. $(git config
remote.origin.url) may not even contain the proper upstream url...

Also, if I understand how worktrees function, applying patches in
a detached HEAD will create blobs in the current git dir. These will
eventually be garbage collected but I wonder if that could be a problem.

> It is a good discipline to always set GIT_DIR and GIT_WORK_TREE as a
> pair.  Working in a subdirectory of a working tree becomes awkward,
> because the presence of the former without the latter signals that
> your $(cwd) is at the top of the working tree.
>
> But that is more or less moot, because I am suggesting not to use
> "git clone" to prepare the playground and instead use a secondary
> worktree that is attached to the same current repository, so GIT_DIR
> would be the same as the current one.
>
> And because you are "cd"ing there anyway, it probably is much
> simpler to just
>
>     unset GIT_DIR GIT_WORK_TREE
>
> to let the repository discovery mechanism take care of it.

Depending on whether I use a worktree or not, I will unset these
variables.

> It is uneven that validate_patch and validate_cover_letter are
> responsible for dying when problem is found, but validate_series is
> not and the caller is made responsible for that.
>
> I would make the caller responsible for dying with message for all
> three by removing the calls to "die" or "exit" from the former two,
> if I were showing an example for readers to follow.

Agreed, this is inconsistent. My original intent was to provide more
explicit error messages but it is probably not necessary.

As explained above, `set -e` will force early exit if any command fails
without being explicitly handled. I will remove die/exit calls.

> Overall, a very well crafted patch, even though little details and
> some design choices can be improved.

Thanks for the careful review!
Junio C Hamano April 12, 2023, 8:37 p.m. UTC | #3
"Robin Jarry" <robin@jarry.cc> writes:

> Also, if I understand how worktrees function, applying patches in
> a detached HEAD will create blobs in the current git dir. These will
> eventually be garbage collected but I wonder if that could be a problem.

You are the user who just ran format-patch to prepare sending out
the patches, and you are checking your patches.  Wouldn't you have
the blobs already anyways?

> As explained above, `set -e` will force early exit if any command fails
> without being explicitly handled. I will remove die/exit calls.

I'd rather not to see anybody go in that direction.  "set -e" is a
poor substitute for a properly designed error handling.  Between

	set -e
	command A
	command B
	command C

and

	command A &&
	command B &&
	command C || die message

the former can only say "command B" failed because command B was run
under some condition that it did not like, but that is too low level
an error that is close to the implementation.  As opposed to the
latter that can talk about what it _means_ that any one of these
three commands did not succeed in the end-user's terms.

Thanks.
Robin Jarry April 12, 2023, 8:39 p.m. UTC | #4
Junio C Hamano, Apr 12, 2023 at 22:37:
> You are the user who just ran format-patch to prepare sending out
> the patches, and you are checking your patches.  Wouldn't you have
> the blobs already anyways?

But these will be new blobs since we are applying the patches from files
into another detached branch.

> I'd rather not to see anybody go in that direction.  "set -e" is a
> poor substitute for a properly designed error handling.  Between
>
> 	set -e
> 	command A
> 	command B
> 	command C
>
> and
>
> 	command A &&
> 	command B &&
> 	command C || die message
>
> the former can only say "command B" failed because command B was run
> under some condition that it did not like, but that is too low level
> an error that is close to the implementation.  As opposed to the
> latter that can talk about what it _means_ that any one of these
> three commands did not succeed in the end-user's terms.

Ok.
Junio C Hamano April 12, 2023, 9:48 p.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:

> Overall, a very well crafted patch, even though little details and
> some design choices can be improved.

One thing I forgot to mention.  We probably want some test, perhaps
adding something like the following to t9001 after we already test
for --validate.

Thanks.

----------- >8 ---------------------- >8 ---------------------- >8 -----------
expected_file_counter_output () {
	total=$1
	count=0
	while test $count -ne $total
	do
		count=$((count + 1)) &&
		echo "$count/$total" || return
	done
}

test_expect_success $PREREQ '--validate hook allows counting of messages' '
	test_when_finished "rm my-hooks.log" &&
	test_config core.hooksPath "my-hooks" &&

	write_script my-hooks/sendemail-validate <<-\EOF &&
		num=$GIT_SENDEMAIL_FILE_COUNTER &&
		tot=$GIT_SENDEMAIL_FILE_TOTAL &&
		echo "$num/$tot" >>my-hooks.log || exit 1
	EOF

	>my-hooks.log &&
	expected_file_counter_output 1 >expect &&
	git send-email \
		--from="Example <from@example.com>" \
		--to=nobody@example.com \
		--smtp-server="$(pwd)/fake.sendmail" \
		--validate $patches &&
	test_cmp expect my-hooks.log
'
diff mbox series

Patch

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 62908602e7be..c8e55b2613f5 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -600,6 +600,28 @@  the name of the file that holds the e-mail to be sent.  Exiting with a
 non-zero status causes `git send-email` to abort before sending any
 e-mails.
 
+The following environment variables are set when executing the hook.
+
+`GIT_SENDEMAIL_FILE_COUNTER`::
+	A 1-based counter incremented by one for every file holding an e-mail
+	to be sent (excluding any FIFOs). This counter does not follow the
+	patch series counter scheme. It will always start at 1 and will end at
+	GIT_SENDEMAIL_FILE_TOTAL.
+
+`GIT_SENDEMAIL_FILE_TOTAL`::
+	The total number of files that will be sent (excluding any FIFOs). This
+	counter does not follow the patch series counter scheme. It will always
+	be equal to the number of files being sent, whether there is a cover
+	letter or not.
+
+These variables may for instance be used to validate patch series.
+
+The sample `sendemail-validate` hook that comes with Git checks that all sent
+patches (excluding the cover letter) can be applied on top of the upstream
+repository default branch without conflicts. Some placeholders are left for
+additional validation steps to be performed after all patches of a given series
+have been applied.
+
 fsmonitor-watchman
 ~~~~~~~~~~~~~~~~~~
 
diff --git a/git-send-email.perl b/git-send-email.perl
index 07f2a0cbeaad..497ec0354790 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -795,11 +795,26 @@  sub is_format_patch_arg {
 @files = handle_backup_files(@files);
 
 if ($validate) {
+	# FIFOs can only be read once, exclude them from validation.
+	my @real_files = ();
 	foreach my $f (@files) {
 		unless (-p $f) {
-			validate_patch($f, $target_xfer_encoding);
+			push(@real_files, $f);
 		}
 	}
+
+	# Run the loop once again to avoid gaps in the counter due to FIFO
+	# arguments provided by the user.
+	my $num = 1;
+	my $num_files = scalar @real_files;
+	$ENV{GIT_SENDEMAIL_FILE_TOTAL} = "$num_files";
+	foreach my $r (@real_files) {
+		$ENV{GIT_SENDEMAIL_FILE_COUNTER} = "$num";
+		validate_patch($r, $target_xfer_encoding);
+		$num += 1;
+	}
+	delete $ENV{GIT_SENDEMAIL_FILE_COUNTER};
+	delete $ENV{GIT_SENDEMAIL_FILE_TOTAL};
 }
 
 if (@files) {
diff --git a/templates/hooks--sendemail-validate.sample b/templates/hooks--sendemail-validate.sample
new file mode 100755
index 000000000000..c898ee3ab167
--- /dev/null
+++ b/templates/hooks--sendemail-validate.sample
@@ -0,0 +1,84 @@ 
+#!/bin/sh
+
+# An example hook script to validate a patch (and/or patch series) before
+# sending it via email.
+#
+# The hook should exit with non-zero status after issuing an appropriate
+# message if it wants to prevent the email(s) from being sent.
+#
+# To enable this hook, rename this file to "sendemail-validate".
+#
+# By default, it will only check that the patch(es) can be applied on top of
+# the default upstream branch without conflicts. Replace the XXX placeholders
+# with appropriate checks according to your needs.
+
+set -e
+
+validate_cover_letter()
+{
+	file="$1"
+	# XXX: Add appropriate checks here (e.g. spell checking).
+}
+
+validate_patch()
+{
+	file="$1"
+	# Ensure that the patch applies without conflicts to the latest
+	# upstream version.
+	git am -3 "$file" || die "failed to apply patch on upstream repo"
+	# XXX: Add appropriate checks here (e.g. checkpatch.pl).
+}
+
+validate_series()
+{
+	# XXX: Add appropriate checks here (e.g. quick build, etc.).
+}
+
+die()
+{
+	echo "sendemail-validate: error: $*" >&2
+	exit 1
+}
+
+get_work_dir()
+{
+	git config --get sendemail.validateWorkdir || {
+		# Initialize it to a temp dir, if unset.
+		git config --add sendemail.validateWorkdir "$(mktemp -d)"
+		git config --get sendemail.validateWorkdir
+	}
+}
+
+get_upstream_url()
+{
+	git config --get remote.origin.url ||
+		die "cannot get remote.origin.url"
+}
+
+clone_upstream()
+{
+	workdir="$1"
+	url="$(get_upstream_url)"
+	rm -rf -- "$workdir"
+	git clone --depth=1 "$url" "$workdir" ||
+		die "failed to clone upstream repository"
+}
+
+# main -------------------------------------------------------------------------
+
+workdir=$(get_work_dir)
+if [ "$GIT_SENDEMAIL_FILE_COUNTER" = 1 ]; then
+	clone_upstream "$workdir"
+fi
+cd "$workdir"
+export GIT_DIR="$workdir/.git"
+
+if grep -q "^diff --git " "$1"; then
+	validate_patch "$1"
+else
+	validate_cover_letter "$1"
+fi
+
+if [ "$GIT_SENDEMAIL_FILE_COUNTER" = "$GIT_SENDEMAIL_FILE_TOTAL" ]; then
+	validate_series || die "patch series was rejected"
+fi