[v2,01/30] hook: add 'run' subcommand

Message ID	patch-01.30-447d349c738-20210614T101920Z-avarab@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> From: =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= <avarab@gmail.com> To: git@vger.kernel.org Cc: Junio C Hamano <gitster@pobox.com>, Emily Shaffer <emilyshaffer@google.com>, Jeff King <peff@peff.net>, Taylor Blau <me@ttaylorr.com>, Felipe Contreras <felipe.contreras@gmail.com>, Eric Sunshine <sunshine@sunshineco.com>, "brian m . carlson" <sandals@crustytoothpaste.net>, Josh Steadmon <steadmon@google.com>, Jonathan Tan <jonathantanmy@google.com>, Derrick Stolee <stolee@gmail.com>, =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFy?= =?utf-8?b?bWFzb24=?= <avarab@gmail.com> Subject: [PATCH v2 01/30] hook: add 'run' subcommand Date: Mon, 14 Jun 2021 12:32:50 +0200 Message-Id: <patch-01.30-447d349c738-20210614T101920Z-avarab@gmail.com> In-Reply-To: <cover-00.30-00000000000-20210614T101920Z-avarab@gmail.com> References: <cover-00.31-00000000000-20210528T110515Z-avarab@gmail.com> <cover-00.30-00000000000-20210614T101920Z-avarab@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Minimal restart of "config-based-hooks" \| expand [v2,00/30] Minimal restart of "config-based-hooks" [v2,01/30] hook: add 'run' subcommand [v2,02/30] run-command.h: move find_hook() to hook.h [v2,03/30] hook.c: add a hook_exists() wrapper and use it in bugreport.c [v2,04/30] gc: use hook library for pre-auto-gc hook [v2,05/30] rebase: teach pre-rebase to use hook.h [v2,06/30] am: convert applypatch hooks to use config [v2,07/30] hooks: convert 'post-checkout' hook to hook library [v2,08/30] merge: use config-based hooks for post-merge hook [v2,09/30] git hook run: add an --ignore-missing flag [v2,10/30] send-email: use 'git hook run' for 'sendemail-validate' [v2,11/30] git-p4: use 'git hook' to run hooks [v2,12/30] commit: use hook.h to execute hooks [v2,13/30] read-cache: convert post-index-change hook to use config [v2,14/30] receive-pack: convert push-to-checkout hook to hook.h [v2,15/30] run-command: remove old run_hook_{le,ve}() hook API [v2,16/30] run-command: allow stdin for run_processes_parallel [v2,17/30] hook: support passing stdin to hooks [v2,18/30] am: convert 'post-rewrite' hook to hook.h [v2,19/30] run-command: add stdin callback for parallelization [v2,20/30] hook: provide stdin by string_list or callback [v2,21/30] hook: convert 'post-rewrite' hook in sequencer.c to hook.h [v2,22/30] transport: convert pre-push hook to use config [v2,23/30] reference-transaction: use hook.h to run hooks [v2,24/30] run-command: allow capturing of collated output [v2,25/30] hooks: allow callers to capture output [v2,26/30] receive-pack: convert 'update' hook to hook.h [v2,27/30] post-update: use hook.h library [v2,28/30] receive-pack: convert receive hooks to hook.h [v2,29/30] hooks: fix a TOCTOU in "did we run a hook?" heuristic [v2,30/30] hook-list.h: add a generated list of hooks, like config-list.h

Ævar Arnfjörð Bjarmason June 14, 2021, 10:32 a.m. UTC

From: Emily Shaffer <emilyshaffer@google.com>

In order to enable hooks to be run as an external process, by a
standalone Git command, or by tools which wrap Git, provide an external
means to run all configured hook commands for a given hook event.

Most of our hooks require more complex functionality than this, but
let's start with the bare minimum required to support our simplest
hooks.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 .gitignore                 |   1 +
 Documentation/git-hook.txt |  36 ++++++++++
 Documentation/githooks.txt |   4 ++
 Makefile                   |   2 +
 builtin.h                  |   1 +
 builtin/hook.c             |  65 ++++++++++++++++++
 command-list.txt           |   1 +
 git.c                      |   1 +
 hook.c                     | 114 ++++++++++++++++++++++++++++++++
 hook.h                     |  54 +++++++++++++++
 t/t1800-hook.sh            | 131 +++++++++++++++++++++++++++++++++++++
 11 files changed, 410 insertions(+)
 create mode 100644 Documentation/git-hook.txt
 create mode 100644 builtin/hook.c
 create mode 100644 hook.c
 create mode 100644 hook.h
 create mode 100755 t/t1800-hook.sh

Emily Shaffer June 14, 2021, 9:33 p.m. UTC | #1

On Mon, Jun 14, 2021 at 12:32:50PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> In order to enable hooks to be run as an external process, by a
> standalone Git command, or by tools which wrap Git, provide an external
> means to run all configured hook commands for a given hook event.

From what it says on the box, I'm slightly worried about this patch
doing too much at once, but let's see... (I think this is also a common
thing you and I disagree on - how much work to do per commit - so feel
free to ignore me ;) )

> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Thanks for including attribution - I appreciate it.

> diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt
> new file mode 100644
> index 00000000000..902b9cffaef
> --- /dev/null
> +++ b/Documentation/git-hook.txt
> @@ -0,0 +1,36 @@
> +git-hook(1)
> +===========
> +
> +NAME
> +----
> +git-hook - run git hooks
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'git hook' run <hook-name> [-- <hook-args>]

Interesting. This is definitely more user friendly than `-a foo -a bar -a
aagh` ;)

Can we think of a scenario when a user might want to alias to part of
'git hook run' with an argument provided, but still wish to use the hook
more generally? I thought maybe something like `git hook run -a
"--user=Bob"` for a user who has a handful of hooks that can take some
additional argument, but then I realized that most hooks need to meet a
contract with which args they accept, so this isn't a reasonable use
case.

I also wondered whether accepting hook args this way implied that we
can't also provide environment vars for the hooks later on, but I think
it's fine to have those two interfaces be asymmetrical, e.g. `git hook
run -e "USERID=Bob" -- blah.txt`.

So I like this way of accepting them :)

[snip]
> +run::
> +
> +	Run the `<hook-name>` hook. Any positional arguments to the
> +	hook should be passed after an optional "--" (or
> +	"--end-of-options"). See "OPTIONS" below for the arguments
> +	this accepts.

Is it clear enough that users will need to provide arguments to certain
hooks? (Should this have some reference to githooks.txt?)

The "OPTIONS" reference is stale - there is no OPTIONS header in the
manpage now.

> --- /dev/null
> +++ b/builtin/hook.c
> @@ -0,0 +1,65 @@
> +#include "cache.h"
> +#include "builtin.h"
> +#include "config.h"
> +#include "hook.h"
> +#include "parse-options.h"
> +#include "strbuf.h"
> +#include "strvec.h"
> +
> +static const char * const builtin_hook_usage[] = {
> +	N_("git hook run <hook-name> [-- <hook-args>]"),
> +	NULL
> +};
> +
> +static int run(int argc, const char **argv, const char *prefix)
> +{
> +	int i;
> +	struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT;
> +	int rc = 0;
> +	const char *hook_name;
> +	const char *hook_path;
> +
> +	struct option run_options[] = {
> +		OPT_END(),
> +	};
> +
> +	argc = parse_options(argc, argv, prefix, run_options,
> +			     builtin_hook_usage, PARSE_OPT_KEEP_UNKNOWN | PARSE_OPT_KEEP_DASHDASH);
> +
> +	if (argc > 2) {
> +		if (strcmp(argv[2], "--") &&
> +		    strcmp(argv[2], "--end-of-options"))
> +			/* Having a -- for "run" is mandatory */
> +			usage_with_options(builtin_hook_usage, run_options);
> +		/* Add our arguments, start after -- */
> +		for (i = 3 ; i < argc; i++)
> +			strvec_push(&opt.args, argv[i]);
> +	}
> +
> +	/* Need to take into account core.hooksPath */
> +	git_config(git_default_config, NULL);
> +
> +	hook_name = argv[1];
> +	hook_path = find_hook(hook_name);
> +	if (!hook_path) {
> +		error("cannot find a hook named %s", hook_name);
> +		return 1;
> +	}
> +	rc = run_found_hooks(hook_name, hook_path, &opt);

Hum, what's the reasoning for not letting the hook.h call look up the
hook path for itself? I scanned through the v1 cover and older version
of this patch and didn't see any reasoning. To me, having the builtin
look up paths feels like incorrect layering.

> +int cmd_hook(int argc, const char **argv, const char *prefix)
> +{
> +	struct option builtin_hook_options[] = {
> +		OPT_END(),
> +	};
> +
> +	if (!strcmp(argv[1], "run"))
> +		return run(argc, argv, prefix);

Hum. This means that 'run' will still be included in argv for run(),
which I see that it works around silently. I personally find that to be
confusing - maybe at least a comment pointing it out, if you don't like
the idea of adjusting argv before passing it to run()?

> diff --git a/git.c b/git.c
> index 18bed9a9964..540909c391f 100644
> --- a/git.c
> +++ b/git.c
> @@ -538,6 +538,7 @@ static struct cmd_struct commands[] = {
>  	{ "grep", cmd_grep, RUN_SETUP_GENTLY },
>  	{ "hash-object", cmd_hash_object },
>  	{ "help", cmd_help },
> +	{ "hook", cmd_hook, RUN_SETUP },

Hm. RUN_SETUP requires a gitdir, which I suppose makes sense as this is
a pre-config-hooks world.

Does this mean "git send-email" will abort if I try to run it with no
gitdir (which I often do)? I looked ahead to patch 10 and it doesn't
look like there's a significant change to the error handling, so I guess
that if it works for me today, it will work for me with this change too.

Later config-based hooks will mean that such hooks could exist without a
gitdir, but we can cross that bridge when we get there :)

> --- /dev/null
> +++ b/hook.c
> @@ -0,0 +1,114 @@
> +#include "cache.h"
> +#include "hook.h"
> +#include "run-command.h"
> +
> +void run_hooks_opt_clear(struct run_hooks_opt *o)
> +{
> +	strvec_clear(&o->env);
> +	strvec_clear(&o->args);

Maybe more graceful to nullcheck within the _clear() function before
dereferencing 'o'? That way callers don't need to worry about NULL
checking on their end.

> +static int pick_next_hook(struct child_process *cp,
> +			  struct strbuf *out,
> +			  void *pp_cb,
> +			  void **pp_task_cb)
> +{
> +	struct hook_cb_data *hook_cb = pp_cb;
> +	struct hook *run_me = hook_cb->run_me;
> +
> +	if (!run_me)
> +		BUG("did we not return 1 in notify_hook_finished?");

I'm not sure I like this message, even as a BUG(), although the things
I'd rather say ("run_me was NULL unexpectedly!") are obvious as soon as
you grep the codebase. So I think I dislike it for no reason :)

[...]

> +static int notify_start_failure(struct strbuf *out,
> +				void *pp_cb,
> +				void *pp_task_cp)
> +{
> +	struct hook_cb_data *hook_cb = pp_cb;
> +	struct hook *attempted = pp_task_cp;
> +
> +	/* |= rc in cb */
> +	hook_cb->rc |= 1;

Yuck, I think I wrote this comment... yikes. Maybe something like
"hook_cb->rc reflects cumulative failure state" instead?

> +static int notify_hook_finished(int result,
> +				struct strbuf *out,
> +				void *pp_cb,
> +				void *pp_task_cb)
> +{
> +	struct hook_cb_data *hook_cb = pp_cb;
> +
> +	/* |= rc in cb */
> +	hook_cb->rc |= result;

(And same as above.)

> +int run_found_hooks(const char *hook_name, const char *hook_path,
> +		    struct run_hooks_opt *options)
> +{
> +	struct hook my_hook = {
> +		.hook_path = hook_path,

As mentioned earlier, I think it is neater - and better for config-based
hooks in the future - if my_hook.hook_path is set by find_hooks()
directly instead of by being passed in, here. (I expect you did it this
way because one of the later hooks lives in an odd place - but I seem to
remember that one being strange in other ways, too, and I ended up
letting it manage its own affairs in my attempt. So I'll look forward to
seeing whether you handled that differently.)

[...]

> +int run_hooks(const char *hook_name, struct run_hooks_opt *options)
> +{
> +	const char *hook_path;
> +	int ret;
> +	if (!options)
> +		BUG("a struct run_hooks_opt must be provided to run_hooks");
> +
> +	hook_path = find_hook(hook_name);
> +
> +	/* Care about nonexistence? Use run_found_hooks() */
> +	if (!hook_path)
> +		return 0;

Ah, I see - you've done it this way so that builtin/hook.c can complain
"You tried to run pre-commit hook but you don't even have one!".

Hm. I think I dislike this comment for the same reason I dislike the one
much earlier in this patch - it's different from how I would have
written it. But I do think it still conveys the exact same information (I
would have said "If you need to act on a missing hook, use
run_found_hooks() instead") so chalk it up to difference in tone
preferences and ignore me :)

> diff --git a/hook.h b/hook.h
[...]
> +	/* Number of threads to parallelize across */
> +	int jobs;

I wonder whether it's worth changing the comments here...

[...]
> +/*
> + * Callback provided to feed_pipe_fn and consume_sideband_fn.
> + */

...and here, since they don't mean anything in the context of this
specific commit? But they will mean something later on in the series.

> +/*
> + * Calls find_hook(hookname) and runs the hooks (if any) with
> + * run_found_hooks().
> + */
> +int run_hooks(const char *hook_name, struct run_hooks_opt *options);
> +
> +/*
> + * Takes an already resolved hook and runs it. Internally the simpler
> + * run_hooks() will call this.
> + */
> +int run_found_hooks(const char *hookname, const char *hook_path,
> +		    struct run_hooks_opt *options);

The comments in the header here resolve any concerns I had about the
comments in the run_hooks() implementation. I like these a lot.

> diff --git a/t/t1800-hook.sh b/t/t1800-hook.sh
> new file mode 100755
> index 00000000000..f6ff6c4a493
> --- /dev/null
> +++ b/t/t1800-hook.sh
> @@ -0,0 +1,131 @@
> +#!/bin/bash
> +
> +test_description='git-hook command'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'git hook run -- nonexistent hook' '

Nit: Since you take '--' in 'git hook run' now, can you use something else as
a delimiter in the test names? I keep reading these as "here we will
call `git hook run -- nonexistent hook`" :/

> +	cat >stderr.expect <<-\EOF &&
> +	error: cannot find a hook named does-not-exist
> +	EOF
> +	test_expect_code 1 git hook run does-not-exist 2>stderr.actual &&
> +	test_cmp stderr.expect stderr.actual

I'm not wild about matching directly against the error message; that
means that the test will be a pain to update any time we update the
error message language. I'd prefer an approach where we check that the
error is for the reason we expect (by ensuring .git/hooks/does-not-exist
is not there in the fs) and then check that 'git hook run' fails, but do
not particularly care about the error message.

> +test_expect_success 'git hook run -- stdout and stderr handling' '

I have a slight preference towards "the name of the test tells me
exactly what is supposed to happen" - which means I'd prefer to see this
named "stdout and stderr both write to hook's stderr". Too chatty,
maybe, though.

> +test_expect_success 'git hook run -- out-of-repo runs excluded' '
> +	write_script .git/hooks/test-hook <<-EOF &&
> +	echo Test hook
> +	EOF
> +
> +	nongit test_must_fail git hook run test-hook

I wonder if it's necessary to enforce this. I'm just thinking, in a
config-based hook world later on, it will make sense to allow nongit
runs - specifically, I'd use the heck out of a send-email hook to fixup
my In-Reply-To lines, and I always run git-send-email from a nongit dir,
because I keep all my mailed patches stored away out of repo.

What's the general feeling towards "this is how it works, but we don't
have a good reason to require it"?

> +test_expect_success 'git -c core.hooksPath=<PATH> hook run' '
> +	mkdir my-hooks &&
> +	write_script my-hooks/test-hook <<-EOF &&
> +	echo Hook ran >>actual
> +	EOF
> +
> +	cat >expect <<-\EOF &&
> +	Test hook
> +	Hook ran
> +	Hook ran
> +	Hook ran
> +	Hook ran
> +	EOF

I'm not sure I like this - collecting multiple runs into one "actual"
and only comparing it once at the end. Are there other places in the
codebase that do this?

> +
> +	# Test various ways of specifying the path. See also
> +	# t1350-config-hooks-path.sh
> +	>actual &&
> +	git hook run test-hook 2>>actual &&
> +	git -c core.hooksPath=my-hooks hook run test-hook 2>>actual &&
> +	git -c core.hooksPath=my-hooks/ hook run test-hook 2>>actual &&
> +	git -c core.hooksPath="$PWD/my-hooks" hook run test-hook 2>>actual &&
> +	git -c core.hooksPath="$PWD/my-hooks/" hook run test-hook 2>>actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'set up a pre-commit hook in core.hooksPath' '
> +	>actual &&
> +	mkdir -p .git/custom-hooks .git/hooks &&
> +	write_script .git/custom-hooks/pre-commit <<-\EOF &&
> +	echo CUSTOM >>actual
> +	EOF
> +	write_script .git/hooks/pre-commit <<-\EOF
> +	echo NORMAL >>actual
> +	EOF
> +'

Is this setup test a leftover from a later commit?

Overall, I think I like the direction your reroll is going - I've needed
some time to process it. Hopefully I'll be able to get through all or
most of the series this week, but there's a lot going on here, too. I'll
do what I can. Thanks for the help.

 - Emily

Ævar Arnfjörð Bjarmason June 15, 2021, 9:36 a.m. UTC | #2

On Mon, Jun 14 2021, Emily Shaffer wrote:

> On Mon, Jun 14, 2021 at 12:32:50PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> 
>> In order to enable hooks to be run as an external process, by a
>> standalone Git command, or by tools which wrap Git, provide an external
>> means to run all configured hook commands for a given hook event.
>
> From what it says on the box, I'm slightly worried about this patch
> doing too much at once, but let's see... (I think this is also a common
> thing you and I disagree on - how much work to do per commit - so feel
> free to ignore me ;) )
>
>> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>
> Thanks for including attribution - I appreciate it.

They are almost entirely subsets of your patches, so I retained the
authorship, but added my own SOB. The only ones with my authorship are
something where I added fixes in-between or on top.

>> diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt
>> new file mode 100644
>> index 00000000000..902b9cffaef
>> --- /dev/null
>> +++ b/Documentation/git-hook.txt
>> @@ -0,0 +1,36 @@
>> +git-hook(1)
>> +===========
>> +
>> +NAME
>> +----
>> +git-hook - run git hooks
>> +
>> +SYNOPSIS
>> +--------
>> +[verse]
>> +'git hook' run <hook-name> [-- <hook-args>]
>
> Interesting. This is definitely more user friendly than `-a foo -a bar -a
> aagh` ;)
>
> Can we think of a scenario when a user might want to alias to part of
> 'git hook run' with an argument provided, but still wish to use the hook
> more generally? I thought maybe something like `git hook run -a
> "--user=Bob"` for a user who has a handful of hooks that can take some
> additional argument, but then I realized that most hooks need to meet a
> contract with which args they accept, so this isn't a reasonable use
> case.

I can only think of ones where the [-- <hook-args>] would be sufficient.

> I also wondered whether accepting hook args this way implied that we
> can't also provide environment vars for the hooks later on, but I think
> it's fine to have those two interfaces be asymmetrical, e.g. `git hook
> run -e "USERID=Bob" -- blah.txt`.
>
> So I like this way of accepting them :)

I carved out the "-e" support from "run" as part of a general removal of
things not needed for the bug-for-bug compatibility with existing hook
interfaces, i.e. nothing needed it in this phase.

But I also can't see why a "hook run" would need to have an "-e",
anything that invokes the command has other more native ways of setting
the environment first, so the command just needs to avoid actively
clearing its own environment, no?

> [snip]
>> +run::
>> +
>> +	Run the `<hook-name>` hook. Any positional arguments to the
>> +	hook should be passed after an optional "--" (or
>> +	"--end-of-options"). See "OPTIONS" below for the arguments
>> +	this accepts.
>
> Is it clear enough that users will need to provide arguments to certain
> hooks? (Should this have some reference to githooks.txt?)

Probably, I also figured at this point it could be left as almost a
*--helper command.

> The "OPTIONS" reference is stale - there is no OPTIONS header in the
> manpage now.

Well spotted, will fix, or maybe you will :)

>> --- /dev/null
>> +++ b/builtin/hook.c
>> @@ -0,0 +1,65 @@
>> +#include "cache.h"
>> +#include "builtin.h"
>> +#include "config.h"
>> +#include "hook.h"
>> +#include "parse-options.h"
>> +#include "strbuf.h"
>> +#include "strvec.h"
>> +
>> +static const char * const builtin_hook_usage[] = {
>> +	N_("git hook run <hook-name> [-- <hook-args>]"),
>> +	NULL
>> +};
>> +
>> +static int run(int argc, const char **argv, const char *prefix)
>> +{
>> +	int i;
>> +	struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT;
>> +	int rc = 0;
>> +	const char *hook_name;
>> +	const char *hook_path;
>> +
>> +	struct option run_options[] = {
>> +		OPT_END(),
>> +	};
>> +
>> +	argc = parse_options(argc, argv, prefix, run_options,
>> +			     builtin_hook_usage, PARSE_OPT_KEEP_UNKNOWN | PARSE_OPT_KEEP_DASHDASH);
>> +
>> +	if (argc > 2) {
>> +		if (strcmp(argv[2], "--") &&
>> +		    strcmp(argv[2], "--end-of-options"))
>> +			/* Having a -- for "run" is mandatory */
>> +			usage_with_options(builtin_hook_usage, run_options);
>> +		/* Add our arguments, start after -- */
>> +		for (i = 3 ; i < argc; i++)
>> +			strvec_push(&opt.args, argv[i]);
>> +	}
>> +
>> +	/* Need to take into account core.hooksPath */
>> +	git_config(git_default_config, NULL);
>> +
>> +	hook_name = argv[1];
>> +	hook_path = find_hook(hook_name);
>> +	if (!hook_path) {
>> +		error("cannot find a hook named %s", hook_name);
>> +		return 1;
>> +	}
>> +	rc = run_found_hooks(hook_name, hook_path, &opt);
>
> Hum, what's the reasoning for not letting the hook.h call look up the
> hook path for itself? I scanned through the v1 cover and older version
> of this patch and didn't see any reasoning. To me, having the builtin
> look up paths feels like incorrect layering.

I see you end up addressing this yourself below, i.e. some users want to
silently ignore missing hooks, others want to error (as in this case).

>> +int cmd_hook(int argc, const char **argv, const char *prefix)
>> +{
>> +	struct option builtin_hook_options[] = {
>> +		OPT_END(),
>> +	};
>> +
>> +	if (!strcmp(argv[1], "run"))
>> +		return run(argc, argv, prefix);
>
> Hum. This means that 'run' will still be included in argv for run(),
> which I see that it works around silently. I personally find that to be
> confusing - maybe at least a comment pointing it out, if you don't like
> the idea of adjusting argv before passing it to run()?

It's not something I actively changed IIRC, i.e. it just came out of the
general refactoring of minimizing the command. I tihnk it would also
make sense (probably more sense) to adjust argv/argc as you suggest.

>> diff --git a/git.c b/git.c
>> index 18bed9a9964..540909c391f 100644
>> --- a/git.c
>> +++ b/git.c
>> @@ -538,6 +538,7 @@ static struct cmd_struct commands[] = {
>>  	{ "grep", cmd_grep, RUN_SETUP_GENTLY },
>>  	{ "hash-object", cmd_hash_object },
>>  	{ "help", cmd_help },
>> +	{ "hook", cmd_hook, RUN_SETUP },
>
> Hm. RUN_SETUP requires a gitdir, which I suppose makes sense as this is
> a pre-config-hooks world.
>
> Does this mean "git send-email" will abort if I try to run it with no
> gitdir (which I often do)? I looked ahead to patch 10 and it doesn't
> look like there's a significant change to the error handling, so I guess
> that if it works for me today, it will work for me with this change too.
>
> Later config-based hooks will mean that such hooks could exist without a
> gitdir, but we can cross that bridge when we get there :)

Right, since it's a way more limited command used for only running hooks
by git itself it's a RUN_SETUP. I had some preliminary WIP work (based
on yours, obviously) to make it a RUN_SETUP_GENTLY, but within this
series it doesn't become needed.

>> --- /dev/null
>> +++ b/hook.c
>> @@ -0,0 +1,114 @@
>> +#include "cache.h"
>> +#include "hook.h"
>> +#include "run-command.h"
>> +
>> +void run_hooks_opt_clear(struct run_hooks_opt *o)
>> +{
>> +	strvec_clear(&o->env);
>> +	strvec_clear(&o->args);
>
> Maybe more graceful to nullcheck within the _clear() function before
> dereferencing 'o'? That way callers don't need to worry about NULL
> checking on their end.

I don't feel strongly about it, but I tihnk doing it this way is
consistent with other APIs in git. I.e. you do:

    struct something blah = BLAH_INIT;
    [...]
    blah_release(&blah);

And we simply assume that "blah" isn't NULL, and that any field
mandatory via the "INIT" initialization can be assumed not to be NULL.

>> +static int pick_next_hook(struct child_process *cp,
>> +			  struct strbuf *out,
>> +			  void *pp_cb,
>> +			  void **pp_task_cb)
>> +{
>> +	struct hook_cb_data *hook_cb = pp_cb;
>> +	struct hook *run_me = hook_cb->run_me;
>> +
>> +	if (!run_me)
>> +		BUG("did we not return 1 in notify_hook_finished?");
>
> I'm not sure I like this message, even as a BUG(), although the things
> I'd rather say ("run_me was NULL unexpectedly!") are obvious as soon as
> you grep the codebase. So I think I dislike it for no reason :)

I think I just added that as an assertion during development, could be
turned into somethin gelse.

> [...]
>
>> +static int notify_start_failure(struct strbuf *out,
>> +				void *pp_cb,
>> +				void *pp_task_cp)
>> +{
>> +	struct hook_cb_data *hook_cb = pp_cb;
>> +	struct hook *attempted = pp_task_cp;
>> +
>> +	/* |= rc in cb */
>> +	hook_cb->rc |= 1;
>
> Yuck, I think I wrote this comment... yikes. Maybe something like
> "hook_cb->rc reflects cumulative failure state" instead?

*nod*, more on this below...

>> +static int notify_hook_finished(int result,
>> +				struct strbuf *out,
>> +				void *pp_cb,
>> +				void *pp_task_cb)
>> +{
>> +	struct hook_cb_data *hook_cb = pp_cb;
>> +
>> +	/* |= rc in cb */
>> +	hook_cb->rc |= result;
>
> (And same as above.)

..and this...

>> +int run_found_hooks(const char *hook_name, const char *hook_path,
>> +		    struct run_hooks_opt *options)
>> +{
>> +	struct hook my_hook = {
>> +		.hook_path = hook_path,
>
> As mentioned earlier, I think it is neater - and better for config-based
> hooks in the future - if my_hook.hook_path is set by find_hooks()
> directly instead of by being passed in, here. (I expect you did it this
> way because one of the later hooks lives in an odd place - but I seem to
> remember that one being strange in other ways, too, and I ended up
> letting it manage its own affairs in my attempt. So I'll look forward to
> seeing whether you handled that differently.)

Yeah, the designated initializers made a few things much nicer.


>
> [...]
>
>> +int run_hooks(const char *hook_name, struct run_hooks_opt *options)
>> +{
>> +	const char *hook_path;
>> +	int ret;
>> +	if (!options)
>> +		BUG("a struct run_hooks_opt must be provided to run_hooks");
>> +
>> +	hook_path = find_hook(hook_name);
>> +
>> +	/* Care about nonexistence? Use run_found_hooks() */
>> +	if (!hook_path)
>> +		return 0;
>
> Ah, I see - you've done it this way so that builtin/hook.c can complain
> "You tried to run pre-commit hook but you don't even have one!".

Sounds better.

> Hm. I think I dislike this comment for the same reason I dislike the one
> much earlier in this patch - it's different from how I would have
> written it. But I do think it still conveys the exact same information (I
> would have said "If you need to act on a missing hook, use
> run_found_hooks() instead") so chalk it up to difference in tone
> preferences and ignore me :)
>
>> diff --git a/hook.h b/hook.h
> [...]
>> +	/* Number of threads to parallelize across */
>> +	int jobs;
>
> I wonder whether it's worth changing the comments here...

So more generally (continuing the "and this" above), I really tried to
not change the comments, structure, names, order etc. of your initial
series unless it was necessary for the stated aims of minimizing and
simplifying this.

That was all in order to make it easier for you to eventually rebase on
top of this, or to pick it up. I.e. it leaves most things as subsets of
corresponding patches of yours.

But if you think this approach is good / would want to pick it up and
run with it that would make me happy, and I think at that point you
should continue tweaking/adjusting anything like this that you want,
since the non-tweaking of it was something I did for your benefit, but
if you don't mind...

> [...]
>> +/*
>> + * Callback provided to feed_pipe_fn and consume_sideband_fn.
>> + */
>
> ...and here, since they don't mean anything in the context of this
> specific commit? But they will mean something later on in the series.

*ditto*

>> +/*
>> + * Calls find_hook(hookname) and runs the hooks (if any) with
>> + * run_found_hooks().
>> + */
>> +int run_hooks(const char *hook_name, struct run_hooks_opt *options);
>> +
>> +/*
>> + * Takes an already resolved hook and runs it. Internally the simpler
>> + * run_hooks() will call this.
>> + */
>> +int run_found_hooks(const char *hookname, const char *hook_path,
>> +		    struct run_hooks_opt *options);
>
> The comments in the header here resolve any concerns I had about the
> comments in the run_hooks() implementation. I like these a lot.
>
>> diff --git a/t/t1800-hook.sh b/t/t1800-hook.sh
>> new file mode 100755
>> index 00000000000..f6ff6c4a493
>> --- /dev/null
>> +++ b/t/t1800-hook.sh
>> @@ -0,0 +1,131 @@
>> +#!/bin/bash
>> +
>> +test_description='git-hook command'
>> +
>> +. ./test-lib.sh
>> +
>> +test_expect_success 'git hook run -- nonexistent hook' '
>
> Nit: Since you take '--' in 'git hook run' now, can you use something else as
> a delimiter in the test names? I keep reading these as "here we will
> call `git hook run -- nonexistent hook`" :/

Yeah, makes sense, maybe ": " more generally in this test?

>> +	cat >stderr.expect <<-\EOF &&
>> +	error: cannot find a hook named does-not-exist
>> +	EOF
>> +	test_expect_code 1 git hook run does-not-exist 2>stderr.actual &&
>> +	test_cmp stderr.expect stderr.actual
>
> I'm not wild about matching directly against the error message; that
> means that the test will be a pain to update any time we update the
> error message language. I'd prefer an approach where we check that the
> error is for the reason we expect (by ensuring .git/hooks/does-not-exist
> is not there in the fs) and then check that 'git hook run' fails, but do
> not particularly care about the error message.

I generally prefer it, but as noted above, "if you want to pick it up..." :)

FWIW yes I agree it's a bit of a pain, but having run into a lot of
things of the form:

    cmd 2>err &&
    grep "some message I expect" err

hiding issues like the message being duplicated, or an unexpected
additional error/warning being there, I think it's best just to test_cmp
the full output (inserting relevant OIDs if needed) when possible.

>> +test_expect_success 'git hook run -- stdout and stderr handling' '
>
> I have a slight preference towards "the name of the test tells me
> exactly what is supposed to happen" - which means I'd prefer to see this
> named "stdout and stderr both write to hook's stderr". Too chatty,
> maybe, though.
>
>> +test_expect_success 'git hook run -- out-of-repo runs excluded' '
>> +	write_script .git/hooks/test-hook <<-EOF &&
>> +	echo Test hook
>> +	EOF
>> +
>> +	nongit test_must_fail git hook run test-hook
>
> I wonder if it's necessary to enforce this. I'm just thinking, in a
> config-based hook world later on, it will make sense to allow nongit
> runs - specifically, I'd use the heck out of a send-email hook to fixup
> my In-Reply-To lines, and I always run git-send-email from a nongit dir,
> because I keep all my mailed patches stored away out of repo.
>
> What's the general feeling towards "this is how it works, but we don't
> have a good reason to require it"?

I for one think tests should assert existing behavior, we now have
RUN_SETUP so it would be a bug for thaht not to error, so a test is
asserting it. It doesn't mean that we shouldn't willy-nilly change this
later on if and when it's expected that we have a RUN_SETUP_GENTLY.

>> +test_expect_success 'git -c core.hooksPath=<PATH> hook run' '
>> +	mkdir my-hooks &&
>> +	write_script my-hooks/test-hook <<-EOF &&
>> +	echo Hook ran >>actual
>> +	EOF
>> +
>> +	cat >expect <<-\EOF &&
>> +	Test hook
>> +	Hook ran
>> +	Hook ran
>> +	Hook ran
>> +	Hook ran
>> +	EOF
>
> I'm not sure I like this - collecting multiple runs into one "actual"
> and only comparing it once at the end. Are there other places in the
> codebase that do this?

Yeah, that's some ad-hoc nastyness, would be better if it attributed
specific runs or otherwise asserted the "we expected to run" otherwise.

>> +
>> +	# Test various ways of specifying the path. See also
>> +	# t1350-config-hooks-path.sh
>> +	>actual &&
>> +	git hook run test-hook 2>>actual &&
>> +	git -c core.hooksPath=my-hooks hook run test-hook 2>>actual &&
>> +	git -c core.hooksPath=my-hooks/ hook run test-hook 2>>actual &&
>> +	git -c core.hooksPath="$PWD/my-hooks" hook run test-hook 2>>actual &&
>> +	git -c core.hooksPath="$PWD/my-hooks/" hook run test-hook 2>>actual &&
>> +	test_cmp expect actual
>> +'
>> +
>> +test_expect_success 'set up a pre-commit hook in core.hooksPath' '
>> +	>actual &&
>> +	mkdir -p .git/custom-hooks .git/hooks &&
>> +	write_script .git/custom-hooks/pre-commit <<-\EOF &&
>> +	echo CUSTOM >>actual
>> +	EOF
>> +	write_script .git/hooks/pre-commit <<-\EOF
>> +	echo NORMAL >>actual
>> +	EOF
>> +'
>
> Is this setup test a leftover from a later commit?

Hrm, maybe... :)

> Overall, I think I like the direction your reroll is going - I've needed
> some time to process it. Hopefully I'll be able to get through all or
> most of the series this week, but there's a lot going on here, too. I'll
> do what I can. Thanks for the help.

Yeah, will reply to any qusetions etc; and as noted above my initial
goal here was "hey, what about this approach", so if you wanted to pick
this up & run with it...

This particular version of the series is at github.com/avar/git.git's
es-avar/config-based-hooks-3 b.t.w.

Emily Shaffer June 18, 2021, 10:13 p.m. UTC | #3

On Tue, Jun 15, 2021 at 11:36:26AM +0200, Ævar Arnfjörð Bjarmason wrote:
> > Overall, I think I like the direction your reroll is going - I've needed
> > some time to process it. Hopefully I'll be able to get through all or
> > most of the series this week, but there's a lot going on here, too. I'll
> > do what I can. Thanks for the help.
> 
> Yeah, will reply to any qusetions etc; and as noted above my initial
> goal here was "hey, what about this approach", so if you wanted to pick
> this up & run with it...
> 
> This particular version of the series is at github.com/avar/git.git's
> es-avar/config-based-hooks-3 b.t.w.

Have finished scanning through the rest of the series, and I think I
understand your goal a little better - you are not saying "let me take
over and drive this part of the feature set", which is what I thought
initially. Instead, you seem to be saying "let's chop it up this way
instead".

I don't dislike the reorganization, but I do still wonder whether it's
a setback to the progress the original series had made. I guess it is
hard to know - I had thought the original series was pretty much ready
to go in, therefore making "what if we ordered it this way" moot. But it
seems that you disagree.

Anyway, I do hear also that you don't have interest in driving this
subset to completion, and that's fine. Correct me if I'm wrong.

I'll keep thinking on this over the weekend. Thanks for the suggestion.

 - Emily

Ævar Arnfjörð Bjarmason June 20, 2021, 7:30 p.m. UTC | #4

On Fri, Jun 18 2021, Emily Shaffer wrote:

> On Tue, Jun 15, 2021 at 11:36:26AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> > Overall, I think I like the direction your reroll is going - I've needed
>> > some time to process it. Hopefully I'll be able to get through all or
>> > most of the series this week, but there's a lot going on here, too. I'll
>> > do what I can. Thanks for the help.
>> 
>> Yeah, will reply to any qusetions etc; and as noted above my initial
>> goal here was "hey, what about this approach", so if you wanted to pick
>> this up & run with it...
>> 
>> This particular version of the series is at github.com/avar/git.git's
>> es-avar/config-based-hooks-3 b.t.w.
>
> Have finished scanning through the rest of the series, and I think I
> understand your goal a little better - you are not saying "let me take
> over and drive this part of the feature set", which is what I thought
> initially. Instead, you seem to be saying "let's chop it up this way
> instead".

Yes, 30-some patches that both refactor and introduce new behavior are
harder to reason about.

I've also had suggestions about the end-state, but I think whatever we
arrive at doing the scaffolding first without behavior changes makes
sense.

> I don't dislike the reorganization, but I do still wonder whether it's
> a setback to the progress the original series had made. I guess it is
> hard to know - I had thought the original series was pretty much ready
> to go in, therefore making "what if we ordered it this way" moot. But it
> seems that you disagree.

I'm still not sure if I disagree, well, I'm 95% sure I disagree with
some of the end-state, but you never replied to my questions about that:
https://lore.kernel.org/git/87mtv8fww3.fsf@evledaraar.gmail.com/ &
https://lore.kernel.org/git/87lf80l1m6.fsf@evledraar.gmail.com/; So I
don't know for sure, maybe there's things I missed there.

I think since Junio picked up the "base" version of this and it looks
like we're going that way first that's not something we need to discuss
now if you'd like to punt it, but I'd really like to get that cleared up
post-base topic.

In brief summary:

I'm 100% with you on hooks being driven by config, that they aren't is
in hindsight a historical wart. Ditto the parallel execution etc. (which
I'd suggested in an earlier iteration). That's all great.

Where you lose me is the need for having "git hook" be an administrative
interface for it, particularly (as noted in the linked E-Mail) since the
need for that over simply using "git config", or a trivial "git config"
wrapper seems to be fallout from other arbitrary design choices.

I.e. that all the config for a hook needing to be discovered by a
two-pass iteration over the config space (or keeping state), as opposed
to a "hookcfg.<name>.*" (or whatever) prefix.

Maybe that makes sense in the eventual end-state, your series has the
equivalent of "WIP, more will be added later" around that "git hook"
command; but not having the full overview of that I think we can make
simpler inroads into getting us all of the practical featureset we want,
without regretting our choices in command & config interfaces later.

> Anyway, I do hear also that you don't have interest in driving this
> subset to completion, and that's fine. Correct me if I'm wrong.

I submitted a v3 of this (which I forgot to label as such in the
subject) at
https://lore.kernel.org/git/cover-00.27-0000000000-20210617T101216Z-avarab@gmail.com/;
given the timing our E-Mails may have crossed.

But no, I will drive this subset to completion. What I meant with the
"run with it" comment and the earlier reply on v1 of my "base" version
here: https://lore.kernel.org/git/87y2bs7gyc.fsf@evledraar.gmail.com/

... is that I'd be happier if I managed to just convince you that the
more piecemeal approach is better, and something you'd want to pick up &
drive going forward.

I.e. it's still >95% your code, just re-arranged and split into subsets
of your patches. I really did not mean to "steal" it, it's just
something I hacked up one day to see if the more incremental approach
I'd been suggesting (and felt you were either ignoring or were too busy
to address) was something that could work.

Junio C Hamano June 21, 2021, 3:44 a.m. UTC | #5

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I think since Junio picked up the "base" version of this and it looks
> like we're going that way first that's not something we need to discuss
> now if you'd like to punt it, but I'd really like to get that cleared up
> post-base topic.

I queued this 'base' thing only because (1) Emily promised to review
it and I wanted to give her a version of "seen" she can try to build
on the part beyond "refactoring", (2) I wanted to see how the 'base'
looks myself and (3) I wanted to see what possible interactions with
other topics in flight I should anticipate, if we were to decide to
go that route.  Please do not read more than that into what is and
what is not in 'seen'.

Emily Shaffer June 22, 2021, midnight UTC | #6

On Sun, Jun 20, 2021 at 09:30:59PM +0200, Ævar Arnfjörð Bjarmason wrote:
> On Fri, Jun 18 2021, Emily Shaffer wrote:
> Yes, 30-some patches that both refactor and introduce new behavior are
> harder to reason about.
> 
> I've also had suggestions about the end-state, but I think whatever we
> arrive at doing the scaffolding first without behavior changes makes
> sense.
> 
> > I don't dislike the reorganization, but I do still wonder whether it's
> > a setback to the progress the original series had made. I guess it is
> > hard to know - I had thought the original series was pretty much ready
> > to go in, therefore making "what if we ordered it this way" moot. But it
> > seems that you disagree.
> 
> I'm still not sure if I disagree, well, I'm 95% sure I disagree with
> some of the end-state, but you never replied to my questions about that:
> https://lore.kernel.org/git/87mtv8fww3.fsf@evledaraar.gmail.com/ &
> https://lore.kernel.org/git/87lf80l1m6.fsf@evledraar.gmail.com/; So I
> don't know for sure, maybe there's things I missed there.
> 
> I think since Junio picked up the "base" version of this and it looks
> like we're going that way first that's not something we need to discuss
> now if you'd like to punt it, but I'd really like to get that cleared up
> post-base topic.
> 
> In brief summary:
> 
> I'm 100% with you on hooks being driven by config, that they aren't is
> in hindsight a historical wart. Ditto the parallel execution etc. (which
> I'd suggested in an earlier iteration). That's all great.
> 
> Where you lose me is the need for having "git hook" be an administrative
> interface for it, particularly (as noted in the linked E-Mail) since the
> need for that over simply using "git config", or a trivial "git config"
> wrapper seems to be fallout from other arbitrary design choices.
> 
> I.e. that all the config for a hook needing to be discovered by a
> two-pass iteration over the config space (or keeping state), as opposed
> to a "hookcfg.<name>.*" (or whatever) prefix.
> 
> Maybe that makes sense in the eventual end-state, your series has the
> equivalent of "WIP, more will be added later" around that "git hook"
> command; but not having the full overview of that I think we can make
> simpler inroads into getting us all of the practical featureset we want,
> without regretting our choices in command & config interfaces later.
> 
> > Anyway, I do hear also that you don't have interest in driving this
> > subset to completion, and that's fine. Correct me if I'm wrong.
> 
> I submitted a v3 of this (which I forgot to label as such in the
> subject) at
> https://lore.kernel.org/git/cover-00.27-0000000000-20210617T101216Z-avarab@gmail.com/;
> given the timing our E-Mails may have crossed.
> 
> But no, I will drive this subset to completion. What I meant with the
> "run with it" comment and the earlier reply on v1 of my "base" version
> here: https://lore.kernel.org/git/87y2bs7gyc.fsf@evledraar.gmail.com/
> 
> ... is that I'd be happier if I managed to just convince you that the
> more piecemeal approach is better, and something you'd want to pick up &
> drive going forward.
> 
> I.e. it's still >95% your code, just re-arranged and split into subsets
> of your patches. I really did not mean to "steal" it, it's just
> something I hacked up one day to see if the more incremental approach
> I'd been suggesting (and felt you were either ignoring or were too busy
> to address) was something that could work.

Ok. Thanks for clarifying.

Yes, I do like this direction, and I'm pleased you were able to chop it
up in a way where partial submission made sense - I struggled with that,
myself. Yes, I am excited that you want to drive this series :) :) and
will be happy to rebase on top of it.

I'll have a look at the range-diff for v3 this week. Thanks.

 - Emily

Felipe Contreras June 25, 2021, 7:02 p.m. UTC | #7

Emily Shaffer wrote:
> On Tue, Jun 15, 2021 at 11:36:26AM +0200, Ævar Arnfjörð Bjarmason wrote:
> > > Overall, I think I like the direction your reroll is going - I've needed
> > > some time to process it. Hopefully I'll be able to get through all or
> > > most of the series this week, but there's a lot going on here, too. I'll
> > > do what I can. Thanks for the help.
> > 
> > Yeah, will reply to any qusetions etc; and as noted above my initial
> > goal here was "hey, what about this approach", so if you wanted to pick
> > this up & run with it...
> > 
> > This particular version of the series is at github.com/avar/git.git's
> > es-avar/config-based-hooks-3 b.t.w.
> 
> Have finished scanning through the rest of the series, and I think I
> understand your goal a little better - you are not saying "let me take
> over and drive this part of the feature set", which is what I thought
> initially. Instead, you seem to be saying "let's chop it up this way
> instead".

Indeed. In particular Ævar's chopping allowed me to visualize what the
patches were trying to do and it was much easier to review. Step by
step. I don't know about others, but I think it's similar.

> I don't dislike the reorganization, but I do still wonder whether it's
> a setback to the progress the original series had made. I guess it is
> hard to know - I had thought the original series was pretty much ready
> to go in, therefore making "what if we ordered it this way" moot. But it
> seems that you disagree.
> 
> Anyway, I do hear also that you don't have interest in driving this
> subset to completion, and that's fine. Correct me if I'm wrong.

In an open source project nobody "owns" a set of patches, we can all
work on them collaboratively.

> I'll keep thinking on this over the weekend. Thanks for the suggestion.

My suggestion is to not think about it too much. Just find what would be
the next logical step to do on top of Ævar's base and simply do it
(cherry-pick or rebase).

By simply trying it out you would get a much better idea of how the
series could progress to the end-goal you have in mind.

Cheers.

Felipe Contreras June 25, 2021, 7:08 p.m. UTC | #8

Emily Shaffer wrote:
> On Mon, Jun 14, 2021 at 12:32:50PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > 
> > In order to enable hooks to be run as an external process, by a
> > standalone Git command, or by tools which wrap Git, provide an external
> > means to run all configured hook commands for a given hook event.
> 
> From what it says on the box, I'm slightly worried about this patch
> doing too much at once, but let's see... (I think this is also a common
> thing you and I disagree on - how much work to do per commit - so feel
> free to ignore me ;) )

From my cursory look this is the only big patch, and I trust the reason
Ævar made it so big is that he couldn't figure out a way to chop it into
smaller pieces that would somehow work.

> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> 
> Thanks for including attribution - I appreciate it.

It's not really an attribution, since he used your code he is pretty
much obligated to put your s-o-b, see Developer Certificate of Origin

https://en.wikipedia.org/wiki/Developer_Certificate_of_Origin

Junio C Hamano June 29, 2021, 1:12 a.m. UTC | #9

Emily Shaffer <emilyshaffer@google.com> writes:

> Ok. Thanks for clarifying.
>
> Yes, I do like this direction, and I'm pleased you were able to chop it
> up in a way where partial submission made sense - I struggled with that,
> myself. Yes, I am excited that you want to drive this series :) :) and
> will be happy to rebase on top of it.
>
> I'll have a look at the range-diff for v3 this week. Thanks.

Thanks for working together so well.  Very pleased to see
contributors and reviewers working together constructively.

[v2,01/30] hook: add 'run' subcommand

Commit Message

Comments

Patch