[8/9] hook: teach 'hookcmd' config to alias hook scripts

Message ID	20210715232603.3415111-9-emilyshaffer@google.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@kernel.org> Date: Thu, 15 Jul 2021 16:26:02 -0700 In-Reply-To: <20210715232603.3415111-1-emilyshaffer@google.com> Message-Id: <20210715232603.3415111-9-emilyshaffer@google.com> Mime-Version: 1.0 References: <20210715232603.3415111-1-emilyshaffer@google.com> Subject: [PATCH 8/9] hook: teach 'hookcmd' config to alias hook scripts From: Emily Shaffer <emilyshaffer@google.com> To: git@vger.kernel.org Cc: Emily Shaffer <emilyshaffer@google.com> Content-Type: text/plain; charset="UTF-8" Precedence: bulk
Series	config-based hooks restarted \| expand [0/9] config-based hooks restarted [1/9] hook: run a list of hooks instead [2/9] hook: allow parallel hook execution [3/9] hook: introduce "git hook list" [4/9] hook: treat hookdir hook specially [5/9] hook: allow running non-native hooks [6/9] hook: include hooks from the config [7/9] hook: allow out-of-repo 'git hook' invocations [8/9] hook: teach 'hookcmd' config to alias hook scripts [9/9] hook: implement hookcmd.<name>.skip

Emily Shaffer July 15, 2021, 11:26 p.m. UTC

To enable fine-grained options which apply to a single hook executable,
and to make it easier for a single executable to be run on multiple hook
events, teach "hookcmd.<alias>.config". These can be configured as
follows:

  [hookcmd.my-linter]
    command = ~/my-linter.sh
  [hook.pre-commit]
    command = my-linter

During the config parse, we can attempt to dereference the
'hook.pre-commit.command' string 'my-linter' and check if it matches any
hookcmd names; if so, we can run the command associated with that
hookcmd alias instead.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 Documentation/config/hook.txt |  5 +++++
 Documentation/git-hook.txt    | 42 +++++++++++++++++++++++++++++++----
 hook.c                        |  9 +++++++-
 t/t1360-config-based-hooks.sh | 19 ++++++++++++++++
 4 files changed, 70 insertions(+), 5 deletions(-)

Ævar Arnfjörð Bjarmason July 16, 2021, 9:13 a.m. UTC | #1

On Thu, Jul 15 2021, Emily Shaffer wrote:

> To enable fine-grained options which apply to a single hook executable,
> and to make it easier for a single executable to be run on multiple hook
> events, teach "hookcmd.<alias>.config". These can be configured as
> follows:
> [...]
> diff --git a/Documentation/config/hook.txt b/Documentation/config/hook.txt
> index a97b980cca..5b35170664 100644
> --- a/Documentation/config/hook.txt
> +++ b/Documentation/config/hook.txt
> @@ -3,6 +3,11 @@ hook.<command>.command::
>  	executable on your device, a oneliner for your shell, or the name of a
>  	hookcmd. See linkgit:git-hook[1].
>  
> +hookcmd.<name>.command::
> +	A command to execute during a hook for which <name> has been specified
> +	as a command. This can be an executable on your device or a oneliner for
> +	your shell. See linkgit:git-hook[1].
> +
> [...]
> +Global config
> +----
> +  [hook "post-commit"]
> +    command = "linter"
> +    command = "~/typocheck.sh"
> +
> +  [hookcmd "linter"]
> +    command = "/bin/linter --c"
> +----
> +
> +Local config
> +----
> +  [hook "prepare-commit-msg"]
> +    command = "linter"
> +  [hook "post-commit"]
> +    command = "python ~/run-test-suite.py"
> +----
> +
> +With these configs, you'd then run post-commit hooks in this order:
> +
> +  /bin/linter --c
> +  ~/typocheck.sh
> +  python ~/run-test-suite.py
> +  .git/hooks/post-commit (if present)
> +
> +and prepare-commit-msg hooks in this order:
> +
> +  /bin/linter --c
> +  .git/hooks/prepare-commit-msg (if present)
>  

I still have outstanding feedback on the fundamental design
here. I.e. why is this not:

    hook.<name>.event = post-commit
    hook.<name>.command = <path>

See:

https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/

As noted there I don't see why not, and more complexity results from the
design choice of doing it the way you're proposing. I.e. we can't
discover hooks based on config prefixes, and we end up sticking full FS
paths in config keys.

Emily Shaffer July 22, 2021, 11:31 p.m. UTC | #2

On Fri, Jul 16, 2021 at 11:13:42AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> 
> On Thu, Jul 15 2021, Emily Shaffer wrote:
> 
> > To enable fine-grained options which apply to a single hook executable,
> > and to make it easier for a single executable to be run on multiple hook
> > events, teach "hookcmd.<alias>.config". These can be configured as
> > follows:
> > [...]
> > diff --git a/Documentation/config/hook.txt b/Documentation/config/hook.txt
> > index a97b980cca..5b35170664 100644
> > --- a/Documentation/config/hook.txt
> > +++ b/Documentation/config/hook.txt
> > @@ -3,6 +3,11 @@ hook.<command>.command::
> >  	executable on your device, a oneliner for your shell, or the name of a
> >  	hookcmd. See linkgit:git-hook[1].
> >  
> > +hookcmd.<name>.command::
> > +	A command to execute during a hook for which <name> has been specified
> > +	as a command. This can be an executable on your device or a oneliner for
> > +	your shell. See linkgit:git-hook[1].
> > +
> > [...]
> > +Global config
> > +----
> > +  [hook "post-commit"]
> > +    command = "linter"
> > +    command = "~/typocheck.sh"
> > +
> > +  [hookcmd "linter"]
> > +    command = "/bin/linter --c"
> > +----
> > +
> > +Local config
> > +----
> > +  [hook "prepare-commit-msg"]
> > +    command = "linter"
> > +  [hook "post-commit"]
> > +    command = "python ~/run-test-suite.py"
> > +----
> > +
> > +With these configs, you'd then run post-commit hooks in this order:
> > +
> > +  /bin/linter --c
> > +  ~/typocheck.sh
> > +  python ~/run-test-suite.py
> > +  .git/hooks/post-commit (if present)
> > +
> > +and prepare-commit-msg hooks in this order:
> > +
> > +  /bin/linter --c
> > +  .git/hooks/prepare-commit-msg (if present)
> >  
> 
> I still have outstanding feedback on the fundamental design
> here. I.e. why is this not:
> 
>     hook.<name>.event = post-commit
>     hook.<name>.command = <path>
> 
> See:
> 
> https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/
> 
> As noted there I don't see why not, and more complexity results from the
> design choice of doing it the way you're proposing. I.e. we can't
> discover hooks based on config prefixes, and we end up sticking full FS
> paths in config keys.

Interesting. My gut says that it would make it harder to neatly write a
config with the same hook running at the very beginning of one event and
the very end of another, but I'm not sure whether that's a case to
optimize for.

I would also need twice as many lines to run a script I wrote as a hook
- that is, the base case which is probably all most people will need. So
with your proposal I *must* name every hook I want to add. Instead of
"hook.pre-commit.command = ~/check-for-debug-strings" I need to
configure "hook.check-debug-strings.event = pre-commit" and
"hook.check-debug-strings.command = ~/check-for-debug-strings". That's a
little frustrating, and if I never want to skip that check or move it
later or something, it will be extra overhead for no gain for me.

I do agree that your approach bans the regrettably awkward
"hookcmd.~/check-for-debug-strings.skip = true", but I'm not sure
whether or not it's worth it.

To help us visualize the change, here's some common and complicated
tasks and how they look with either schema (let mine be A and yours be
B):

#1 - Add a oneliner (not executing a script)
A:
[hook "post-commit"]
	command = echo post commit
B:
[hook "oneliner"]
	event = post-commit
	command = echo post commit

#2 - Execute a script
A:
[hook "post-commit"]
	command = ~/my-post-commit-hook
B:
[hook "script"]
	event = post-commit
	command = ~/my-post-commit-hook

#3 - Run a handful of scripts in a specific order on one event
A:
[hook "post-commit"]
	command = ~/my-post-commit-1
	command = ~/my-post-commit-2
	command = ~/my-post-commit-3
B:
[hook "script 1"]
	event = post-commit
	command = ~/my-post-commit-1
[hook "script 2"]
	event = post-commit
	command = ~/my-post-commit-2
[hook "script 3"]
	event = post-commit
	command = ~/my-post-commit-3

#4 - Skip a script configured more globally
A:
(original config)
[hook "post-commit"]
	command = /corp/stuff/corp-post-commit
(local config)
[hookcmd "/corp/stuff/corp-post-commit"]
	skip = true
OR,
(original config)
[hookcmd "corp-pc"]
	command = /corp/stuff/corp-post-commit
[hook "post-commit"]
	command = corp-pc
(local config)
[hookcmd "corp-pc"]
	skip = true
B:
(original config)
[hook "corp-pc"]
	event = post-commit
	command = /corp/stuff/corp-post-commit
(local config)
[hook "corp-pc"]
	skip = true

#5 - Execute one script for multiple events
A:
[hookcmd "reusable-hook"]
	command = /long/path/reusable-hook
[hook "post-commit"]
	command = reusable-hook
[hook "pre-commit"]
	command = reusable-hook
[hook "prepare-commit-msg"]
	command = reusable-hook
B:
[hook "reusable-hook"]
	command = /long/path/reusable-hook
	event = post-commit
	event = pre-commit
	event = prepare-commit-msg

#6 - Execute the same script early for one event and late for another
event
A:
(global)
[hookcmd "reusable-hook"]
	command = /long/path/reusable-hook
[hook "pre-commit"]
	command = reusable-hook
(local)
[hook "post-commit"]
	command = other
	command = hooks
	command = reusable-hook

B:
(global)
[hook "reusable-hook"]
	command = /long/path/reusable-hook
	event = pre-commit
(local)
[hook "other"]
	event = post-commit
	command = other
[hook "hooks"]
	event = post-commit
	command = hooks
[hook "reusable-hook"]
	event = reusable-hook


I'm not going to comment on which one I like more yet - I think I will
study this for a while and let others weigh in on their preferences too.
I tried to order the cases from most common to less common. Please feel
free to chime in with more use cases that you think would be handy which
I forgot :)

 - Emily

Ævar Arnfjörð Bjarmason July 23, 2021, 7:41 a.m. UTC | #3

On Thu, Jul 22 2021, Emily Shaffer wrote:

> On Fri, Jul 16, 2021 at 11:13:42AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> 
>> 
>> On Thu, Jul 15 2021, Emily Shaffer wrote:
>> 
>> > To enable fine-grained options which apply to a single hook executable,
>> > and to make it easier for a single executable to be run on multiple hook
>> > events, teach "hookcmd.<alias>.config". These can be configured as
>> > follows:
>> > [...]
>> > diff --git a/Documentation/config/hook.txt b/Documentation/config/hook.txt
>> > index a97b980cca..5b35170664 100644
>> > --- a/Documentation/config/hook.txt
>> > +++ b/Documentation/config/hook.txt
>> > @@ -3,6 +3,11 @@ hook.<command>.command::
>> >  	executable on your device, a oneliner for your shell, or the name of a
>> >  	hookcmd. See linkgit:git-hook[1].
>> >  
>> > +hookcmd.<name>.command::
>> > +	A command to execute during a hook for which <name> has been specified
>> > +	as a command. This can be an executable on your device or a oneliner for
>> > +	your shell. See linkgit:git-hook[1].
>> > +
>> > [...]
>> > +Global config
>> > +----
>> > +  [hook "post-commit"]
>> > +    command = "linter"
>> > +    command = "~/typocheck.sh"
>> > +
>> > +  [hookcmd "linter"]
>> > +    command = "/bin/linter --c"
>> > +----
>> > +
>> > +Local config
>> > +----
>> > +  [hook "prepare-commit-msg"]
>> > +    command = "linter"
>> > +  [hook "post-commit"]
>> > +    command = "python ~/run-test-suite.py"
>> > +----
>> > +
>> > +With these configs, you'd then run post-commit hooks in this order:
>> > +
>> > +  /bin/linter --c
>> > +  ~/typocheck.sh
>> > +  python ~/run-test-suite.py
>> > +  .git/hooks/post-commit (if present)
>> > +
>> > +and prepare-commit-msg hooks in this order:
>> > +
>> > +  /bin/linter --c
>> > +  .git/hooks/prepare-commit-msg (if present)
>> >  
>> 
>> I still have outstanding feedback on the fundamental design
>> here. I.e. why is this not:
>> 
>>     hook.<name>.event = post-commit
>>     hook.<name>.command = <path>
>> 
>> See:
>> 
>> https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/
>> 
>> As noted there I don't see why not, and more complexity results from the
>> design choice of doing it the way you're proposing. I.e. we can't
>> discover hooks based on config prefixes, and we end up sticking full FS
>> paths in config keys.
>
> Interesting. My gut says that it would make it harder to neatly write a
> config with the same hook running at the very beginning of one event and
> the very end of another, but I'm not sure whether that's a case to
> optimize for.
>
> I would also need twice as many lines to run a script I wrote as a hook
> - that is, the base case which is probably all most people will need. So
> with your proposal I *must* name every hook I want to add. Instead of
> "hook.pre-commit.command = ~/check-for-debug-strings" I need to
> configure "hook.check-debug-strings.event = pre-commit" and
> "hook.check-debug-strings.command = ~/check-for-debug-strings". That's a
> little frustrating, and if I never want to skip that check or move it
> later or something, it will be extra overhead for no gain for me.

The gain is that "git hook list" becomes a trivial "git config
-show-origin --show-scope --get-regexp" wrapper.

So the series either doesn't need "git hook list" or such a thing
becomes much less complex, especially given the proposed addition of
other features in the area like "git hook edit", i.e. (quoting the
linked E-Mail):

    As just one example; surely "git config edit <name>" would need to
    run around and find config files to edit, then open them in a loop
    for you, no?

    Which we'd eventually want for "git config" in general with an
    --edit-regexp option or whatever, which brings us (well, at least
    me) back to "then let's just add it to git-config?".

> I do agree that your approach bans the regrettably awkward
> "hookcmd.~/check-for-debug-strings.skip = true", but I'm not sure
> whether or not it's worth it.

That design choice also means that you can't expand the path using "git
config --get --type=path.

We do have that with the "includeIf" construct, but if we can avoid it
we should, it makes it play nicer with other assumptions and features of
the config system.

As noted in the follow-up reply while we don't case normalize the LeVeL"
part of "ThReE.LeVeL.KeY" that's tolower(), which we know isn't a 1=1
mapping on some
FS's. https://lore.kernel.org/git/87y2ebo16v.fsf@evledraar.gmail.com/

> To help us visualize the change, here's some common and complicated
> tasks and how they look with either schema (let mine be A and yours be
> B):

Before diving into that, I'll just say I don't care about the trivial
specifics of how this is done, i.e. the bikeshedding of what the config
keys etc. are named.

Just in (as noted above) design choices here forcing avoidable
complexity in other areas.

> #1 - Add a oneliner (not executing a script)
> A:
> [hook "post-commit"]
> 	command = echo post commit
> B:
> [hook "oneliner"]
> 	event = post-commit
> 	command = echo post commit
> #2 - Execute a script
> A:
> [hook "post-commit"]
> 	command = ~/my-post-commit-hook
> B:
> [hook "script"]
> 	event = post-commit
> 	command = ~/my-post-commit-hook

...

> #3 - Run a handful of scripts in a specific order on one event
> A:
> [hook "post-commit"]
> 	command = ~/my-post-commit-1
> 	command = ~/my-post-commit-2
> 	command = ~/my-post-commit-3
> B:
> [hook "script 1"]
> 	event = post-commit
> 	command = ~/my-post-commit-1
> [hook "script 2"]
> 	event = post-commit
> 	command = ~/my-post-commit-2
> [hook "script 3"]
> 	event = post-commit
> 	command = ~/my-post-commit-3

To reply to all the above, yes, your suggestion comes out ahead in being
less verbose.

But isn't the real difference not in the differing prefixes, i.e. hook.*
and hookcmd.* (A) v.s. always hook.* (B, which is what I'm mainly
focusing on, i.e. what requires the added complexity.

But that in that in your proposed way of doing it it's:

    hook.<event-or-name>.*

V.s. my suggestion of:

    hook.<name>.*

And thus whenever you have a <event-or-name> that just happens to be a
built-in hook listed in githooks(5) we in (A) implicitly expand config
like:

    hook.post-commit.command = echo foo

To:

    hook.post-commit.command = echo hi
    hook.post-commit.type    = post-commit

But not knowing about "foo" we don't expand:

    hook.foo.command = echo foo

To:

    hook.foo.command = echo hi
    hook.foo.type    = foo # This would be an error, or ignored.

But rather leave "dangling" for the user to later supply the "*.event"
themselves, i.e.:

    hook.foo.command = echo hi
    hook.foo.event = post-commit

And means that you presumably need to detect this case and error about
it, but my proposed model does not:

    hook.post-commit.command = echo hi
    # User error: "*.type" and <event-or-name>" does not match for
    # "hook.*.command"
    hook.post-commit.type    = pre-commit

And furthermore, that if someone in your model were to do:

    hook.verify-commit.command = echo hi

It's "dangling" today, but if a later version of git learns about a
"verify-commit" hook we'll start running it unexpectedly.

Because your design conflates the slot for known hook type names and
user-supplied names.

On balance I think it's better just to always supply two lines per-hook,
but whether we have this proposed shorthand or not is mostly orthogonal
to everything I mentioned in
https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/

I.e. my proposed version could also have it, but thinking about it I
think it's not worth it, we should always use <name>, not
<event-or-name> for the reasons noted if you'll read ahead...

> #4 - Skip a script configured more globally
> A:
> (original config)
> [hook "post-commit"]
> 	command = /corp/stuff/corp-post-commit
> (local config)
> [hookcmd "/corp/stuff/corp-post-commit"]
> 	skip = true
> OR,
> (original config)
> [hookcmd "corp-pc"]
> 	command = /corp/stuff/corp-post-commit
> [hook "post-commit"]
> 	command = corp-pc
> (local config)
> [hookcmd "corp-pc"]
> 	skip = true
> B:
> (original config)
> [hook "corp-pc"]
> 	event = post-commit
> 	command = /corp/stuff/corp-post-commit
> (local config)
> [hook "corp-pc"]
> 	skip = true

...which are among other things that no, my proposed version doesn't
have "skip" at all, see point #3 at
https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/

I.e. I think the "skip" is a thing that falls out of a complexity of
your design that I'm proposing to do away with.

That complexity being that you use <event-or-name> and I use <name>, and
you want to in turn support N number of "*.command" for any
"hook.<event-or-name>".

I'm suggesting we use the typical "last key wins" semantics, if you want
multiple commands for a given hook type you'll just supply multiple
"hook.<name>" sections with the same "hook.*.event = type" in all.

The way to skip hooks in my proposal is:

    hook.<name>.command = true

Or whatever noop command you want on the RHS. In practice we wouldn't
invoke "true", just like we don't invoke "cat" for "PAGER=cat".

But unlike "*.skip" that doesn't require complexity in implementation or
user understanding, it just falls naturally out of the rest of the
model.

> #5 - Execute one script for multiple events
> A:
> [hookcmd "reusable-hook"]
> 	command = /long/path/reusable-hook
> [hook "post-commit"]
> 	command = reusable-hook
> [hook "pre-commit"]
> 	command = reusable-hook
> [hook "prepare-commit-msg"]
> 	command = reusable-hook
> B:
> [hook "reusable-hook"]
> 	command = /long/path/reusable-hook
> 	event = post-commit
> 	event = pre-commit
> 	event = prepare-commit-msg

It's been so long since I wrote the original E-Mail that I'm having to
skim my own summary there & come up with things as I go along, so maybe
this conflicts with something I said earlier.

But no, I don't think I meant that *.event should be multi-value, or
that that magic is worthwhile. I.e. I think we just want:

    [hook "not-a-reusable-section-1"]
        command = /long/path/reusable-hook
        event = post-commit
    [hook "not-a-reusable-section-2"]
        command = /long/path/reusable-hook
        event = pre-commit
    [hook "not-a-reusable-section-3"]
        command = /long/path/reusable-hook
        event = prepare-commit-msg

I.e. is wanting to use the same command for N different hooks really
that common a use-case to avoid a little verbosity, and quite a lot of
complexity?

How would such a hook even be implemented?

We don't have anything in your hook execution model (and I don't
remember you adding this) which passes down a "you are this type of
hook" to hooks.

It's now implicit in that the hook invoked at .git/hooks/post-commit or
.git/hooks/pre-commit can check its own basename, but it won't be with
configurable hooks.

We could start passing down a GIT_HOOK_TYPE in the environment or
whatever, but I think the simpler case of just having the user do it is
better.

I'm assuming that mainly because people have wanted a
"/long/path/reusable-hook" router script because we have not supported
executing N hooks, or supported concurrency. Once we do that the
complexity of such routing scripts (mostly, not everyone's needs will be
met) will be replaced by a little bit of config.

I don't see why it's worth it to micro-optimize for lines in that config
at the cost of increased config complexity.

For example, how do you "skip" an "event" type? You have it for
"*.command"? So let's say in your model (A) I have system config like:

    [hook "my-reusable"]
    command = /some/commit-msg-hook
    event = prepare-commit-msg

Now I want to skip that, easy enough, in my local config:

    [hook "my-reusable"]
    skip = true

Or if I wanted to also invoke it for commit-msg events, in my local
config (which piggy-backs on the system one):

    # Now runs for both prepare-commit-msg and commit-message
    [hook "my-reusable"]
    event = commit-msg

Or, if I want to skip the command name and substitute my own:

    [hook "my-reusable"]
    skip = true
    command = /my/commit-msg-hook

I'm just assuming you mean that to work, as discussed above I think all
this is leading us down the road of too much complexity.

But anyway, now I want to instead not run it as a prepare-commit-msg
hook, but a commit-msg hook.

As far as I can tell you haven't specified that, but I think something
consistent with your model would be

    [hook "my-reusable"]
    event = skip
    event = commit-msg

Or rather, because that conflates "skip" with a hook event type name:

    [hook "my-reusable"]
    skipEvent = true
    event = commit-msg

I.e. the "skip" you have now really means "skipCommand", so we'll need a
skipSomeName to skip other "hook.*.someName".

As noted above I think all of this is too complex, let's just do the
system config like this (exact same as the above):

    [hook "my-reusable"]
    command = /some/commit-msg-hook
    event = prepare-commit-msg

To skip it, in my local config:

    [hook "my-reusable"]
    command = true

And to re-use /some/commit-msg-hook without having to re-include it in
your local config the answer is that we don't support that level of
cleverness. Just do this:

    # First skip the system hook
    [hook "my-reusable"]
    command = true

    # Set up another hook, just copy/pasting the /some/commit-msg-hook command
    [hook "my-local"]
    command = /some/commit-msg-hook
    event = commit-msg

I.e. I think these cases of someone having say a /etc/gitconfig hook on
their system not under their control that they want to not run, *but*
not as the "event" there, but as another event type is too specific a
use-case for us to worry about.

> #6 - Execute the same script early for one event and late for another
> event
> A:
> (global)
> [hookcmd "reusable-hook"]
> 	command = /long/path/reusable-hook
> [hook "pre-commit"]
> 	command = reusable-hook
> (local)
> [hook "post-commit"]
> 	command = other
> 	command = hooks
> 	command = reusable-hook

Even with I think it's fair to say deep knowledge of your proposal at
this point I still needed to read this a few times to see if that:

    command = reusable-hook

Is referring to:

    [hookcmd "reusable-hook"]

I.e. is it going to run:

    command = /long/path/reusable-hook

Or is it just re-specifying /long/path/reusable-hook but relying on a
$PATH lookup?

Having reasoned through that I think the answer is the former. But that
also means that in your model:

    [hookcmd "rm -rf /"]
    command = echo this will not actually destroy your data
    [hook "pre-commit"]
    command = rm -rf /

Is going to run that friendly "echo" command, since "command = rm -rf /"
just refers to the "rm -rf /" <name>, not <command>, right the "hookcmd"
line is removed, at which point we'll stop treating it as a <name> and
run it as a <command>?

In practice I think users are unlikely to use actively harmful names
like that.

I'm just making the point that I should not need to know about previous
config to see if a "hook.pre-commit.command = rm -rf /" is harmless or
not, or need to carefully squint to see if the "reusable-hook" is
referring to a section name or command name.

Or am I just confused at this point?

> B:
> (global)
> [hook "reusable-hook"]
> 	command = /long/path/reusable-hook
> 	event = pre-commit
> (local)
> [hook "other"]
> 	event = post-commit
> 	command = other
> [hook "hooks"]
> 	event = post-commit
> 	command = hooks
> [hook "reusable-hook"]
> 	event = reusable-hook

No, that's not what I'm proposing. I.e. "*.event" is reserved for
built-in events listed in githooks(5) that git itself knows about.

If you say "event = reusable-hook" that's going to be one of ignored,
warned or errored about.

(Probably "ignored" is the best thing to support multi-version setups,
but that's not hook-specific, just how we treat unknown config in
general)

My version of this would be the same as noted with "we don't support
that level of cleverness[...]" above.

I.e. you'd not re-use that "/long/path/reusable-hook", you can skip it
through "command = true", then just copy the relevnt part into your new
config. So:

        (global)
	[hook "reusable-hook"]
		command = /long/path/reusable-hook
		event = pre-commit

	(local)
	[hook "reusable-hook"]
		command = true # skip it

        # The "hooks" name is arbitrary, "my-hooks" or whatever would be
        # clearer, but just going with your example...

	[hook "hooks"]
		event = post-commit
		command = hooks
        # Not very reusable then...   
	[hook "reusable-hook"]
                command = /long/path/reusable-hook
		event = pre-commit

> [...]Please feel free to chime in with more use cases that you think would
> be handy which I forgot :)

I couldn't find this at a quick glance but I think we also had a
disussion at some point about hooks controlling parallelism. AFAICT your
current implementation just has global:

    hook.jobs=N

And we then blacklist certain hooks to always have hook.jobs=1, e.g. the
commit-msg hook that needs an implicit "lock" on that file (or rather,
we think that's the most common use-case).

I think my version of always having hook.<name>.{event,command} be one
value is also better in that case, i.e. we'd then:

    [hook "myhook"]
    command = some-command
    event = pre-receive
    parallel = true # the default

    [hook "myhook2"]
    command = some-command2
    event = pre-receive
    parallel = true # the default

    [hook "myhook3"]
    command = some-unconcurrent-command
    event = pre-receive
    parallel = false # I'm not OK with concurrency

To go along with something like:

    hook
	jobs = 8

Or:

    [hookEvent "pre-receive"]
    jobs = 4

But if you have N numer of "command" in a section it gets murky, does
"parallel = false" then apply to the whole section, but sections can
have one or more values. So we'd need both a
"hookEvent.pre-receive.jobs=N" and per-section config to
control/suppress parallelism?

I haven't thought about it deeply, but have a hunch that having sections
be a 1=1 mapping with a specific command instead of either 1=1 or 1=many
is going to make that easier to implement and understand.

Emily Shaffer Aug. 4, 2021, 8:38 p.m. UTC | #4

On Fri, Jul 23, 2021 at 09:41:35AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> 
> On Thu, Jul 22 2021, Emily Shaffer wrote:
> 
> > On Fri, Jul 16, 2021 at 11:13:42AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> 
> >> 
> >> On Thu, Jul 15 2021, Emily Shaffer wrote:
> >> 
> >> > To enable fine-grained options which apply to a single hook executable,
> >> > and to make it easier for a single executable to be run on multiple hook
> >> > events, teach "hookcmd.<alias>.config". These can be configured as
> >> > follows:
> >> > [...]
> >> > diff --git a/Documentation/config/hook.txt b/Documentation/config/hook.txt
> >> > index a97b980cca..5b35170664 100644
> >> > --- a/Documentation/config/hook.txt
> >> > +++ b/Documentation/config/hook.txt
> >> > @@ -3,6 +3,11 @@ hook.<command>.command::
> >> >  	executable on your device, a oneliner for your shell, or the name of a
> >> >  	hookcmd. See linkgit:git-hook[1].
> >> >  
> >> > +hookcmd.<name>.command::
> >> > +	A command to execute during a hook for which <name> has been specified
> >> > +	as a command. This can be an executable on your device or a oneliner for
> >> > +	your shell. See linkgit:git-hook[1].
> >> > +
> >> > [...]
> >> > +Global config
> >> > +----
> >> > +  [hook "post-commit"]
> >> > +    command = "linter"
> >> > +    command = "~/typocheck.sh"
> >> > +
> >> > +  [hookcmd "linter"]
> >> > +    command = "/bin/linter --c"
> >> > +----
> >> > +
> >> > +Local config
> >> > +----
> >> > +  [hook "prepare-commit-msg"]
> >> > +    command = "linter"
> >> > +  [hook "post-commit"]
> >> > +    command = "python ~/run-test-suite.py"
> >> > +----
> >> > +
> >> > +With these configs, you'd then run post-commit hooks in this order:
> >> > +
> >> > +  /bin/linter --c
> >> > +  ~/typocheck.sh
> >> > +  python ~/run-test-suite.py
> >> > +  .git/hooks/post-commit (if present)
> >> > +
> >> > +and prepare-commit-msg hooks in this order:
> >> > +
> >> > +  /bin/linter --c
> >> > +  .git/hooks/prepare-commit-msg (if present)
> >> >  
> >> 
> >> I still have outstanding feedback on the fundamental design
> >> here. I.e. why is this not:
> >> 
> >>     hook.<name>.event = post-commit
> >>     hook.<name>.command = <path>
> >> 
> >> See:
> >> 
> >> https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/
> >> 
> >> As noted there I don't see why not, and more complexity results from the
> >> design choice of doing it the way you're proposing. I.e. we can't
> >> discover hooks based on config prefixes, and we end up sticking full FS
> >> paths in config keys.
> >
> > Interesting. My gut says that it would make it harder to neatly write a
> > config with the same hook running at the very beginning of one event and
> > the very end of another, but I'm not sure whether that's a case to
> > optimize for.
> >
> > I would also need twice as many lines to run a script I wrote as a hook
> > - that is, the base case which is probably all most people will need. So
> > with your proposal I *must* name every hook I want to add. Instead of
> > "hook.pre-commit.command = ~/check-for-debug-strings" I need to
> > configure "hook.check-debug-strings.event = pre-commit" and
> > "hook.check-debug-strings.command = ~/check-for-debug-strings". That's a
> > little frustrating, and if I never want to skip that check or move it
> > later or something, it will be extra overhead for no gain for me.
> 
> The gain is that "git hook list" becomes a trivial "git config
> -show-origin --show-scope --get-regexp" wrapper.

This isn't a very compelling reason to me, if it makes the user
experience worse. (I'm not convinced that it does - just saying I
disagree with this reasoning.)

> 
> So the series either doesn't need "git hook list" or such a thing
> becomes much less complex, especially given the proposed addition of
> other features in the area like "git hook edit", i.e. (quoting the
> linked E-Mail):

I still think "git hook list" is useful to have for end-users, compared
to remembering the appropriate "git config" invocation. It futureproofs
us if we do want to change the execution ordering to something besides
config order, it's easier to remember/more discoverable, etc.

>     As just one example; surely "git config edit <name>" would need to
>     run around and find config files to edit, then open them in a loop
>     for you, no?
> 
>     Which we'd eventually want for "git config" in general with an
>     --edit-regexp option or whatever, which brings us (well, at least
>     me) back to "then let's just add it to git-config?".
> 
> > I do agree that your approach bans the regrettably awkward
> > "hookcmd.~/check-for-debug-strings.skip = true", but I'm not sure
> > whether or not it's worth it.
> 
> That design choice also means that you can't expand the path using "git
> config --get --type=path.

Yeah, this is a pretty handy point.

> 
> We do have that with the "includeIf" construct, but if we can avoid it
> we should, it makes it play nicer with other assumptions and features of
> the config system.
> 
> As noted in the follow-up reply while we don't case normalize the LeVeL"
> part of "ThReE.LeVeL.KeY" that's tolower(), which we know isn't a 1=1
> mapping on some
> FS's. https://lore.kernel.org/git/87y2ebo16v.fsf@evledraar.gmail.com/

This, too.

> 
> > To help us visualize the change, here's some common and complicated
> > tasks and how they look with either schema (let mine be A and yours be
> > B):
> 
> Before diving into that, I'll just say I don't care about the trivial
> specifics of how this is done, i.e. the bikeshedding of what the config
> keys etc. are named.
> 
> Just in (as noted above) design choices here forcing avoidable
> complexity in other areas.
> 
> > #1 - Add a oneliner (not executing a script)
> > A:
> > [hook "post-commit"]
> > 	command = echo post commit
> > B:
> > [hook "oneliner"]
> > 	event = post-commit
> > 	command = echo post commit
> > #2 - Execute a script
> > A:
> > [hook "post-commit"]
> > 	command = ~/my-post-commit-hook
> > B:
> > [hook "script"]
> > 	event = post-commit
> > 	command = ~/my-post-commit-hook
> 
> ...
> 
> > #3 - Run a handful of scripts in a specific order on one event
> > A:
> > [hook "post-commit"]
> > 	command = ~/my-post-commit-1
> > 	command = ~/my-post-commit-2
> > 	command = ~/my-post-commit-3
> > B:
> > [hook "script 1"]
> > 	event = post-commit
> > 	command = ~/my-post-commit-1
> > [hook "script 2"]
> > 	event = post-commit
> > 	command = ~/my-post-commit-2
> > [hook "script 3"]
> > 	event = post-commit
> > 	command = ~/my-post-commit-3
> 
> To reply to all the above, yes, your suggestion comes out ahead in being
> less verbose.
> 
> But isn't the real difference not in the differing prefixes, i.e. hook.*
> and hookcmd.* (A) v.s. always hook.* (B, which is what I'm mainly
> focusing on, i.e. what requires the added complexity.
> 
> But that in that in your proposed way of doing it it's:
> 
>     hook.<event-or-name>.*
> 
> V.s. my suggestion of:
> 
>     hook.<name>.*
> 

No, I think that's a misunderstanding of proposal A. My proposal A is
always hook.<event>.*, and hookcmd.<name>.*, and the way that I read
your proposal B is that it's always hook.<name>.*.

> And thus whenever you have a <event-or-name> that just happens to be a
> built-in hook listed in githooks(5) we in (A) implicitly expand config
> like:
> 
>     hook.post-commit.command = echo foo
> 
> 
>     hook.post-commit.command = echo hi
>     hook.post-commit.type    = post-commit

This is where I'm becoming confused. I *think* you're saying that with
your proposal B, if the subsection happens to be the name of a built-in
hook, we can skip the .event field. But it also reads like you're saying
this is how my proposal A works, which isn't the case, because the
subsection is always assumed to be the name of the hook event we're
trying to run.

> 
> But not knowing about "foo" we don't expand:
> 
>     hook.foo.command = echo foo
> 
> 
>     hook.foo.command = echo hi
>     hook.foo.type    = foo # This would be an error, or ignored.
> 
> But rather leave "dangling" for the user to later supply the "*.event"
> themselves, i.e.:
> 
>     hook.foo.command = echo hi
>     hook.foo.event = post-commit

If you're talking about proposal B, then no, this isn't how it works.
This config would imply that someone else (like a wrapper) can run "echo
hi" by asking for "git hook run foo". But I'm not sure exactly that
that's what you mean...

> 
> And means that you presumably need to detect this case and error about
> it, but my proposed model does not:
> 
>     hook.post-commit.command = echo hi
>     # User error: "*.type" and <event-or-name>" does not match for
>     # "hook.*.command"
>     hook.post-commit.type    = pre-commit
> 
> And furthermore, that if someone in your model were to do:
> 
>     hook.verify-commit.command = echo hi
> 
> It's "dangling" today, but if a later version of git learns about a
> "verify-commit" hook we'll start running it unexpectedly.

No, that's in fact as designed, with my model B. The user configured
"echo hi" to run on "verify-commit" events; if those events are
initially used by some wrapper, but later we decide they're a great idea
and absorb the verify-commit event into native Git, then this is working
as intended. I think your argument is based on a misunderstanding of the
design.

Could it be that the way I provided the examples (my schema after A: and
your schema after B:) made it hard to parse? Sorry about that if so.

> 
> Because your design conflates the slot for known hook type names and
> user-supplied names.
> 
> On balance I think it's better just to always supply two lines per-hook,
> but whether we have this proposed shorthand or not is mostly orthogonal
> to everything I mentioned in
> https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/
> 
> I.e. my proposed version could also have it, but thinking about it I
> think it's not worth it, we should always use <name>, not
> <event-or-name> for the reasons noted if you'll read ahead...
> 
> > #4 - Skip a script configured more globally
> > A:
> > (original config)
> > [hook "post-commit"]
> > 	command = /corp/stuff/corp-post-commit
> > (local config)
> > [hookcmd "/corp/stuff/corp-post-commit"]
> > 	skip = true
> > OR,
> > (original config)
> > [hookcmd "corp-pc"]
> > 	command = /corp/stuff/corp-post-commit
> > [hook "post-commit"]
> > 	command = corp-pc
> > (local config)
> > [hookcmd "corp-pc"]
> > 	skip = true
> > B:
> > (original config)
> > [hook "corp-pc"]
> > 	event = post-commit
> > 	command = /corp/stuff/corp-post-commit
> > (local config)
> > [hook "corp-pc"]
> > 	skip = true
> 
> ...which are among other things that no, my proposed version doesn't
> have "skip" at all, see point #3 at
> https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/
> 
> I.e. I think the "skip" is a thing that falls out of a complexity of
> your design that I'm proposing to do away with.
> 
> That complexity being that you use <event-or-name> and I use <name>, and
> you want to in turn support N number of "*.command" for any
> "hook.<event-or-name>".
> 
> I'm suggesting we use the typical "last key wins" semantics, if you want
> multiple commands for a given hook type you'll just supply multiple
> "hook.<name>" sections with the same "hook.*.event = type" in all.
> 
> The way to skip hooks in my proposal is:
> 
>     hook.<name>.command = true

I see - instead of fiddly reorganizing the list, you're just changing
what the specific hook name is redirecting to.

Then, instead of storing the resolved command in the 'struct hook', we
can store the name, and always dereference the name right before
execution, using last-one-wins semantics.

The only thing I don't like about this is that it's a little bit
confusing to the user. I personally would rather have more complicated
code and less complicated user experience, so I like 'skip' more because
it's readable and because you don't need to guess "oh, it'll just
execute /bin/true which is a noop". That's easy to understand for
someone familiar with Unix scripting (and therefore for we Git
contributors), but confusing for everybody else.

Maybe we can take the essential spirit of this implementation but make
it a little bit simpler. In pseudocode:

populate_list(hookname):
  # look for everybody who says "hook.*.event = $hookname"
  hooks = get_configs_matching(hook.*.event, hookname)
  # the "*" bit is the bit we dereference later
  for hook in hooks:
    hook_list.add({.name = hook.key.subsection})

run_hooks(hook_list):
  for hook in hook_list:
    # should we even run?
    if (get_last_config("hook." + hook.name + ".skip"))
      continue

    # dereference the hook name and find out the command
    cmd = get_last_config("hook." + hook.name + ".command")
    run(cmd)

The price for this improved readability is an extra config lookup in
each case. But the code is still readable, definitely moreso than
rewalking the config list over and over as the implementation stands
now. I personally think it's worth it.

(One could also imagine a more readable placeholder than /bin/true in
the "hook.<name>.command" field, but then we get to decide just how hard
we want to avoid colliding with legitimate user hook names. Is .command
= skip enough? is .command = RESERVED_NAME_SKIP_THIS_HOOK enough? Is
something unrunnable like .command = "skip me" enough, although it could
conceivably collide with a determined user?)

> 
> Or whatever noop command you want on the RHS. In practice we wouldn't
> invoke "true", just like we don't invoke "cat" for "PAGER=cat".
> 
> But unlike "*.skip" that doesn't require complexity in implementation or
> user understanding, it just falls naturally out of the rest of the
> model.
> 
> > #5 - Execute one script for multiple events
> > A:
> > [hookcmd "reusable-hook"]
> > 	command = /long/path/reusable-hook
> > [hook "post-commit"]
> > 	command = reusable-hook
> > [hook "pre-commit"]
> > 	command = reusable-hook
> > [hook "prepare-commit-msg"]
> > 	command = reusable-hook
> > B:
> > [hook "reusable-hook"]
> > 	command = /long/path/reusable-hook
> > 	event = post-commit
> > 	event = pre-commit
> > 	event = prepare-commit-msg
> 
> It's been so long since I wrote the original E-Mail that I'm having to
> skim my own summary there & come up with things as I go along, so maybe
> this conflicts with something I said earlier.
> 
> But no, I don't think I meant that *.event should be multi-value, or
> that that magic is worthwhile. I.e. I think we just want:
> 
>     [hook "not-a-reusable-section-1"]
>         command = /long/path/reusable-hook
>         event = post-commit
>     [hook "not-a-reusable-section-2"]
>         command = /long/path/reusable-hook
>         event = pre-commit
>     [hook "not-a-reusable-section-3"]
>         command = /long/path/reusable-hook
>         event = prepare-commit-msg
> 
> I.e. is wanting to use the same command for N different hooks really
> that common a use-case to avoid a little verbosity, and quite a lot of
> complexity?
> 
> How would such a hook even be implemented?
> 
> We don't have anything in your hook execution model (and I don't
> remember you adding this) which passes down a "you are this type of
> hook" to hooks.

No, but it's something I'm interested in passing as an environment
variable. I didn't add it to this series because it seemed to me to
distract from the core feature. We would like to add it to simplify our
invocations of https://github.com/awslabs/git-secrets, so it's on my
radar.

And even without this, just today I wanted to configure a hook to really
make sure I had config.mak set up in all my Git worktrees (since you
pointed out I pushed code that fails -Wall.... ;) ) and set it up to run
on both pre-commit and post-checkout. So yes, it is feasible now, for
very stupid hooks.

> 
> It's now implicit in that the hook invoked at .git/hooks/post-commit or
> .git/hooks/pre-commit can check its own basename, but it won't be with
> configurable hooks.
> 
> We could start passing down a GIT_HOOK_TYPE in the environment or
> whatever, but I think the simpler case of just having the user do it is
> better.
> 
> I'm assuming that mainly because people have wanted a
> "/long/path/reusable-hook" router script because we have not supported
> executing N hooks, or supported concurrency. Once we do that the
> complexity of such routing scripts (mostly, not everyone's needs will be
> met) will be replaced by a little bit of config.

The above linked git-secrets is an example; the config is irritating and
doesn't really need to be there, when we could just teach the script
itself how to check GIT_HOOK_TYPE envvar.

> 
> I don't see why it's worth it to micro-optimize for lines in that config
> at the cost of increased config complexity.
> 
> For example, how do you "skip" an "event" type? You have it for
> "*.command"? So let's say in your model (A) I have system config like:
(The following isn't how my model A works, because it refers always to
"hook.<event>", as I mentioned above. But I also covered this same topic
in detail earlier in this email, so I'm going to snip this section.)

> > #6 - Execute the same script early for one event and late for another
> > event
> > A:
> > (global)
> > [hookcmd "reusable-hook"]
> > 	command = /long/path/reusable-hook
> > [hook "pre-commit"]
> > 	command = reusable-hook
> > (local)
> > [hook "post-commit"]
> > 	command = other
> > 	command = hooks
> > 	command = reusable-hook
> 
> Even with I think it's fair to say deep knowledge of your proposal at
> this point I still needed to read this a few times to see if that:
> 
>     command = reusable-hook
> 
> Is referring to:
> 
>     [hookcmd "reusable-hook"]
> 
> I.e. is it going to run:
> 
>     command = /long/path/reusable-hook
> 
> Or is it just re-specifying /long/path/reusable-hook but relying on a
> $PATH lookup?
> 
> Having reasoned through that I think the answer is the former. But that
> also means that in your model:
> 
>     [hookcmd "rm -rf /"]
>     command = echo this will not actually destroy your data
>     [hook "pre-commit"]
>     command = rm -rf /
> 
> Is going to run that friendly "echo" command, since "command = rm -rf /"
> just refers to the "rm -rf /" <name>, not <command>, right the "hookcmd"
> line is removed, at which point we'll stop treating it as a <name> and
> run it as a <command>?
> 
> In practice I think users are unlikely to use actively harmful names
> like that.
> 
> I'm just making the point that I should not need to know about previous
> config to see if a "hook.pre-commit.command = rm -rf /" is harmless or
> not, or need to carefully squint to see if the "reusable-hook" is
> referring to a section name or command name.
> 
> Or am I just confused at this point?

Nah, that's an accurate understanding of how it would work with the
current implementation. Sure, it's a good point.

> 
> > B:
> > (global)
> > [hook "reusable-hook"]
> > 	command = /long/path/reusable-hook
> > 	event = pre-commit
> > (local)
> > [hook "other"]
> > 	event = post-commit
> > 	command = other
> > [hook "hooks"]
> > 	event = post-commit
> > 	command = hooks
> > [hook "reusable-hook"]
> > 	event = reusable-hook

Hm, I think I mistyped when I wrote "event = reusable-hook" here, and
meant to write "event = post-commit" to indicate that we would move
"reusable-hook" to the bottom of the execution list for "post-commit".

I'll snip most of the rebuttal about my typo.

> 
>         (global)
> 	[hook "reusable-hook"]
> 		command = /long/path/reusable-hook
> 		event = pre-commit
> 
> 	(local)
> 	[hook "reusable-hook"]
> 		command = true # skip it
> 
>         # The "hooks" name is arbitrary, "my-hooks" or whatever would be
>         # clearer, but just going with your example...
> 
> 	[hook "hooks"]
> 		event = post-commit
> 		command = hooks
>         # Not very reusable then...   
> 	[hook "reusable-hook"]
>                 command = /long/path/reusable-hook
> 		event = pre-commit
> 
> 
> > [...]Please feel free to chime in with more use cases that you think would
> > be handy which I forgot :)
> 
> I couldn't find this at a quick glance but I think we also had a
> disussion at some point about hooks controlling parallelism. AFAICT your
> current implementation just has global:
> 
>     hook.jobs=N
> 
> And we then blacklist certain hooks to always have hook.jobs=1, e.g. the
> commit-msg hook that needs an implicit "lock" on that file (or rather,
> we think that's the most common use-case).
> 
> I think my version of always having hook.<name>.{event,command} be one
> value is also better in that case, i.e. we'd then:
> 
>     [hook "myhook"]
>     command = some-command
>     event = pre-receive
>     parallel = true # the default
> 
>     [hook "myhook2"]
>     command = some-command2
>     event = pre-receive
>     parallel = true # the default
> 
>     [hook "myhook3"]
>     command = some-unconcurrent-command
>     event = pre-receive
>     parallel = false # I'm not OK with concurrency

I am not sure what it means for a single executable to write "parallel =
true" - it is a single executable.

Ok, that is me being facetious - I think you are saying we can AND
together all of the 'hook.<thing-with-event-we-care-about>.parallel' to
decide whether or not to run in parallel.

I would rather not discuss this now, for this series, because regardless
of which config schema we use today, we can figure out "parallel unless
we really don't want it" later on. It is too complex to discuss in the
context of "hey, we should also configure hooks somewhere else". Let's
leave it for future work.

> I haven't thought about it deeply, but have a hunch that having sections
> be a 1=1 mapping with a specific command instead of either 1=1 or 1=many
> is going to make that easier to implement and understand.

Sure, probably. But it can be added later to either schema... ;)


Anyway, I did a quick strawpoll with my colleagues (Jonathan Nieder,
Jonathan Tan, and Josh Steadmon) and they all like your syntax more. I
am ambivalent between the two, personally.

So what I will do, hopefully today, maybe not, is the following:

 - no more hookcmd. (and there was much rejoicing)
 - Hooks are specified with "[hook "name-of-hook"]"
 - I do see value in having an explicit .skip field rather than mapping
   .command to a noop, so "hook.name-of-hook.skip" as described above.
   Of course the method you described will work regardless, since its
   mechanism is based on the inherent result of executing /bin/true.
 - I do see value in allowing "hook.name-of-hook.event" to be defined
   repeatably, as described above, so I will include that.

And left for later work:
 - teaching HOOK_EVENT_NAME=post-commit (etc)
 - figuring out what the heck we want to do with allowing hooks to
   describe whether they allow parallelism

Unfortunately I've dropped the design doc from this series, since nobody
seemed interested in having it checked in, but I'll try to rework the
help doc to make the schema more clear.

Thanks, I think it was useful to hash this out.

 - Emily

Jonathan Tan Aug. 4, 2021, 9:49 p.m. UTC | #5

> > I would also need twice as many lines to run a script I wrote as a hook
> > - that is, the base case which is probably all most people will need. So
> > with your proposal I *must* name every hook I want to add. Instead of
> > "hook.pre-commit.command = ~/check-for-debug-strings" I need to
> > configure "hook.check-debug-strings.event = pre-commit" and
> > "hook.check-debug-strings.command = ~/check-for-debug-strings". That's a
> > little frustrating, and if I never want to skip that check or move it
> > later or something, it will be extra overhead for no gain for me.
> 
> The gain is that "git hook list" becomes a trivial "git config
> -show-origin --show-scope --get-regexp" wrapper.

I think that in both schemes, "git hook list" is not difficult to
implement.

> > I do agree that your approach bans the regrettably awkward
> > "hookcmd.~/check-for-debug-strings.skip = true", but I'm not sure
> > whether or not it's worth it.
> 
> That design choice also means that you can't expand the path using "git
> config --get --type=path.

I'm not sure what the use of this is, especially when the more important
position is in "hook.pre-commit.command" (which can be expanded, I
believe).

> As noted in the follow-up reply while we don't case normalize the LeVeL"
> part of "ThReE.LeVeL.KeY" that's tolower(), which we know isn't a 1=1
> mapping on some
> FS's. https://lore.kernel.org/git/87y2ebo16v.fsf@evledraar.gmail.com/

We're comparing strings in the config file, though, not strings against
what's on the filesystem.

> To reply to all the above, yes, your suggestion comes out ahead in being
> less verbose.
> 
> But isn't the real difference not in the differing prefixes, i.e. hook.*
> and hookcmd.* (A) v.s. always hook.* (B, which is what I'm mainly
> focusing on, i.e. what requires the added complexity.

OK, this is a good point - it may be confusing for the user to remember
the difference between "hook" and "hookcmd", and your scheme eliminates
that.

> But that in that in your proposed way of doing it it's:
> 
>     hook.<event-or-name>.*
> 
> V.s. my suggestion of:
> 
>     hook.<name>.*

[snip discussion of how hook.<event-or-name> is bad]

I think you were thinking that someone would say "let's unify 'hook' and
'hookcmd' then" in response to you saying "'hook' vs 'hookcmd' is bad",
and you're arguing against this new point. I don't think anyone is
saying that, though.

> That complexity being that you use <event-or-name> and I use <name>, and
> you want to in turn support N number of "*.command" for any
> "hook.<event-or-name>".

I don't know if anyone is wanting to support this.

> The way to skip hooks in my proposal is:
> 
>     hook.<name>.command = true
> 
> Or whatever noop command you want on the RHS. In practice we wouldn't
> invoke "true", just like we don't invoke "cat" for "PAGER=cat".
> 
> But unlike "*.skip" that doesn't require complexity in implementation or
> user understanding, it just falls naturally out of the rest of the
> model.

This is a good point.

> Even with I think it's fair to say deep knowledge of your proposal at
> this point I still needed to read this a few times to see if that:
> 
>     command = reusable-hook
> 
> Is referring to:
> 
>     [hookcmd "reusable-hook"]
> 
> I.e. is it going to run:
> 
>     command = /long/path/reusable-hook
> 
> Or is it just re-specifying /long/path/reusable-hook but relying on a
> $PATH lookup?

This is true - seeing "hook.post-commit.command = reusable-hook", we
need to look at the rest of the config to see how it is interpreted.

Going back to the central idea, though, I think that the main advantage
of the scheme Emily proposed is the ability to write, more concisely:

  [hook.pre-commit]
  command = /path/to/command-1
  command = /path/to/command-2

instead of

  [hook.command-1]
  event = pre-commit
  command = /path/to/command-1
  [hook.command-2]
  event = pre-commit
  command = /path/to/command-2

But Ævar's scheme gives us the advantage that if need to do anything
more complicated (even merely slightly more complicated - for example,
skipping a hook or overriding the command of a hook), it would be hard
to write (and even to just figure it out) in Emily's scheme. In Ævar's
scheme, just following the standard "last config wins" rule (and knowing
about /usr/bin/true) can already do a lot. For this reason, I think it's
worth considering Ævar's scheme - writing 3 lines instead of 2 is not
much more difficult to teach and to do.

Ævar Arnfjörð Bjarmason Aug. 5, 2021, 12:17 a.m. UTC | #6

On Wed, Aug 04 2021, Emily Shaffer wrote:

> On Fri, Jul 23, 2021 at 09:41:35AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> 
>> 
>> On Thu, Jul 22 2021, Emily Shaffer wrote:
>> 
>> > On Fri, Jul 16, 2021 at 11:13:42AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >> 
>> >> 
>> >> On Thu, Jul 15 2021, Emily Shaffer wrote:
>> >> 
>> >> > To enable fine-grained options which apply to a single hook executable,
>> >> > and to make it easier for a single executable to be run on multiple hook
>> >> > events, teach "hookcmd.<alias>.config". These can be configured as
>> >> > follows:
>> >> > [...]
>> >> > diff --git a/Documentation/config/hook.txt b/Documentation/config/hook.txt
>> >> > index a97b980cca..5b35170664 100644
>> >> > --- a/Documentation/config/hook.txt
>> >> > +++ b/Documentation/config/hook.txt
>> >> > @@ -3,6 +3,11 @@ hook.<command>.command::
>> >> >  	executable on your device, a oneliner for your shell, or the name of a
>> >> >  	hookcmd. See linkgit:git-hook[1].
>> >> >  
>> >> > +hookcmd.<name>.command::
>> >> > +	A command to execute during a hook for which <name> has been specified
>> >> > +	as a command. This can be an executable on your device or a oneliner for
>> >> > +	your shell. See linkgit:git-hook[1].
>> >> > +
>> >> > [...]
>> >> > +Global config
>> >> > +----
>> >> > +  [hook "post-commit"]
>> >> > +    command = "linter"
>> >> > +    command = "~/typocheck.sh"
>> >> > +
>> >> > +  [hookcmd "linter"]
>> >> > +    command = "/bin/linter --c"
>> >> > +----
>> >> > +
>> >> > +Local config
>> >> > +----
>> >> > +  [hook "prepare-commit-msg"]
>> >> > +    command = "linter"
>> >> > +  [hook "post-commit"]
>> >> > +    command = "python ~/run-test-suite.py"
>> >> > +----
>> >> > +
>> >> > +With these configs, you'd then run post-commit hooks in this order:
>> >> > +
>> >> > +  /bin/linter --c
>> >> > +  ~/typocheck.sh
>> >> > +  python ~/run-test-suite.py
>> >> > +  .git/hooks/post-commit (if present)
>> >> > +
>> >> > +and prepare-commit-msg hooks in this order:
>> >> > +
>> >> > +  /bin/linter --c
>> >> > +  .git/hooks/prepare-commit-msg (if present)
>> >> >  
>> >> 
>> >> I still have outstanding feedback on the fundamental design
>> >> here. I.e. why is this not:
>> >> 
>> >>     hook.<name>.event = post-commit
>> >>     hook.<name>.command = <path>
>> >> 
>> >> See:
>> >> 
>> >> https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/
>> >> 
>> >> As noted there I don't see why not, and more complexity results from the
>> >> design choice of doing it the way you're proposing. I.e. we can't
>> >> discover hooks based on config prefixes, and we end up sticking full FS
>> >> paths in config keys.
>> >
>> > Interesting. My gut says that it would make it harder to neatly write a
>> > config with the same hook running at the very beginning of one event and
>> > the very end of another, but I'm not sure whether that's a case to
>> > optimize for.
>> >
>> > I would also need twice as many lines to run a script I wrote as a hook
>> > - that is, the base case which is probably all most people will need. So
>> > with your proposal I *must* name every hook I want to add. Instead of
>> > "hook.pre-commit.command = ~/check-for-debug-strings" I need to
>> > configure "hook.check-debug-strings.event = pre-commit" and
>> > "hook.check-debug-strings.command = ~/check-for-debug-strings". That's a
>> > little frustrating, and if I never want to skip that check or move it
>> > later or something, it will be extra overhead for no gain for me.
>> 
>> The gain is that "git hook list" becomes a trivial "git config
>> -show-origin --show-scope --get-regexp" wrapper.
>
> This isn't a very compelling reason to me, if it makes the user
> experience worse. (I'm not convinced that it does - just saying I
> disagree with this reasoning.)

...(see below)...

>> 
>> So the series either doesn't need "git hook list" or such a thing
>> becomes much less complex, especially given the proposed addition of
>> other features in the area like "git hook edit", i.e. (quoting the
>> linked E-Mail):
>
> I still think "git hook list" is useful to have for end-users, compared
> to remembering the appropriate "git config" invocation. It futureproofs
> us if we do want to change the execution ordering to something besides
> config order, it's easier to remember/more discoverable, etc.

To clarify: I'm not against there being a "git hook list", I think we
should have it.

It's useful UI. I.e. we have "run", it makes sense to have the same
mechanism spew out its idea of what hooks are where, especially since
that's not entirely contained in the config, but a union of config and
GIT_DIR/hooks (and then there's core.hooksPath...).

So I'm very much for it being there.

I've just been commenting on the relative complexity of the config
schema, which in one way surfaces as not being able to do the config
part of the "git hook list" as a simple shell-out (or equivalent library
invocation) of "git config" under the hood, but also means that users
need to mentally track a more complex format when e.g. manually editing
config...

>> [...]
>> But isn't the real difference not in the differing prefixes, i.e. hook.*
>> and hookcmd.* (A) v.s. always hook.* (B, which is what I'm mainly
>> focusing on, i.e. what requires the added complexity.
>> 
>> But that in that in your proposed way of doing it it's:
>> 
>>     hook.<event-or-name>.*
>> 
>> V.s. my suggestion of:
>> 
>>     hook.<name>.*
>> 
>
> No, I think that's a misunderstanding of proposal A. My proposal A is
> always hook.<event>.*, and hookcmd.<name>.*, and the way that I read
> your proposal B is that it's always hook.<name>.*.

Ah, thanks.

>> And thus whenever you have a <event-or-name> that just happens to be a
>> built-in hook listed in githooks(5) we in (A) implicitly expand config
>> like:
>> 
>>     hook.post-commit.command = echo foo
>> 
>> 
>>     hook.post-commit.command = echo hi
>>     hook.post-commit.type    = post-commit
>
> This is where I'm becoming confused. I *think* you're saying that with
> your proposal B, if the subsection happens to be the name of a built-in
> hook, we can skip the .event field. But it also reads like you're saying
> this is how my proposal A works, which isn't the case, because the
> subsection is always assumed to be the name of the hook event we're
> trying to run.

I *think* this is best addressed below, moving on...

>> 
>> But not knowing about "foo" we don't expand:
>> 
>>     hook.foo.command = echo foo
>> 
>> 
>>     hook.foo.command = echo hi
>>     hook.foo.type    = foo # This would be an error, or ignored.
>> 
>> But rather leave "dangling" for the user to later supply the "*.event"
>> themselves, i.e.:
>> 
>>     hook.foo.command = echo hi
>>     hook.foo.event = post-commit
>
> If you're talking about proposal B, then no, this isn't how it works.
> This config would imply that someone else (like a wrapper) can run "echo
> hi" by asking for "git hook run foo". But I'm not sure exactly that
> that's what you mean...

...and on (maybe)....

>> 
>> And means that you presumably need to detect this case and error about
>> it, but my proposed model does not:
>> 
>>     hook.post-commit.command = echo hi
>>     # User error: "*.type" and <event-or-name>" does not match for
>>     # "hook.*.command"
>>     hook.post-commit.type    = pre-commit
>> 
>> And furthermore, that if someone in your model were to do:
>> 
>>     hook.verify-commit.command = echo hi
>> 
>> It's "dangling" today, but if a later version of git learns about a
>> "verify-commit" hook we'll start running it unexpectedly.
>
> No, that's in fact as designed, with my model B. The user configured
> "echo hi" to run on "verify-commit" events; if those events are
> initially used by some wrapper, but later we decide they're a great idea
> and absorb the verify-commit event into native Git, then this is working
> as intended. I think your argument is based on a misunderstanding of the
> design.
>
> Could it be that the way I provided the examples (my schema after A: and
> your schema after B:) made it hard to parse? Sorry about that if so.

Aren't you assuming that users who specify a verify-commit will be happy
because git's usurping of that will 1=1 match what they were using
"verify-commit" for.

I'm pointing out that we can't know that, and since you want to make
"git hook run" a general thing that runs any <name> of script you've
configured, and not just what's in githooks(5) that it becomes very
likely that if we add a new hook with some obvious name that we'll
either break things for users, or subtly change behavior.

Which isn't just theoretical, e.g. I tend to run something like a "git
log --check @{u}.." before I run git-send-email, with this configurable
hook mechanism having a "git hook run sendemail-check" would be a way I
might expose that in my own ~/.gitconfig.

But if git-send-email learns a "sendemail-check" and the behavior
doesn't exactly match mine; E.g. maybe it similar to pre-auto-gc expects
me to return a status code to ask me if I want to abort on a failed
--check, but mine expects a revision range to run "log --check".

In practice that's a non-issue with the current hook mechanism,
i.e. nobody's sticking a script into .git/hooks/my-custom-name and
expecting it to do anything useful (and if they are, they have only
themselves to blame).

Whereas we'd now be actively inviting users to squat on the same
namespace we ourselves will need for future hooks.

>> 
>> Because your design conflates the slot for known hook type names and
>> user-supplied names.
>> 
>> On balance I think it's better just to always supply two lines per-hook,
>> but whether we have this proposed shorthand or not is mostly orthogonal
>> to everything I mentioned in
>> https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/
>> 
>> I.e. my proposed version could also have it, but thinking about it I
>> think it's not worth it, we should always use <name>, not
>> <event-or-name> for the reasons noted if you'll read ahead...
>> 
>> > #4 - Skip a script configured more globally
>> > A:
>> > (original config)
>> > [hook "post-commit"]
>> > 	command = /corp/stuff/corp-post-commit
>> > (local config)
>> > [hookcmd "/corp/stuff/corp-post-commit"]
>> > 	skip = true
>> > OR,
>> > (original config)
>> > [hookcmd "corp-pc"]
>> > 	command = /corp/stuff/corp-post-commit
>> > [hook "post-commit"]
>> > 	command = corp-pc
>> > (local config)
>> > [hookcmd "corp-pc"]
>> > 	skip = true
>> > B:
>> > (original config)
>> > [hook "corp-pc"]
>> > 	event = post-commit
>> > 	command = /corp/stuff/corp-post-commit
>> > (local config)
>> > [hook "corp-pc"]
>> > 	skip = true
>> 
>> ...which are among other things that no, my proposed version doesn't
>> have "skip" at all, see point #3 at
>> https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/
>> 
>> I.e. I think the "skip" is a thing that falls out of a complexity of
>> your design that I'm proposing to do away with.
>> 
>> That complexity being that you use <event-or-name> and I use <name>, and
>> you want to in turn support N number of "*.command" for any
>> "hook.<event-or-name>".
>> 
>> I'm suggesting we use the typical "last key wins" semantics, if you want
>> multiple commands for a given hook type you'll just supply multiple
>> "hook.<name>" sections with the same "hook.*.event = type" in all.
>> 
>> The way to skip hooks in my proposal is:
>> 
>>     hook.<name>.command = true
>
> I see - instead of fiddly reorganizing the list, you're just changing
> what the specific hook name is redirecting to.
>
> Then, instead of storing the resolved command in the 'struct hook', we
> can store the name, and always dereference the name right before
> execution, using last-one-wins semantics.
>
> The only thing I don't like about this is that it's a little bit
> confusing to the user. I personally would rather have more complicated
> code and less complicated user experience, so I like 'skip' more because
> it's readable and because you don't need to guess "oh, it'll just
> execute /bin/true which is a noop". That's easy to understand for
> someone familiar with Unix scripting (and therefore for we Git
> contributors), but confusing for everybody else.
>
> Maybe we can take the essential spirit of this implementation but make
> it a little bit simpler. In pseudocode:
>
> populate_list(hookname):
>   # look for everybody who says "hook.*.event = $hookname"
>   hooks = get_configs_matching(hook.*.event, hookname)
>   # the "*" bit is the bit we dereference later
>   for hook in hooks:
>     hook_list.add({.name = hook.key.subsection})
>
> run_hooks(hook_list):
>   for hook in hook_list:
>     # should we even run?
>     if (get_last_config("hook." + hook.name + ".skip"))
>       continue
>
>     # dereference the hook name and find out the command
>     cmd = get_last_config("hook." + hook.name + ".command")
>     run(cmd)
>
> The price for this improved readability is an extra config lookup in
> each case. But the code is still readable, definitely moreso than
> rewalking the config list over and over as the implementation stands
> now. I personally think it's worth it.

I think the less complicated user experience is for our config system to
work as consistently as possible with other config. If we define special
rules and what's effectively special syntax with "skip" etc. we're
asking users to read how that works, and keep that all in their head
just to deal with that one area of our config.

> (One could also imagine a more readable placeholder than /bin/true in
> the "hook.<name>.command" field, but then we get to decide just how hard
> we want to avoid colliding with legitimate user hook names. Is .command
> = skip enough? is .command = RESERVED_NAME_SKIP_THIS_HOOK enough? Is
> something unrunnable like .command = "skip me" enough, although it could
> conceivably collide with a determined user?)

All of that's something you'll need to explain in detail to users, which
all seems way more complex than a simple:

    To skip a previously defined hook insert a noop-command, any will
    do, but setting it to "true" (usually /bin/true) is a handy
    convention for doing nothing.

I.e. by keeping the config field as doing one thing only you avoid any
such collisions etc.

>> 
>> Or whatever noop command you want on the RHS. In practice we wouldn't
>> invoke "true", just like we don't invoke "cat" for "PAGER=cat".
>> 
>> But unlike "*.skip" that doesn't require complexity in implementation or
>> user understanding, it just falls naturally out of the rest of the
>> model.
>> 
>> > #5 - Execute one script for multiple events
>> > A:
>> > [hookcmd "reusable-hook"]
>> > 	command = /long/path/reusable-hook
>> > [hook "post-commit"]
>> > 	command = reusable-hook
>> > [hook "pre-commit"]
>> > 	command = reusable-hook
>> > [hook "prepare-commit-msg"]
>> > 	command = reusable-hook
>> > B:
>> > [hook "reusable-hook"]
>> > 	command = /long/path/reusable-hook
>> > 	event = post-commit
>> > 	event = pre-commit
>> > 	event = prepare-commit-msg
>> 
>> It's been so long since I wrote the original E-Mail that I'm having to
>> skim my own summary there & come up with things as I go along, so maybe
>> this conflicts with something I said earlier.
>> 
>> But no, I don't think I meant that *.event should be multi-value, or
>> that that magic is worthwhile. I.e. I think we just want:
>> 
>>     [hook "not-a-reusable-section-1"]
>>         command = /long/path/reusable-hook
>>         event = post-commit
>>     [hook "not-a-reusable-section-2"]
>>         command = /long/path/reusable-hook
>>         event = pre-commit
>>     [hook "not-a-reusable-section-3"]
>>         command = /long/path/reusable-hook
>>         event = prepare-commit-msg
>> 
>> I.e. is wanting to use the same command for N different hooks really
>> that common a use-case to avoid a little verbosity, and quite a lot of
>> complexity?
>> 
>> How would such a hook even be implemented?
>> 
>> We don't have anything in your hook execution model (and I don't
>> remember you adding this) which passes down a "you are this type of
>> hook" to hooks.
>
> No, but it's something I'm interested in passing as an environment
> variable. I didn't add it to this series because it seemed to me to
> distract from the core feature. We would like to add it to simplify our
> invocations of https://github.com/awslabs/git-secrets, so it's on my
> radar.

Having such an env var as part of the initial series seems like a
sensible thing to have.

> And even without this, just today I wanted to configure a hook to really
> make sure I had config.mak set up in all my Git worktrees (since you
> pointed out I pushed code that fails -Wall.... ;) ) and set it up to run
> on both pre-commit and post-checkout. So yes, it is feasible now, for
> very stupid hooks.

Aside: It's nice to use DEVELOPER=1 in config.mak (or as make param)
when hacking on git, it'll set -Wall -Werror and other nice warning
flags.

> [...]
>> > [...]Please feel free to chime in with more use cases that you think would
>> > be handy which I forgot :)
>> 
>> I couldn't find this at a quick glance but I think we also had a
>> disussion at some point about hooks controlling parallelism. AFAICT your
>> current implementation just has global:
>> 
>>     hook.jobs=N
>> 
>> And we then blacklist certain hooks to always have hook.jobs=1, e.g. the
>> commit-msg hook that needs an implicit "lock" on that file (or rather,
>> we think that's the most common use-case).
>> 
>> I think my version of always having hook.<name>.{event,command} be one
>> value is also better in that case, i.e. we'd then:
>> 
>>     [hook "myhook"]
>>     command = some-command
>>     event = pre-receive
>>     parallel = true # the default
>> 
>>     [hook "myhook2"]
>>     command = some-command2
>>     event = pre-receive
>>     parallel = true # the default
>> 
>>     [hook "myhook3"]
>>     command = some-unconcurrent-command
>>     event = pre-receive
>>     parallel = false # I'm not OK with concurrency
>
> I am not sure what it means for a single executable to write "parallel =
> true" - it is a single executable.
>
> Ok, that is me being facetious - I think you are saying we can AND
> together all of the 'hook.<thing-with-event-we-care-about>.parallel' to
> decide whether or not to run in parallel.

Right, the case (whatever the config mechanism) wanting to use several
off-the-shelf hooks and accomplish through git some version of this:

    parallel -j8 pre-receive-parallel-*.sh &&
    parallel -j1 pre-receive-non-parallel-*.sh

I.e. since we have N scripts for the "pre-receive" type, and we're
expecting to say whether on not parallelism is OK or not, it seems like
a natural thing we'll want to declare that differently for some of those
than for others.

> I would rather not discuss this now, for this series, because regardless
> of which config schema we use today, we can figure out "parallel unless
> we really don't want it" later on. It is too complex to discuss in the
> context of "hey, we should also configure hooks somewhere else". Let's
> leave it for future work.

The point is that no, we really can't figure it out as easily later on
regardless of the config schema.

Because with 1=many you can't have 1=many.someAttribute=XYZ without that
*.someAttribute=XYZ declaring something for all of 1=many, whereas if
it's 1=1 then 1=1.someAttribute.XYZ obviously applies only to that 1=1.

>> I haven't thought about it deeply, but have a hunch that having sections
>> be a 1=1 mapping with a specific command instead of either 1=1 or 1=many
>> is going to make that easier to implement and understand.
>
> Sure, probably. But it can be added later to either schema... ;)
>
>
> Anyway, I did a quick strawpoll with my colleagues (Jonathan Nieder,
> Jonathan Tan, and Josh Steadmon) and they all like your syntax more. I
> am ambivalent between the two, personally.
>
> So what I will do, hopefully today, maybe not, is the following:
>
>  - no more hookcmd. (and there was much rejoicing)
>  - Hooks are specified with "[hook "name-of-hook"]"

Thanks, I look forward to checking that out. As should be obvious from
my misunderstanding of some of your config proposal (I snippet out some
of those from the reply) I'm still not entirely clear on even what the
current proposed behavior is, hopefully something simpler will be easier
to grok ... :)

>  - I do see value in having an explicit .skip field rather than mapping
>    .command to a noop, so "hook.name-of-hook.skip" as described above.
>    Of course the method you described will work regardless, since its
>    mechanism is based on the inherent result of executing /bin/true.

I think we've mainly focused on the theoretical aspect of this, but FWIW
I'm still entirely unclear on what this feature is even aimed for.

All of the rest of our config does not have an explicit "skip" for
anything, just last-set-wins. In terms of a real-world use-case wouldn't
a user just edit or comment out the config earlier in ~/.gitconfig, and
not "skip" it at the end with "git config [...] --add"?

I suspect that the use-case is some Googly centrally managed
/etc/gitconfig, but that's just speculation...

>  - I do see value in allowing "hook.name-of-hook.event" to be defined
>    repeatably, as described above, so I will include that.
> [...]
>  - figuring out what the heck we want to do with allowing hooks to
>    describe whether they allow parallelism

Which just to check if we're on the same page, needs to be figured out
because of the complexity of that "defined repeatably", no?

> Unfortunately I've dropped the design doc from this series, since nobody
> seemed interested in having it checked in, but I'll try to rework the
> help doc to make the schema more clear.

FWIW I wasn't opposed to it per-se, but remembered commenting on it in
earlier rounds, and it being some combination of docs that should really
be in manpages (i.e. describing current actual behavior), more general
rational and "why is it like this" (which correctly belongs in design
docs), and musings about hypothetical future features (e.g. "git hook
edit" and the like)'.

Emily Shaffer Aug. 5, 2021, 9:45 p.m. UTC | #7

On Thu, Aug 05, 2021 at 02:17:29AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> The gain is that "git hook list" becomes a trivial "git config
> >> -show-origin --show-scope --get-regexp" wrapper.
> >
> > This isn't a very compelling reason to me, if it makes the user
> > experience worse. (I'm not convinced that it does - just saying I
> > disagree with this reasoning.)
> 
> ...(see below)...
> 
> >> 
> >> So the series either doesn't need "git hook list" or such a thing
> >> becomes much less complex, especially given the proposed addition of
> >> other features in the area like "git hook edit", i.e. (quoting the
> >> linked E-Mail):
> >
> > I still think "git hook list" is useful to have for end-users, compared
> > to remembering the appropriate "git config" invocation. It futureproofs
> > us if we do want to change the execution ordering to something besides
> > config order, it's easier to remember/more discoverable, etc.
> 
> To clarify: I'm not against there being a "git hook list", I think we
> should have it.
> 
> It's useful UI. I.e. we have "run", it makes sense to have the same
> mechanism spew out its idea of what hooks are where, especially since
> that's not entirely contained in the config, but a union of config and
> GIT_DIR/hooks (and then there's core.hooksPath...).
> 
> So I'm very much for it being there.
> 
> I've just been commenting on the relative complexity of the config
> schema, which in one way surfaces as not being able to do the config
> part of the "git hook list" as a simple shell-out (or equivalent library
> invocation) of "git config" under the hood, but also means that users
> need to mentally track a more complex format when e.g. manually editing
> config...

Sure, makes sense.

> >> 
> >> And means that you presumably need to detect this case and error about
> >> it, but my proposed model does not:
> >> 
> >>     hook.post-commit.command = echo hi
> >>     # User error: "*.type" and <event-or-name>" does not match for
> >>     # "hook.*.command"
> >>     hook.post-commit.type    = pre-commit
> >> 
> >> And furthermore, that if someone in your model were to do:
> >> 
> >>     hook.verify-commit.command = echo hi
> >> 
> >> It's "dangling" today, but if a later version of git learns about a
> >> "verify-commit" hook we'll start running it unexpectedly.
> >
> > No, that's in fact as designed, with my model B. The user configured
> > "echo hi" to run on "verify-commit" events; if those events are
> > initially used by some wrapper, but later we decide they're a great idea
> > and absorb the verify-commit event into native Git, then this is working
> > as intended. I think your argument is based on a misunderstanding of the
> > design.
> >
> > Could it be that the way I provided the examples (my schema after A: and
> > your schema after B:) made it hard to parse? Sorry about that if so.
> 
> Aren't you assuming that users who specify a verify-commit will be happy
> because git's usurping of that will 1=1 match what they were using
> "verify-commit" for.
> 
> I'm pointing out that we can't know that, and since you want to make
> "git hook run" a general thing that runs any <name> of script you've
> configured, and not just what's in githooks(5) that it becomes very
> likely that if we add a new hook with some obvious name that we'll
> either break things for users, or subtly change behavior.
> 
> Which isn't just theoretical, e.g. I tend to run something like a "git
> log --check @{u}.." before I run git-send-email, with this configurable
> hook mechanism having a "git hook run sendemail-check" would be a way I
> might expose that in my own ~/.gitconfig.
> 
> But if git-send-email learns a "sendemail-check" and the behavior
> doesn't exactly match mine; E.g. maybe it similar to pre-auto-gc expects
> me to return a status code to ask me if I want to abort on a failed
> --check, but mine expects a revision range to run "log --check".
> 
> In practice that's a non-issue with the current hook mechanism,
> i.e. nobody's sticking a script into .git/hooks/my-custom-name and
> expecting it to do anything useful (and if they are, they have only
> themselves to blame).
> 
> Whereas we'd now be actively inviting users to squat on the same
> namespace we ourselves will need for future hooks.

Yeah, this is a good point. Seems worth a note in the 'git hook run'
doc, making a point that "you can use this for your wrapper to run
specific hooks, but be careful about namespace collisions". We're a lot
less likely to add a hook named "repotool-verify-commit" than we are to
add a hook named "verify-commit", for example.

I think it's enough to warn about future namespace collisions and make
an "at your own risk" note.

> > Maybe we can take the essential spirit of this implementation but make
> > it a little bit simpler. In pseudocode:
> >
> > populate_list(hookname):
> >   # look for everybody who says "hook.*.event = $hookname"
> >   hooks = get_configs_matching(hook.*.event, hookname)
> >   # the "*" bit is the bit we dereference later
> >   for hook in hooks:
> >     hook_list.add({.name = hook.key.subsection})
> >
> > run_hooks(hook_list):
> >   for hook in hook_list:
> >     # should we even run?
> >     if (get_last_config("hook." + hook.name + ".skip"))
> >       continue
> >
> >     # dereference the hook name and find out the command
> >     cmd = get_last_config("hook." + hook.name + ".command")
> >     run(cmd)
> >
> > The price for this improved readability is an extra config lookup in
> > each case. But the code is still readable, definitely moreso than
> > rewalking the config list over and over as the implementation stands
> > now. I personally think it's worth it.
> 
> I think the less complicated user experience is for our config system to
> work as consistently as possible with other config. If we define special
> rules and what's effectively special syntax with "skip" etc. we're
> asking users to read how that works, and keep that all in their head
> just to deal with that one area of our config.
> 
> > (One could also imagine a more readable placeholder than /bin/true in
> > the "hook.<name>.command" field, but then we get to decide just how hard
> > we want to avoid colliding with legitimate user hook names. Is .command
> > = skip enough? is .command = RESERVED_NAME_SKIP_THIS_HOOK enough? Is
> > something unrunnable like .command = "skip me" enough, although it could
> > conceivably collide with a determined user?)
> 
> All of that's something you'll need to explain in detail to users, which
> all seems way more complex than a simple:
> 
>     To skip a previously defined hook insert a noop-command, any will
>     do, but setting it to "true" (usually /bin/true) is a handy
>     convention for doing nothing.
> 
> I.e. by keeping the config field as doing one thing only you avoid any
> such collisions etc.

I disagree fundamentally that "find and run a noop command like
/bin/true" is simpler to average users than "skip it by setting a
config". Like I said below, by including "skip" both approaches will
work.

> > No, but it's something I'm interested in passing as an environment
> > variable. I didn't add it to this series because it seemed to me to
> > distract from the core feature. We would like to add it to simplify our
> > invocations of https://github.com/awslabs/git-secrets, so it's on my
> > radar.
> 
> Having such an env var as part of the initial series seems like a
> sensible thing to have.

Eh. To me, it feels like feature creep. It also is something we could
add today to the existing hook mechanism (even if it's a little
pointless since you can basename, like you say), so it feels orthogonal.
I would prefer not to add it in this series.

> 
> > And even without this, just today I wanted to configure a hook to really
> > make sure I had config.mak set up in all my Git worktrees (since you
> > pointed out I pushed code that fails -Wall.... ;) ) and set it up to run
> > on both pre-commit and post-checkout. So yes, it is feasible now, for
> > very stupid hooks.
> 
> Aside: It's nice to use DEVELOPER=1 in config.mak (or as make param)
> when hacking on git, it'll set -Wall -Werror and other nice warning
> flags.

This is literally what I was saying I wrote the hook to do, and is also
a tip I included when I wrote MyFirstContribution.txt.

> > I am not sure what it means for a single executable to write "parallel =
> > true" - it is a single executable.
> >
> > Ok, that is me being facetious - I think you are saying we can AND
> > together all of the 'hook.<thing-with-event-we-care-about>.parallel' to
> > decide whether or not to run in parallel.
> 
> Right, the case (whatever the config mechanism) wanting to use several
> off-the-shelf hooks and accomplish through git some version of this:
> 
>     parallel -j8 pre-receive-parallel-*.sh &&
>     parallel -j1 pre-receive-non-parallel-*.sh
> 
> I.e. since we have N scripts for the "pre-receive" type, and we're
> expecting to say whether on not parallelism is OK or not, it seems like
> a natural thing we'll want to declare that differently for some of those
> than for others.
> 
> > I would rather not discuss this now, for this series, because regardless
> > of which config schema we use today, we can figure out "parallel unless
> > we really don't want it" later on. It is too complex to discuss in the
> > context of "hey, we should also configure hooks somewhere else". Let's
> > leave it for future work.
> 
> The point is that no, we really can't figure it out as easily later on
> regardless of the config schema.
> 
> Because with 1=many you can't have 1=many.someAttribute=XYZ without that
> *.someAttribute=XYZ declaring something for all of 1=many, whereas if
> it's 1=1 then 1=1.someAttribute.XYZ obviously applies only to that 1=1.

I think this is moot, since we are moving to "all config hooks have a
name", but my plan previously was to let this be set on a hookcmd.
Essentially, your suggestion is to make every hook a hookcmd. My point
was that it's easy to extend [object which represents an executable] in
the config to include "always run me in series" or "run me in series for
this specific event" regardless. That is, one could imagine, discarding
entirely the hookcmd junk and going with the schema I sketched in my
last email (which lands somewhere between yours and mine):

[hook "linter"]
  command = ~/linter.sh
  event = pre-commit
  parallel = false

or...

[hook "linter"]
  command = ~/linter.sh
  event = pre-commit
  event = commit-msg

[hook "linter.commit-msg"]
  parallel = false

Or even...

[hook "linter"]
  command = ~/linter.sh
  event = pre-commit
  event = commit-msg
  parallel = commit-msg

The possibilities go on, as far as configuration goes.

To me, the harder part of this problem is actually implementing the
execution. We had some discussions at length early on in the
config-based hook series about ways to do this kind of complex "some
stuff needs synchronous execution and some stuff doesn't, in the same
event" and decided that it mostly resolved to "you ain't gonna need it"
principle. So I would prefer to discuss this when we find out we do
actually need it.

> 
> >> I haven't thought about it deeply, but have a hunch that having sections
> >> be a 1=1 mapping with a specific command instead of either 1=1 or 1=many
> >> is going to make that easier to implement and understand.
> >
> > Sure, probably. But it can be added later to either schema... ;)
> >
> >
> > Anyway, I did a quick strawpoll with my colleagues (Jonathan Nieder,
> > Jonathan Tan, and Josh Steadmon) and they all like your syntax more. I
> > am ambivalent between the two, personally.
> >
> > So what I will do, hopefully today, maybe not, is the following:
> >
> >  - no more hookcmd. (and there was much rejoicing)
> >  - Hooks are specified with "[hook "name-of-hook"]"
> 
> Thanks, I look forward to checking that out. As should be obvious from
> my misunderstanding of some of your config proposal (I snippet out some
> of those from the reply) I'm still not entirely clear on even what the
> current proposed behavior is, hopefully something simpler will be easier
> to grok ... :)

Fingers crossed ;)

> 
> >  - I do see value in having an explicit .skip field rather than mapping
> >    .command to a noop, so "hook.name-of-hook.skip" as described above.
> >    Of course the method you described will work regardless, since its
> >    mechanism is based on the inherent result of executing /bin/true.
> 
> I think we've mainly focused on the theoretical aspect of this, but FWIW
> I'm still entirely unclear on what this feature is even aimed for.
> 
> All of the rest of our config does not have an explicit "skip" for
> anything, just last-set-wins. In terms of a real-world use-case wouldn't
> a user just edit or comment out the config earlier in ~/.gitconfig, and
> not "skip" it at the end with "git config [...] --add"?
> 
> I suspect that the use-case is some Googly centrally managed
> /etc/gitconfig, but that's just speculation...

Yep, this is exactly why. We've talked often on-list about how we ship
and configure Git for Googlers, but the upshot is "we pack up 'next' and
also ship an '/etc/gitconfig'".

But I can also think of one really basic scenario when I'd want to skip
a hook in one repo without just commenting out my ~/.gitconfig: the
Gerrit Change-Id hook.

Gerrit requires all commit messages to contain this Change-Id: abc123
footer. It adds the footer by way of a commit-msg hook. That hook works
the same regardless of what your Gerrit remote is, so you can run the
same script on any project that uses Gerrit for code review. If, as I have in
the past, the vast majority of my projects use Gerrit, but I have one
project which does not, then I would love to configure the Gerrit
Change-Id hook globally and un-configure it for my one non-Gerrit
project.

(At that time, I maintained a subsystem in a project based on Yocto, so
I needed to regularly contribute to 5-10 projects, all but one of which
used Gerrit. The one non-Gerrit one used a mailing list. I also had a
hobby project and my dotfiles, neither of which used Gerrit. This is not
an uncommon use case.)

> 
> >  - I do see value in allowing "hook.name-of-hook.event" to be defined
> >    repeatably, as described above, so I will include that.
> > [...]
> >  - figuring out what the heck we want to do with allowing hooks to
> >    describe whether they allow parallelism
> 
> Which just to check if we're on the same page, needs to be figured out
> because of the complexity of that "defined repeatably", no?

As I wrote above, I think this falls under YAGNI, to be honest, and I
don't think we've painted ourselves into a corner if we do end up
needing it. I think the semantical complexity is a harder problem than
the syntactical complexity, and I'd rather just punt it ;)

> 
> > Unfortunately I've dropped the design doc from this series, since nobody
> > seemed interested in having it checked in, but I'll try to rework the
> > help doc to make the schema more clear.
> 
> FWIW I wasn't opposed to it per-se, but remembered commenting on it in
> earlier rounds, and it being some combination of docs that should really
> be in manpages (i.e. describing current actual behavior), more general
> rational and "why is it like this" (which correctly belongs in design
> docs), and musings about hypothetical future features (e.g. "git hook
> edit" and the like)'.

Yeah, that's pretty typical of design docs at places I've worked now and
in the past. I don't have a big problem with not checking it in - those
kinds of docs are most valuable during the early design stage, and so I
think that doc has now served its purpose.

 - Emily

Ævar Arnfjörð Bjarmason Aug. 5, 2021, 10:26 p.m. UTC | #8

On Thu, Aug 05 2021, Emily Shaffer wrote:

> On Thu, Aug 05, 2021 at 02:17:29AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> [...]
>> > No, that's in fact as designed, with my model B. The user configured
>> > "echo hi" to run on "verify-commit" events; if those events are
>> > initially used by some wrapper, but later we decide they're a great idea
>> > and absorb the verify-commit event into native Git, then this is working
>> > as intended. I think your argument is based on a misunderstanding of the
>> > design.
>> >
>> > Could it be that the way I provided the examples (my schema after A: and
>> > your schema after B:) made it hard to parse? Sorry about that if so.
>> 
>> Aren't you assuming that users who specify a verify-commit will be happy
>> because git's usurping of that will 1=1 match what they were using
>> "verify-commit" for.
>> 
>> I'm pointing out that we can't know that, and since you want to make
>> "git hook run" a general thing that runs any <name> of script you've
>> configured, and not just what's in githooks(5) that it becomes very
>> likely that if we add a new hook with some obvious name that we'll
>> either break things for users, or subtly change behavior.
>> 
>> Which isn't just theoretical, e.g. I tend to run something like a "git
>> log --check @{u}.." before I run git-send-email, with this configurable
>> hook mechanism having a "git hook run sendemail-check" would be a way I
>> might expose that in my own ~/.gitconfig.
>> 
>> But if git-send-email learns a "sendemail-check" and the behavior
>> doesn't exactly match mine; E.g. maybe it similar to pre-auto-gc expects
>> me to return a status code to ask me if I want to abort on a failed
>> --check, but mine expects a revision range to run "log --check".
>> 
>> In practice that's a non-issue with the current hook mechanism,
>> i.e. nobody's sticking a script into .git/hooks/my-custom-name and
>> expecting it to do anything useful (and if they are, they have only
>> themselves to blame).
>> 
>> Whereas we'd now be actively inviting users to squat on the same
>> namespace we ourselves will need for future hooks.
>
> Yeah, this is a good point. Seems worth a note in the 'git hook run'
> doc, making a point that "you can use this for your wrapper to run
> specific hooks, but be careful about namespace collisions". We're a lot
> less likely to add a hook named "repotool-verify-commit" than we are to
> add a hook named "verify-commit", for example.
>
> I think it's enough to warn about future namespace collisions and make
> an "at your own risk" note.

I might have lost track at this point, but later examples in this E-Mail
you show don't seem to require such a note.

I.e. it's only an issue if we conflate a semantically meaningful slot
like "pre-commit" in the config with one that can also have the meaning
of simply defining an arbitrary user-decided name.

There's no such collision if the config uses
e.g. hook.mycheck.event=pre-commit & hook.mycheck.command=mycmd, as
opposed to hook.pre-commit.command=mycmd.

On the specifics of that example: I don't really care about the
bikeshedding of the config key naming specifics, just the semantics of
not putting user defined names and hook type names in the same slot if
we can avoid it.

>> > No, but it's something I'm interested in passing as an environment
>> > variable. I didn't add it to this series because it seemed to me to
>> > distract from the core feature. We would like to add it to simplify our
>> > invocations of https://github.com/awslabs/git-secrets, so it's on my
>> > radar.
>> 
>> Having such an env var as part of the initial series seems like a
>> sensible thing to have.
>
> Eh. To me, it feels like feature creep. It also is something we could
> add today to the existing hook mechanism (even if it's a little
> pointless since you can basename, like you say), so it feels orthogonal.
> I would prefer not to add it in this series.

Sure, I guess you can add two hook sections to replace e.g. your
{pre,post}-receive hook (which are commonly routed to the same script
with file-based hooks). Having a single setenv() seems easy enough, and
I'd bet a way more common use-case than wanting to skip earlier defined
hooks...

>> > I am not sure what it means for a single executable to write "parallel =
>> > true" - it is a single executable.
>> >
>> > Ok, that is me being facetious - I think you are saying we can AND
>> > together all of the 'hook.<thing-with-event-we-care-about>.parallel' to
>> > decide whether or not to run in parallel.
>> 
>> Right, the case (whatever the config mechanism) wanting to use several
>> off-the-shelf hooks and accomplish through git some version of this:
>> 
>>     parallel -j8 pre-receive-parallel-*.sh &&
>>     parallel -j1 pre-receive-non-parallel-*.sh
>> 
>> I.e. since we have N scripts for the "pre-receive" type, and we're
>> expecting to say whether on not parallelism is OK or not, it seems like
>> a natural thing we'll want to declare that differently for some of those
>> than for others.
>> 
>> > I would rather not discuss this now, for this series, because regardless
>> > of which config schema we use today, we can figure out "parallel unless
>> > we really don't want it" later on. It is too complex to discuss in the
>> > context of "hey, we should also configure hooks somewhere else". Let's
>> > leave it for future work.
>> 
>> The point is that no, we really can't figure it out as easily later on
>> regardless of the config schema.
>> 
>> Because with 1=many you can't have 1=many.someAttribute=XYZ without that
>> *.someAttribute=XYZ declaring something for all of 1=many, whereas if
>> it's 1=1 then 1=1.someAttribute.XYZ obviously applies only to that 1=1.
>
> I think this is moot, since we are moving to "all config hooks have a
> name", but my plan previously was to let this be set on a hookcmd.
> Essentially, your suggestion is to make every hook a hookcmd. My point
> was that it's easy to extend [object which represents an executable] in
> the config to include "always run me in series" or "run me in series for
> this specific event" regardless. That is, one could imagine, discarding
> entirely the hookcmd junk and going with the schema I sketched in my
> last email (which lands somewhere between yours and mine):

Just to be clear, I don't have any concrete suggestion in mind right now
(actually as I write this I can only vaguely recall what I suggested
before).

What I have been suggesting is not any specific implementation, but that
we have a bias for the simple over the complex for an initial
implementation.

Complexity can always be added later, whereas coming up with a config
schema that's irregular compared to other existing config in git is
something we might regret sooner than later.

> [hook "linter"]
>   command = ~/linter.sh
>   event = pre-commit
>   parallel = false
>
> or...
>
> [hook "linter"]
>   command = ~/linter.sh
>   event = pre-commit
>   event = commit-msg
>
> [hook "linter.commit-msg"]
>   parallel = false
>
> Or even...
>
> [hook "linter"]
>   command = ~/linter.sh
>   event = pre-commit
>   event = commit-msg
>   parallel = commit-msg
>
> The possibilities go on, as far as configuration goes.
>
> To me, the harder part of this problem is actually implementing the
> execution. We had some discussions at length early on in the
> config-based hook series about ways to do this kind of complex "some
> stuff needs synchronous execution and some stuff doesn't, in the same
> event" and decided that it mostly resolved to "you ain't gonna need it"
> principle. So I would prefer to discuss this when we find out we do
> actually need it.

What I was mainly going for with "we really can't figure it out as
easily later" above was not that this tweaking of "jobs" or parallelism
was essential per-hook.

But that it was a handy shorthand for a config attribute you might want
to define for hooks, and having what are effectively groups of hooks,
with N "command" or "event" in one section might make things more
complex once you'd want to define optional attributes for one of those
commands or events.

>> [...]
>> >  - I do see value in having an explicit .skip field rather than mapping
>> >    .command to a noop, so "hook.name-of-hook.skip" as described above.
>> >    Of course the method you described will work regardless, since its
>> >    mechanism is based on the inherent result of executing /bin/true.
>> 
>> I think we've mainly focused on the theoretical aspect of this, but FWIW
>> I'm still entirely unclear on what this feature is even aimed for.
>> 
>> All of the rest of our config does not have an explicit "skip" for
>> anything, just last-set-wins. In terms of a real-world use-case wouldn't
>> a user just edit or comment out the config earlier in ~/.gitconfig, and
>> not "skip" it at the end with "git config [...] --add"?
>> 
>> I suspect that the use-case is some Googly centrally managed
>> /etc/gitconfig, but that's just speculation...
>
> Yep, this is exactly why. We've talked often on-list about how we ship
> and configure Git for Googlers, but the upshot is "we pack up 'next' and
> also ship an '/etc/gitconfig'".
>
> But I can also think of one really basic scenario when I'd want to skip
> a hook in one repo without just commenting out my ~/.gitconfig: the
> Gerrit Change-Id hook.
>
> Gerrit requires all commit messages to contain this Change-Id: abc123
> footer. It adds the footer by way of a commit-msg hook. That hook works
> the same regardless of what your Gerrit remote is, so you can run the
> same script on any project that uses Gerrit for code review. If, as I have in
> the past, the vast majority of my projects use Gerrit, but I have one
> project which does not, then I would love to configure the Gerrit
> Change-Id hook globally and un-configure it for my one non-Gerrit
> project.
>
> (At that time, I maintained a subsystem in a project based on Yocto, so
> I needed to regularly contribute to 5-10 projects, all but one of which
> used Gerrit. The one non-Gerrit one used a mailing list. I also had a
> hobby project and my dotfiles, neither of which used Gerrit. This is not
> an uncommon use case.)

> I disagree fundamentally that "find and run a noop command like
> /bin/true" is simpler to average users than "skip it by setting a
> config". Like I said below, by including "skip" both approaches will
> work.

In reply to this, and moving things around a bit in the reply:

>> All of that's something you'll need to explain in detail to users, which
>> all seems way more complex than a simple:
>> 
>>     To skip a previously defined hook insert a noop-command, any will
>>     do, but setting it to "true" (usually /bin/true) is a handy
>>     convention for doing nothing.
>> 
>> I.e. by keeping the config field as doing one thing only you avoid any
>> such collisions etc.
>
> I disagree fundamentally that "find and run a noop command like
> /bin/true" is simpler to average users than "skip it by setting a
> config". Like I said below, by including "skip" both approaches will
> work.

To clarify, I haven't been advocating for that "skip = true" convention
because I think it's a sensible thing per-se, but that I think this
use-case is something that an individual configurable feature in git
doesn't need stateful syntax to deal with.

We have any number of multi-value and single-value config within git. I
just don't see why on balance hooks need a special syntax to skip
earlier set config for hooks specifically.

E.g. this gerrit example would also be true of someone in a corporate
setting using git-send-email, and wanting a list of sendemail.cc on all
but their dotfiles project, or one other non-work project.

Does that mean we need a sendemail.skipCC and special handling for it in
git-send-email.perl? No, I think we'd generally advice users to just put
those projects under ~/work or whatever, and then use config includes to
set config for that group of projects based on the path:

    [includeIf "gitdir:~/work/"]
        path = ~/.gitconfig.d/work

Or, if a hook is really so special that it's needed everywhere define it
in /etc/gitconfig, and then just make the hook itself do:

    if test "$(git config --bool googleHooks.disableOurGlobalHook)" = "true"
    then
        exit 0
    fi

Which is pretty much (with the hook.* config prefix) how we've adviced
users to do this since approximately forever with the sample hooks we
ship.

The advantage of using includes in that way is e.g. that you can easily
see how your hook came to be configured with:

    git config --list --show-origin

I.e. that (by convention) it comes via a conditionally included
~/.gitconfig.d/gerrit file. If it's a multi-value like sendemail.cc the
semantics are also clear, e.g. you can get all values we'll use with
"git config --get-all".

Whereas choosing to implement this with something that *looks* like a
config keyword, but really isn't is just confusing.

We need to explain in one way how users might arrange for the likes of
sendemail.cc to be defined for some, but not all of their repos, and
explain it differently when it comes to hooks. There's inherent value in
that explanation being the same for both.

Emily Shaffer Aug. 6, 2021, 8:18 p.m. UTC | #9

On Fri, Aug 06, 2021 at 12:26:02AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> 
> On Thu, Aug 05 2021, Emily Shaffer wrote:
> 
> > On Thu, Aug 05, 2021 at 02:17:29AM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> [...]
> >> > No, that's in fact as designed, with my model B. The user configured
> >> > "echo hi" to run on "verify-commit" events; if those events are
> >> > initially used by some wrapper, but later we decide they're a great idea
> >> > and absorb the verify-commit event into native Git, then this is working
> >> > as intended. I think your argument is based on a misunderstanding of the
> >> > design.
> >> >
> >> > Could it be that the way I provided the examples (my schema after A: and
> >> > your schema after B:) made it hard to parse? Sorry about that if so.
> >> 
> >> Aren't you assuming that users who specify a verify-commit will be happy
> >> because git's usurping of that will 1=1 match what they were using
> >> "verify-commit" for.
> >> 
> >> I'm pointing out that we can't know that, and since you want to make
> >> "git hook run" a general thing that runs any <name> of script you've
> >> configured, and not just what's in githooks(5) that it becomes very
> >> likely that if we add a new hook with some obvious name that we'll
> >> either break things for users, or subtly change behavior.
> >> 
> >> Which isn't just theoretical, e.g. I tend to run something like a "git
> >> log --check @{u}.." before I run git-send-email, with this configurable
> >> hook mechanism having a "git hook run sendemail-check" would be a way I
> >> might expose that in my own ~/.gitconfig.
> >> 
> >> But if git-send-email learns a "sendemail-check" and the behavior
> >> doesn't exactly match mine; E.g. maybe it similar to pre-auto-gc expects
> >> me to return a status code to ask me if I want to abort on a failed
> >> --check, but mine expects a revision range to run "log --check".
> >> 
> >> In practice that's a non-issue with the current hook mechanism,
> >> i.e. nobody's sticking a script into .git/hooks/my-custom-name and
> >> expecting it to do anything useful (and if they are, they have only
> >> themselves to blame).
> >> 
> >> Whereas we'd now be actively inviting users to squat on the same
> >> namespace we ourselves will need for future hooks.
> >
> > Yeah, this is a good point. Seems worth a note in the 'git hook run'
> > doc, making a point that "you can use this for your wrapper to run
> > specific hooks, but be careful about namespace collisions". We're a lot
> > less likely to add a hook named "repotool-verify-commit" than we are to
> > add a hook named "verify-commit", for example.
> >
> > I think it's enough to warn about future namespace collisions and make
> > an "at your own risk" note.
> 
> I might have lost track at this point, but later examples in this E-Mail
> you show don't seem to require such a note.
> 
> I.e. it's only an issue if we conflate a semantically meaningful slot
> like "pre-commit" in the config with one that can also have the meaning
> of simply defining an arbitrary user-decided name.

No, I don't think that's the case. The examples I used later in the mail
don't have anything to do with this "wrapper wants to use 'git hooks
run'" case.

Let's imagine a current commonly-used wrapper, 'repo'
(https://gerrit.googlesource.com/git-repo/+/refs/heads/master/README.md).
This wrapper is used to manage multiple Git repos into something
semantically similar to submodules.

Let's say that 'repo' decides that instead of rolling its own hook
system, it would like to use Git's native hook system. Let's say that it
wants to call user-defined hooks any time someone has just completed
running 'repo sync'. It could ask users to define a config like so:

  [hook "myrepohook"]
    event = post-sync
    command = ~/some/downloaded/myrepohook --with --args

Then 'repo' itself can, at the end of its 'repo sync' implementation,
call 'git hook run post-sync' (instead of using some other hook
specification schema).

But Git itself has a 'git submodule sync' command, and let's imagine
that we decided we want to run some user-defined hook at the end of 'git
submodule sync'. Naturally, we can call it 'post-sync', and then we will
accidentally invoke the hooks which someone configured for 'repo' - as
you pointed out would be a concern with my design, this will *also* be
a concern with your design.

This specific use case is what I think we can get around with some "at
your own risk" documentation, because we are much less likely to collide
with someone asking their users to configure like so:

  [hook "myrepohook"]
    event = repo-tool-post-sync
    command = ~/some/downloaded/myrepohook --with-args

> >> > No, but it's something I'm interested in passing as an environment
> >> > variable. I didn't add it to this series because it seemed to me to
> >> > distract from the core feature. We would like to add it to simplify our
> >> > invocations of https://github.com/awslabs/git-secrets, so it's on my
> >> > radar.
> >> 
> >> Having such an env var as part of the initial series seems like a
> >> sensible thing to have.
> >
> > Eh. To me, it feels like feature creep. It also is something we could
> > add today to the existing hook mechanism (even if it's a little
> > pointless since you can basename, like you say), so it feels orthogonal.
> > I would prefer not to add it in this series.
> 
> Sure, I guess you can add two hook sections to replace e.g. your
> {pre,post}-receive hook (which are commonly routed to the same script
> with file-based hooks). Having a single setenv() seems easy enough, and
> I'd bet a way more common use-case than wanting to skip earlier defined
> hooks...

More than anything else, I think this comment convinced me that the skip
config also falls under YAGNI and I can drop it (and we can add it later
if "hook.myhook.command = true" isn't clear enough).

I snipped the rest because I don't disagree or have anything further to
say. Look for a reroll this weekend, we hope.

 - Emily

[8/9] hook: teach 'hookcmd' config to alias hook scripts

Commit Message

Comments

Patch