Message ID | 20210715232603.3415111-9-emilyshaffer@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | config-based hooks restarted | expand |
On Thu, Jul 15 2021, Emily Shaffer wrote: > To enable fine-grained options which apply to a single hook executable, > and to make it easier for a single executable to be run on multiple hook > events, teach "hookcmd.<alias>.config". These can be configured as > follows: > [...] > diff --git a/Documentation/config/hook.txt b/Documentation/config/hook.txt > index a97b980cca..5b35170664 100644 > --- a/Documentation/config/hook.txt > +++ b/Documentation/config/hook.txt > @@ -3,6 +3,11 @@ hook.<command>.command:: > executable on your device, a oneliner for your shell, or the name of a > hookcmd. See linkgit:git-hook[1]. > > +hookcmd.<name>.command:: > + A command to execute during a hook for which <name> has been specified > + as a command. This can be an executable on your device or a oneliner for > + your shell. See linkgit:git-hook[1]. > + > [...] > +Global config > +---- > + [hook "post-commit"] > + command = "linter" > + command = "~/typocheck.sh" > + > + [hookcmd "linter"] > + command = "/bin/linter --c" > +---- > + > +Local config > +---- > + [hook "prepare-commit-msg"] > + command = "linter" > + [hook "post-commit"] > + command = "python ~/run-test-suite.py" > +---- > + > +With these configs, you'd then run post-commit hooks in this order: > + > + /bin/linter --c > + ~/typocheck.sh > + python ~/run-test-suite.py > + .git/hooks/post-commit (if present) > + > +and prepare-commit-msg hooks in this order: > + > + /bin/linter --c > + .git/hooks/prepare-commit-msg (if present) > I still have outstanding feedback on the fundamental design here. I.e. why is this not: hook.<name>.event = post-commit hook.<name>.command = <path> See: https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/ As noted there I don't see why not, and more complexity results from the design choice of doing it the way you're proposing. I.e. we can't discover hooks based on config prefixes, and we end up sticking full FS paths in config keys.
On Fri, Jul 16, 2021 at 11:13:42AM +0200, Ævar Arnfjörð Bjarmason wrote: > > > On Thu, Jul 15 2021, Emily Shaffer wrote: > > > To enable fine-grained options which apply to a single hook executable, > > and to make it easier for a single executable to be run on multiple hook > > events, teach "hookcmd.<alias>.config". These can be configured as > > follows: > > [...] > > diff --git a/Documentation/config/hook.txt b/Documentation/config/hook.txt > > index a97b980cca..5b35170664 100644 > > --- a/Documentation/config/hook.txt > > +++ b/Documentation/config/hook.txt > > @@ -3,6 +3,11 @@ hook.<command>.command:: > > executable on your device, a oneliner for your shell, or the name of a > > hookcmd. See linkgit:git-hook[1]. > > > > +hookcmd.<name>.command:: > > + A command to execute during a hook for which <name> has been specified > > + as a command. This can be an executable on your device or a oneliner for > > + your shell. See linkgit:git-hook[1]. > > + > > [...] > > +Global config > > +---- > > + [hook "post-commit"] > > + command = "linter" > > + command = "~/typocheck.sh" > > + > > + [hookcmd "linter"] > > + command = "/bin/linter --c" > > +---- > > + > > +Local config > > +---- > > + [hook "prepare-commit-msg"] > > + command = "linter" > > + [hook "post-commit"] > > + command = "python ~/run-test-suite.py" > > +---- > > + > > +With these configs, you'd then run post-commit hooks in this order: > > + > > + /bin/linter --c > > + ~/typocheck.sh > > + python ~/run-test-suite.py > > + .git/hooks/post-commit (if present) > > + > > +and prepare-commit-msg hooks in this order: > > + > > + /bin/linter --c > > + .git/hooks/prepare-commit-msg (if present) > > > > I still have outstanding feedback on the fundamental design > here. I.e. why is this not: > > hook.<name>.event = post-commit > hook.<name>.command = <path> > > See: > > https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/ > > As noted there I don't see why not, and more complexity results from the > design choice of doing it the way you're proposing. I.e. we can't > discover hooks based on config prefixes, and we end up sticking full FS > paths in config keys. Interesting. My gut says that it would make it harder to neatly write a config with the same hook running at the very beginning of one event and the very end of another, but I'm not sure whether that's a case to optimize for. I would also need twice as many lines to run a script I wrote as a hook - that is, the base case which is probably all most people will need. So with your proposal I *must* name every hook I want to add. Instead of "hook.pre-commit.command = ~/check-for-debug-strings" I need to configure "hook.check-debug-strings.event = pre-commit" and "hook.check-debug-strings.command = ~/check-for-debug-strings". That's a little frustrating, and if I never want to skip that check or move it later or something, it will be extra overhead for no gain for me. I do agree that your approach bans the regrettably awkward "hookcmd.~/check-for-debug-strings.skip = true", but I'm not sure whether or not it's worth it. To help us visualize the change, here's some common and complicated tasks and how they look with either schema (let mine be A and yours be B): #1 - Add a oneliner (not executing a script) A: [hook "post-commit"] command = echo post commit B: [hook "oneliner"] event = post-commit command = echo post commit #2 - Execute a script A: [hook "post-commit"] command = ~/my-post-commit-hook B: [hook "script"] event = post-commit command = ~/my-post-commit-hook #3 - Run a handful of scripts in a specific order on one event A: [hook "post-commit"] command = ~/my-post-commit-1 command = ~/my-post-commit-2 command = ~/my-post-commit-3 B: [hook "script 1"] event = post-commit command = ~/my-post-commit-1 [hook "script 2"] event = post-commit command = ~/my-post-commit-2 [hook "script 3"] event = post-commit command = ~/my-post-commit-3 #4 - Skip a script configured more globally A: (original config) [hook "post-commit"] command = /corp/stuff/corp-post-commit (local config) [hookcmd "/corp/stuff/corp-post-commit"] skip = true OR, (original config) [hookcmd "corp-pc"] command = /corp/stuff/corp-post-commit [hook "post-commit"] command = corp-pc (local config) [hookcmd "corp-pc"] skip = true B: (original config) [hook "corp-pc"] event = post-commit command = /corp/stuff/corp-post-commit (local config) [hook "corp-pc"] skip = true #5 - Execute one script for multiple events A: [hookcmd "reusable-hook"] command = /long/path/reusable-hook [hook "post-commit"] command = reusable-hook [hook "pre-commit"] command = reusable-hook [hook "prepare-commit-msg"] command = reusable-hook B: [hook "reusable-hook"] command = /long/path/reusable-hook event = post-commit event = pre-commit event = prepare-commit-msg #6 - Execute the same script early for one event and late for another event A: (global) [hookcmd "reusable-hook"] command = /long/path/reusable-hook [hook "pre-commit"] command = reusable-hook (local) [hook "post-commit"] command = other command = hooks command = reusable-hook B: (global) [hook "reusable-hook"] command = /long/path/reusable-hook event = pre-commit (local) [hook "other"] event = post-commit command = other [hook "hooks"] event = post-commit command = hooks [hook "reusable-hook"] event = reusable-hook I'm not going to comment on which one I like more yet - I think I will study this for a while and let others weigh in on their preferences too. I tried to order the cases from most common to less common. Please feel free to chime in with more use cases that you think would be handy which I forgot :) - Emily
On Thu, Jul 22 2021, Emily Shaffer wrote: > On Fri, Jul 16, 2021 at 11:13:42AM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> >> On Thu, Jul 15 2021, Emily Shaffer wrote: >> >> > To enable fine-grained options which apply to a single hook executable, >> > and to make it easier for a single executable to be run on multiple hook >> > events, teach "hookcmd.<alias>.config". These can be configured as >> > follows: >> > [...] >> > diff --git a/Documentation/config/hook.txt b/Documentation/config/hook.txt >> > index a97b980cca..5b35170664 100644 >> > --- a/Documentation/config/hook.txt >> > +++ b/Documentation/config/hook.txt >> > @@ -3,6 +3,11 @@ hook.<command>.command:: >> > executable on your device, a oneliner for your shell, or the name of a >> > hookcmd. See linkgit:git-hook[1]. >> > >> > +hookcmd.<name>.command:: >> > + A command to execute during a hook for which <name> has been specified >> > + as a command. This can be an executable on your device or a oneliner for >> > + your shell. See linkgit:git-hook[1]. >> > + >> > [...] >> > +Global config >> > +---- >> > + [hook "post-commit"] >> > + command = "linter" >> > + command = "~/typocheck.sh" >> > + >> > + [hookcmd "linter"] >> > + command = "/bin/linter --c" >> > +---- >> > + >> > +Local config >> > +---- >> > + [hook "prepare-commit-msg"] >> > + command = "linter" >> > + [hook "post-commit"] >> > + command = "python ~/run-test-suite.py" >> > +---- >> > + >> > +With these configs, you'd then run post-commit hooks in this order: >> > + >> > + /bin/linter --c >> > + ~/typocheck.sh >> > + python ~/run-test-suite.py >> > + .git/hooks/post-commit (if present) >> > + >> > +and prepare-commit-msg hooks in this order: >> > + >> > + /bin/linter --c >> > + .git/hooks/prepare-commit-msg (if present) >> > >> >> I still have outstanding feedback on the fundamental design >> here. I.e. why is this not: >> >> hook.<name>.event = post-commit >> hook.<name>.command = <path> >> >> See: >> >> https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/ >> >> As noted there I don't see why not, and more complexity results from the >> design choice of doing it the way you're proposing. I.e. we can't >> discover hooks based on config prefixes, and we end up sticking full FS >> paths in config keys. > > Interesting. My gut says that it would make it harder to neatly write a > config with the same hook running at the very beginning of one event and > the very end of another, but I'm not sure whether that's a case to > optimize for. > > I would also need twice as many lines to run a script I wrote as a hook > - that is, the base case which is probably all most people will need. So > with your proposal I *must* name every hook I want to add. Instead of > "hook.pre-commit.command = ~/check-for-debug-strings" I need to > configure "hook.check-debug-strings.event = pre-commit" and > "hook.check-debug-strings.command = ~/check-for-debug-strings". That's a > little frustrating, and if I never want to skip that check or move it > later or something, it will be extra overhead for no gain for me. The gain is that "git hook list" becomes a trivial "git config -show-origin --show-scope --get-regexp" wrapper. So the series either doesn't need "git hook list" or such a thing becomes much less complex, especially given the proposed addition of other features in the area like "git hook edit", i.e. (quoting the linked E-Mail): As just one example; surely "git config edit <name>" would need to run around and find config files to edit, then open them in a loop for you, no? Which we'd eventually want for "git config" in general with an --edit-regexp option or whatever, which brings us (well, at least me) back to "then let's just add it to git-config?". > I do agree that your approach bans the regrettably awkward > "hookcmd.~/check-for-debug-strings.skip = true", but I'm not sure > whether or not it's worth it. That design choice also means that you can't expand the path using "git config --get --type=path. We do have that with the "includeIf" construct, but if we can avoid it we should, it makes it play nicer with other assumptions and features of the config system. As noted in the follow-up reply while we don't case normalize the LeVeL" part of "ThReE.LeVeL.KeY" that's tolower(), which we know isn't a 1=1 mapping on some FS's. https://lore.kernel.org/git/87y2ebo16v.fsf@evledraar.gmail.com/ > To help us visualize the change, here's some common and complicated > tasks and how they look with either schema (let mine be A and yours be > B): Before diving into that, I'll just say I don't care about the trivial specifics of how this is done, i.e. the bikeshedding of what the config keys etc. are named. Just in (as noted above) design choices here forcing avoidable complexity in other areas. > #1 - Add a oneliner (not executing a script) > A: > [hook "post-commit"] > command = echo post commit > B: > [hook "oneliner"] > event = post-commit > command = echo post commit > #2 - Execute a script > A: > [hook "post-commit"] > command = ~/my-post-commit-hook > B: > [hook "script"] > event = post-commit > command = ~/my-post-commit-hook ... > #3 - Run a handful of scripts in a specific order on one event > A: > [hook "post-commit"] > command = ~/my-post-commit-1 > command = ~/my-post-commit-2 > command = ~/my-post-commit-3 > B: > [hook "script 1"] > event = post-commit > command = ~/my-post-commit-1 > [hook "script 2"] > event = post-commit > command = ~/my-post-commit-2 > [hook "script 3"] > event = post-commit > command = ~/my-post-commit-3 To reply to all the above, yes, your suggestion comes out ahead in being less verbose. But isn't the real difference not in the differing prefixes, i.e. hook.* and hookcmd.* (A) v.s. always hook.* (B, which is what I'm mainly focusing on, i.e. what requires the added complexity. But that in that in your proposed way of doing it it's: hook.<event-or-name>.* V.s. my suggestion of: hook.<name>.* And thus whenever you have a <event-or-name> that just happens to be a built-in hook listed in githooks(5) we in (A) implicitly expand config like: hook.post-commit.command = echo foo To: hook.post-commit.command = echo hi hook.post-commit.type = post-commit But not knowing about "foo" we don't expand: hook.foo.command = echo foo To: hook.foo.command = echo hi hook.foo.type = foo # This would be an error, or ignored. But rather leave "dangling" for the user to later supply the "*.event" themselves, i.e.: hook.foo.command = echo hi hook.foo.event = post-commit And means that you presumably need to detect this case and error about it, but my proposed model does not: hook.post-commit.command = echo hi # User error: "*.type" and <event-or-name>" does not match for # "hook.*.command" hook.post-commit.type = pre-commit And furthermore, that if someone in your model were to do: hook.verify-commit.command = echo hi It's "dangling" today, but if a later version of git learns about a "verify-commit" hook we'll start running it unexpectedly. Because your design conflates the slot for known hook type names and user-supplied names. On balance I think it's better just to always supply two lines per-hook, but whether we have this proposed shorthand or not is mostly orthogonal to everything I mentioned in https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/ I.e. my proposed version could also have it, but thinking about it I think it's not worth it, we should always use <name>, not <event-or-name> for the reasons noted if you'll read ahead... > #4 - Skip a script configured more globally > A: > (original config) > [hook "post-commit"] > command = /corp/stuff/corp-post-commit > (local config) > [hookcmd "/corp/stuff/corp-post-commit"] > skip = true > OR, > (original config) > [hookcmd "corp-pc"] > command = /corp/stuff/corp-post-commit > [hook "post-commit"] > command = corp-pc > (local config) > [hookcmd "corp-pc"] > skip = true > B: > (original config) > [hook "corp-pc"] > event = post-commit > command = /corp/stuff/corp-post-commit > (local config) > [hook "corp-pc"] > skip = true ...which are among other things that no, my proposed version doesn't have "skip" at all, see point #3 at https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/ I.e. I think the "skip" is a thing that falls out of a complexity of your design that I'm proposing to do away with. That complexity being that you use <event-or-name> and I use <name>, and you want to in turn support N number of "*.command" for any "hook.<event-or-name>". I'm suggesting we use the typical "last key wins" semantics, if you want multiple commands for a given hook type you'll just supply multiple "hook.<name>" sections with the same "hook.*.event = type" in all. The way to skip hooks in my proposal is: hook.<name>.command = true Or whatever noop command you want on the RHS. In practice we wouldn't invoke "true", just like we don't invoke "cat" for "PAGER=cat". But unlike "*.skip" that doesn't require complexity in implementation or user understanding, it just falls naturally out of the rest of the model. > #5 - Execute one script for multiple events > A: > [hookcmd "reusable-hook"] > command = /long/path/reusable-hook > [hook "post-commit"] > command = reusable-hook > [hook "pre-commit"] > command = reusable-hook > [hook "prepare-commit-msg"] > command = reusable-hook > B: > [hook "reusable-hook"] > command = /long/path/reusable-hook > event = post-commit > event = pre-commit > event = prepare-commit-msg It's been so long since I wrote the original E-Mail that I'm having to skim my own summary there & come up with things as I go along, so maybe this conflicts with something I said earlier. But no, I don't think I meant that *.event should be multi-value, or that that magic is worthwhile. I.e. I think we just want: [hook "not-a-reusable-section-1"] command = /long/path/reusable-hook event = post-commit [hook "not-a-reusable-section-2"] command = /long/path/reusable-hook event = pre-commit [hook "not-a-reusable-section-3"] command = /long/path/reusable-hook event = prepare-commit-msg I.e. is wanting to use the same command for N different hooks really that common a use-case to avoid a little verbosity, and quite a lot of complexity? How would such a hook even be implemented? We don't have anything in your hook execution model (and I don't remember you adding this) which passes down a "you are this type of hook" to hooks. It's now implicit in that the hook invoked at .git/hooks/post-commit or .git/hooks/pre-commit can check its own basename, but it won't be with configurable hooks. We could start passing down a GIT_HOOK_TYPE in the environment or whatever, but I think the simpler case of just having the user do it is better. I'm assuming that mainly because people have wanted a "/long/path/reusable-hook" router script because we have not supported executing N hooks, or supported concurrency. Once we do that the complexity of such routing scripts (mostly, not everyone's needs will be met) will be replaced by a little bit of config. I don't see why it's worth it to micro-optimize for lines in that config at the cost of increased config complexity. For example, how do you "skip" an "event" type? You have it for "*.command"? So let's say in your model (A) I have system config like: [hook "my-reusable"] command = /some/commit-msg-hook event = prepare-commit-msg Now I want to skip that, easy enough, in my local config: [hook "my-reusable"] skip = true Or if I wanted to also invoke it for commit-msg events, in my local config (which piggy-backs on the system one): # Now runs for both prepare-commit-msg and commit-message [hook "my-reusable"] event = commit-msg Or, if I want to skip the command name and substitute my own: [hook "my-reusable"] skip = true command = /my/commit-msg-hook I'm just assuming you mean that to work, as discussed above I think all this is leading us down the road of too much complexity. But anyway, now I want to instead not run it as a prepare-commit-msg hook, but a commit-msg hook. As far as I can tell you haven't specified that, but I think something consistent with your model would be [hook "my-reusable"] event = skip event = commit-msg Or rather, because that conflates "skip" with a hook event type name: [hook "my-reusable"] skipEvent = true event = commit-msg I.e. the "skip" you have now really means "skipCommand", so we'll need a skipSomeName to skip other "hook.*.someName". As noted above I think all of this is too complex, let's just do the system config like this (exact same as the above): [hook "my-reusable"] command = /some/commit-msg-hook event = prepare-commit-msg To skip it, in my local config: [hook "my-reusable"] command = true And to re-use /some/commit-msg-hook without having to re-include it in your local config the answer is that we don't support that level of cleverness. Just do this: # First skip the system hook [hook "my-reusable"] command = true # Set up another hook, just copy/pasting the /some/commit-msg-hook command [hook "my-local"] command = /some/commit-msg-hook event = commit-msg I.e. I think these cases of someone having say a /etc/gitconfig hook on their system not under their control that they want to not run, *but* not as the "event" there, but as another event type is too specific a use-case for us to worry about. > #6 - Execute the same script early for one event and late for another > event > A: > (global) > [hookcmd "reusable-hook"] > command = /long/path/reusable-hook > [hook "pre-commit"] > command = reusable-hook > (local) > [hook "post-commit"] > command = other > command = hooks > command = reusable-hook Even with I think it's fair to say deep knowledge of your proposal at this point I still needed to read this a few times to see if that: command = reusable-hook Is referring to: [hookcmd "reusable-hook"] I.e. is it going to run: command = /long/path/reusable-hook Or is it just re-specifying /long/path/reusable-hook but relying on a $PATH lookup? Having reasoned through that I think the answer is the former. But that also means that in your model: [hookcmd "rm -rf /"] command = echo this will not actually destroy your data [hook "pre-commit"] command = rm -rf / Is going to run that friendly "echo" command, since "command = rm -rf /" just refers to the "rm -rf /" <name>, not <command>, right the "hookcmd" line is removed, at which point we'll stop treating it as a <name> and run it as a <command>? In practice I think users are unlikely to use actively harmful names like that. I'm just making the point that I should not need to know about previous config to see if a "hook.pre-commit.command = rm -rf /" is harmless or not, or need to carefully squint to see if the "reusable-hook" is referring to a section name or command name. Or am I just confused at this point? > B: > (global) > [hook "reusable-hook"] > command = /long/path/reusable-hook > event = pre-commit > (local) > [hook "other"] > event = post-commit > command = other > [hook "hooks"] > event = post-commit > command = hooks > [hook "reusable-hook"] > event = reusable-hook No, that's not what I'm proposing. I.e. "*.event" is reserved for built-in events listed in githooks(5) that git itself knows about. If you say "event = reusable-hook" that's going to be one of ignored, warned or errored about. (Probably "ignored" is the best thing to support multi-version setups, but that's not hook-specific, just how we treat unknown config in general) My version of this would be the same as noted with "we don't support that level of cleverness[...]" above. I.e. you'd not re-use that "/long/path/reusable-hook", you can skip it through "command = true", then just copy the relevnt part into your new config. So: (global) [hook "reusable-hook"] command = /long/path/reusable-hook event = pre-commit (local) [hook "reusable-hook"] command = true # skip it # The "hooks" name is arbitrary, "my-hooks" or whatever would be # clearer, but just going with your example... [hook "hooks"] event = post-commit command = hooks # Not very reusable then... [hook "reusable-hook"] command = /long/path/reusable-hook event = pre-commit > [...]Please feel free to chime in with more use cases that you think would > be handy which I forgot :) I couldn't find this at a quick glance but I think we also had a disussion at some point about hooks controlling parallelism. AFAICT your current implementation just has global: hook.jobs=N And we then blacklist certain hooks to always have hook.jobs=1, e.g. the commit-msg hook that needs an implicit "lock" on that file (or rather, we think that's the most common use-case). I think my version of always having hook.<name>.{event,command} be one value is also better in that case, i.e. we'd then: [hook "myhook"] command = some-command event = pre-receive parallel = true # the default [hook "myhook2"] command = some-command2 event = pre-receive parallel = true # the default [hook "myhook3"] command = some-unconcurrent-command event = pre-receive parallel = false # I'm not OK with concurrency To go along with something like: hook jobs = 8 Or: [hookEvent "pre-receive"] jobs = 4 But if you have N numer of "command" in a section it gets murky, does "parallel = false" then apply to the whole section, but sections can have one or more values. So we'd need both a "hookEvent.pre-receive.jobs=N" and per-section config to control/suppress parallelism? I haven't thought about it deeply, but have a hunch that having sections be a 1=1 mapping with a specific command instead of either 1=1 or 1=many is going to make that easier to implement and understand.
On Fri, Jul 23, 2021 at 09:41:35AM +0200, Ævar Arnfjörð Bjarmason wrote: > > > On Thu, Jul 22 2021, Emily Shaffer wrote: > > > On Fri, Jul 16, 2021 at 11:13:42AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > >> > >> On Thu, Jul 15 2021, Emily Shaffer wrote: > >> > >> > To enable fine-grained options which apply to a single hook executable, > >> > and to make it easier for a single executable to be run on multiple hook > >> > events, teach "hookcmd.<alias>.config". These can be configured as > >> > follows: > >> > [...] > >> > diff --git a/Documentation/config/hook.txt b/Documentation/config/hook.txt > >> > index a97b980cca..5b35170664 100644 > >> > --- a/Documentation/config/hook.txt > >> > +++ b/Documentation/config/hook.txt > >> > @@ -3,6 +3,11 @@ hook.<command>.command:: > >> > executable on your device, a oneliner for your shell, or the name of a > >> > hookcmd. See linkgit:git-hook[1]. > >> > > >> > +hookcmd.<name>.command:: > >> > + A command to execute during a hook for which <name> has been specified > >> > + as a command. This can be an executable on your device or a oneliner for > >> > + your shell. See linkgit:git-hook[1]. > >> > + > >> > [...] > >> > +Global config > >> > +---- > >> > + [hook "post-commit"] > >> > + command = "linter" > >> > + command = "~/typocheck.sh" > >> > + > >> > + [hookcmd "linter"] > >> > + command = "/bin/linter --c" > >> > +---- > >> > + > >> > +Local config > >> > +---- > >> > + [hook "prepare-commit-msg"] > >> > + command = "linter" > >> > + [hook "post-commit"] > >> > + command = "python ~/run-test-suite.py" > >> > +---- > >> > + > >> > +With these configs, you'd then run post-commit hooks in this order: > >> > + > >> > + /bin/linter --c > >> > + ~/typocheck.sh > >> > + python ~/run-test-suite.py > >> > + .git/hooks/post-commit (if present) > >> > + > >> > +and prepare-commit-msg hooks in this order: > >> > + > >> > + /bin/linter --c > >> > + .git/hooks/prepare-commit-msg (if present) > >> > > >> > >> I still have outstanding feedback on the fundamental design > >> here. I.e. why is this not: > >> > >> hook.<name>.event = post-commit > >> hook.<name>.command = <path> > >> > >> See: > >> > >> https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/ > >> > >> As noted there I don't see why not, and more complexity results from the > >> design choice of doing it the way you're proposing. I.e. we can't > >> discover hooks based on config prefixes, and we end up sticking full FS > >> paths in config keys. > > > > Interesting. My gut says that it would make it harder to neatly write a > > config with the same hook running at the very beginning of one event and > > the very end of another, but I'm not sure whether that's a case to > > optimize for. > > > > I would also need twice as many lines to run a script I wrote as a hook > > - that is, the base case which is probably all most people will need. So > > with your proposal I *must* name every hook I want to add. Instead of > > "hook.pre-commit.command = ~/check-for-debug-strings" I need to > > configure "hook.check-debug-strings.event = pre-commit" and > > "hook.check-debug-strings.command = ~/check-for-debug-strings". That's a > > little frustrating, and if I never want to skip that check or move it > > later or something, it will be extra overhead for no gain for me. > > The gain is that "git hook list" becomes a trivial "git config > -show-origin --show-scope --get-regexp" wrapper. This isn't a very compelling reason to me, if it makes the user experience worse. (I'm not convinced that it does - just saying I disagree with this reasoning.) > > So the series either doesn't need "git hook list" or such a thing > becomes much less complex, especially given the proposed addition of > other features in the area like "git hook edit", i.e. (quoting the > linked E-Mail): I still think "git hook list" is useful to have for end-users, compared to remembering the appropriate "git config" invocation. It futureproofs us if we do want to change the execution ordering to something besides config order, it's easier to remember/more discoverable, etc. > As just one example; surely "git config edit <name>" would need to > run around and find config files to edit, then open them in a loop > for you, no? > > Which we'd eventually want for "git config" in general with an > --edit-regexp option or whatever, which brings us (well, at least > me) back to "then let's just add it to git-config?". > > > I do agree that your approach bans the regrettably awkward > > "hookcmd.~/check-for-debug-strings.skip = true", but I'm not sure > > whether or not it's worth it. > > That design choice also means that you can't expand the path using "git > config --get --type=path. Yeah, this is a pretty handy point. > > We do have that with the "includeIf" construct, but if we can avoid it > we should, it makes it play nicer with other assumptions and features of > the config system. > > As noted in the follow-up reply while we don't case normalize the LeVeL" > part of "ThReE.LeVeL.KeY" that's tolower(), which we know isn't a 1=1 > mapping on some > FS's. https://lore.kernel.org/git/87y2ebo16v.fsf@evledraar.gmail.com/ This, too. > > > To help us visualize the change, here's some common and complicated > > tasks and how they look with either schema (let mine be A and yours be > > B): > > Before diving into that, I'll just say I don't care about the trivial > specifics of how this is done, i.e. the bikeshedding of what the config > keys etc. are named. > > Just in (as noted above) design choices here forcing avoidable > complexity in other areas. > > > #1 - Add a oneliner (not executing a script) > > A: > > [hook "post-commit"] > > command = echo post commit > > B: > > [hook "oneliner"] > > event = post-commit > > command = echo post commit > > #2 - Execute a script > > A: > > [hook "post-commit"] > > command = ~/my-post-commit-hook > > B: > > [hook "script"] > > event = post-commit > > command = ~/my-post-commit-hook > > ... > > > #3 - Run a handful of scripts in a specific order on one event > > A: > > [hook "post-commit"] > > command = ~/my-post-commit-1 > > command = ~/my-post-commit-2 > > command = ~/my-post-commit-3 > > B: > > [hook "script 1"] > > event = post-commit > > command = ~/my-post-commit-1 > > [hook "script 2"] > > event = post-commit > > command = ~/my-post-commit-2 > > [hook "script 3"] > > event = post-commit > > command = ~/my-post-commit-3 > > To reply to all the above, yes, your suggestion comes out ahead in being > less verbose. > > But isn't the real difference not in the differing prefixes, i.e. hook.* > and hookcmd.* (A) v.s. always hook.* (B, which is what I'm mainly > focusing on, i.e. what requires the added complexity. > > But that in that in your proposed way of doing it it's: > > hook.<event-or-name>.* > > V.s. my suggestion of: > > hook.<name>.* > No, I think that's a misunderstanding of proposal A. My proposal A is always hook.<event>.*, and hookcmd.<name>.*, and the way that I read your proposal B is that it's always hook.<name>.*. > And thus whenever you have a <event-or-name> that just happens to be a > built-in hook listed in githooks(5) we in (A) implicitly expand config > like: > > hook.post-commit.command = echo foo > > > hook.post-commit.command = echo hi > hook.post-commit.type = post-commit This is where I'm becoming confused. I *think* you're saying that with your proposal B, if the subsection happens to be the name of a built-in hook, we can skip the .event field. But it also reads like you're saying this is how my proposal A works, which isn't the case, because the subsection is always assumed to be the name of the hook event we're trying to run. > > But not knowing about "foo" we don't expand: > > hook.foo.command = echo foo > > > hook.foo.command = echo hi > hook.foo.type = foo # This would be an error, or ignored. > > But rather leave "dangling" for the user to later supply the "*.event" > themselves, i.e.: > > hook.foo.command = echo hi > hook.foo.event = post-commit If you're talking about proposal B, then no, this isn't how it works. This config would imply that someone else (like a wrapper) can run "echo hi" by asking for "git hook run foo". But I'm not sure exactly that that's what you mean... > > And means that you presumably need to detect this case and error about > it, but my proposed model does not: > > hook.post-commit.command = echo hi > # User error: "*.type" and <event-or-name>" does not match for > # "hook.*.command" > hook.post-commit.type = pre-commit > > And furthermore, that if someone in your model were to do: > > hook.verify-commit.command = echo hi > > It's "dangling" today, but if a later version of git learns about a > "verify-commit" hook we'll start running it unexpectedly. No, that's in fact as designed, with my model B. The user configured "echo hi" to run on "verify-commit" events; if those events are initially used by some wrapper, but later we decide they're a great idea and absorb the verify-commit event into native Git, then this is working as intended. I think your argument is based on a misunderstanding of the design. Could it be that the way I provided the examples (my schema after A: and your schema after B:) made it hard to parse? Sorry about that if so. > > Because your design conflates the slot for known hook type names and > user-supplied names. > > On balance I think it's better just to always supply two lines per-hook, > but whether we have this proposed shorthand or not is mostly orthogonal > to everything I mentioned in > https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/ > > I.e. my proposed version could also have it, but thinking about it I > think it's not worth it, we should always use <name>, not > <event-or-name> for the reasons noted if you'll read ahead... > > > #4 - Skip a script configured more globally > > A: > > (original config) > > [hook "post-commit"] > > command = /corp/stuff/corp-post-commit > > (local config) > > [hookcmd "/corp/stuff/corp-post-commit"] > > skip = true > > OR, > > (original config) > > [hookcmd "corp-pc"] > > command = /corp/stuff/corp-post-commit > > [hook "post-commit"] > > command = corp-pc > > (local config) > > [hookcmd "corp-pc"] > > skip = true > > B: > > (original config) > > [hook "corp-pc"] > > event = post-commit > > command = /corp/stuff/corp-post-commit > > (local config) > > [hook "corp-pc"] > > skip = true > > ...which are among other things that no, my proposed version doesn't > have "skip" at all, see point #3 at > https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/ > > I.e. I think the "skip" is a thing that falls out of a complexity of > your design that I'm proposing to do away with. > > That complexity being that you use <event-or-name> and I use <name>, and > you want to in turn support N number of "*.command" for any > "hook.<event-or-name>". > > I'm suggesting we use the typical "last key wins" semantics, if you want > multiple commands for a given hook type you'll just supply multiple > "hook.<name>" sections with the same "hook.*.event = type" in all. > > The way to skip hooks in my proposal is: > > hook.<name>.command = true I see - instead of fiddly reorganizing the list, you're just changing what the specific hook name is redirecting to. Then, instead of storing the resolved command in the 'struct hook', we can store the name, and always dereference the name right before execution, using last-one-wins semantics. The only thing I don't like about this is that it's a little bit confusing to the user. I personally would rather have more complicated code and less complicated user experience, so I like 'skip' more because it's readable and because you don't need to guess "oh, it'll just execute /bin/true which is a noop". That's easy to understand for someone familiar with Unix scripting (and therefore for we Git contributors), but confusing for everybody else. Maybe we can take the essential spirit of this implementation but make it a little bit simpler. In pseudocode: populate_list(hookname): # look for everybody who says "hook.*.event = $hookname" hooks = get_configs_matching(hook.*.event, hookname) # the "*" bit is the bit we dereference later for hook in hooks: hook_list.add({.name = hook.key.subsection}) run_hooks(hook_list): for hook in hook_list: # should we even run? if (get_last_config("hook." + hook.name + ".skip")) continue # dereference the hook name and find out the command cmd = get_last_config("hook." + hook.name + ".command") run(cmd) The price for this improved readability is an extra config lookup in each case. But the code is still readable, definitely moreso than rewalking the config list over and over as the implementation stands now. I personally think it's worth it. (One could also imagine a more readable placeholder than /bin/true in the "hook.<name>.command" field, but then we get to decide just how hard we want to avoid colliding with legitimate user hook names. Is .command = skip enough? is .command = RESERVED_NAME_SKIP_THIS_HOOK enough? Is something unrunnable like .command = "skip me" enough, although it could conceivably collide with a determined user?) > > Or whatever noop command you want on the RHS. In practice we wouldn't > invoke "true", just like we don't invoke "cat" for "PAGER=cat". > > But unlike "*.skip" that doesn't require complexity in implementation or > user understanding, it just falls naturally out of the rest of the > model. > > > #5 - Execute one script for multiple events > > A: > > [hookcmd "reusable-hook"] > > command = /long/path/reusable-hook > > [hook "post-commit"] > > command = reusable-hook > > [hook "pre-commit"] > > command = reusable-hook > > [hook "prepare-commit-msg"] > > command = reusable-hook > > B: > > [hook "reusable-hook"] > > command = /long/path/reusable-hook > > event = post-commit > > event = pre-commit > > event = prepare-commit-msg > > It's been so long since I wrote the original E-Mail that I'm having to > skim my own summary there & come up with things as I go along, so maybe > this conflicts with something I said earlier. > > But no, I don't think I meant that *.event should be multi-value, or > that that magic is worthwhile. I.e. I think we just want: > > [hook "not-a-reusable-section-1"] > command = /long/path/reusable-hook > event = post-commit > [hook "not-a-reusable-section-2"] > command = /long/path/reusable-hook > event = pre-commit > [hook "not-a-reusable-section-3"] > command = /long/path/reusable-hook > event = prepare-commit-msg > > I.e. is wanting to use the same command for N different hooks really > that common a use-case to avoid a little verbosity, and quite a lot of > complexity? > > How would such a hook even be implemented? > > We don't have anything in your hook execution model (and I don't > remember you adding this) which passes down a "you are this type of > hook" to hooks. No, but it's something I'm interested in passing as an environment variable. I didn't add it to this series because it seemed to me to distract from the core feature. We would like to add it to simplify our invocations of https://github.com/awslabs/git-secrets, so it's on my radar. And even without this, just today I wanted to configure a hook to really make sure I had config.mak set up in all my Git worktrees (since you pointed out I pushed code that fails -Wall.... ;) ) and set it up to run on both pre-commit and post-checkout. So yes, it is feasible now, for very stupid hooks. > > It's now implicit in that the hook invoked at .git/hooks/post-commit or > .git/hooks/pre-commit can check its own basename, but it won't be with > configurable hooks. > > We could start passing down a GIT_HOOK_TYPE in the environment or > whatever, but I think the simpler case of just having the user do it is > better. > > I'm assuming that mainly because people have wanted a > "/long/path/reusable-hook" router script because we have not supported > executing N hooks, or supported concurrency. Once we do that the > complexity of such routing scripts (mostly, not everyone's needs will be > met) will be replaced by a little bit of config. The above linked git-secrets is an example; the config is irritating and doesn't really need to be there, when we could just teach the script itself how to check GIT_HOOK_TYPE envvar. > > I don't see why it's worth it to micro-optimize for lines in that config > at the cost of increased config complexity. > > For example, how do you "skip" an "event" type? You have it for > "*.command"? So let's say in your model (A) I have system config like: (The following isn't how my model A works, because it refers always to "hook.<event>", as I mentioned above. But I also covered this same topic in detail earlier in this email, so I'm going to snip this section.) > > #6 - Execute the same script early for one event and late for another > > event > > A: > > (global) > > [hookcmd "reusable-hook"] > > command = /long/path/reusable-hook > > [hook "pre-commit"] > > command = reusable-hook > > (local) > > [hook "post-commit"] > > command = other > > command = hooks > > command = reusable-hook > > Even with I think it's fair to say deep knowledge of your proposal at > this point I still needed to read this a few times to see if that: > > command = reusable-hook > > Is referring to: > > [hookcmd "reusable-hook"] > > I.e. is it going to run: > > command = /long/path/reusable-hook > > Or is it just re-specifying /long/path/reusable-hook but relying on a > $PATH lookup? > > Having reasoned through that I think the answer is the former. But that > also means that in your model: > > [hookcmd "rm -rf /"] > command = echo this will not actually destroy your data > [hook "pre-commit"] > command = rm -rf / > > Is going to run that friendly "echo" command, since "command = rm -rf /" > just refers to the "rm -rf /" <name>, not <command>, right the "hookcmd" > line is removed, at which point we'll stop treating it as a <name> and > run it as a <command>? > > In practice I think users are unlikely to use actively harmful names > like that. > > I'm just making the point that I should not need to know about previous > config to see if a "hook.pre-commit.command = rm -rf /" is harmless or > not, or need to carefully squint to see if the "reusable-hook" is > referring to a section name or command name. > > Or am I just confused at this point? Nah, that's an accurate understanding of how it would work with the current implementation. Sure, it's a good point. > > > B: > > (global) > > [hook "reusable-hook"] > > command = /long/path/reusable-hook > > event = pre-commit > > (local) > > [hook "other"] > > event = post-commit > > command = other > > [hook "hooks"] > > event = post-commit > > command = hooks > > [hook "reusable-hook"] > > event = reusable-hook Hm, I think I mistyped when I wrote "event = reusable-hook" here, and meant to write "event = post-commit" to indicate that we would move "reusable-hook" to the bottom of the execution list for "post-commit". I'll snip most of the rebuttal about my typo. > > (global) > [hook "reusable-hook"] > command = /long/path/reusable-hook > event = pre-commit > > (local) > [hook "reusable-hook"] > command = true # skip it > > # The "hooks" name is arbitrary, "my-hooks" or whatever would be > # clearer, but just going with your example... > > [hook "hooks"] > event = post-commit > command = hooks > # Not very reusable then... > [hook "reusable-hook"] > command = /long/path/reusable-hook > event = pre-commit > > > > [...]Please feel free to chime in with more use cases that you think would > > be handy which I forgot :) > > I couldn't find this at a quick glance but I think we also had a > disussion at some point about hooks controlling parallelism. AFAICT your > current implementation just has global: > > hook.jobs=N > > And we then blacklist certain hooks to always have hook.jobs=1, e.g. the > commit-msg hook that needs an implicit "lock" on that file (or rather, > we think that's the most common use-case). > > I think my version of always having hook.<name>.{event,command} be one > value is also better in that case, i.e. we'd then: > > [hook "myhook"] > command = some-command > event = pre-receive > parallel = true # the default > > [hook "myhook2"] > command = some-command2 > event = pre-receive > parallel = true # the default > > [hook "myhook3"] > command = some-unconcurrent-command > event = pre-receive > parallel = false # I'm not OK with concurrency I am not sure what it means for a single executable to write "parallel = true" - it is a single executable. Ok, that is me being facetious - I think you are saying we can AND together all of the 'hook.<thing-with-event-we-care-about>.parallel' to decide whether or not to run in parallel. I would rather not discuss this now, for this series, because regardless of which config schema we use today, we can figure out "parallel unless we really don't want it" later on. It is too complex to discuss in the context of "hey, we should also configure hooks somewhere else". Let's leave it for future work. > I haven't thought about it deeply, but have a hunch that having sections > be a 1=1 mapping with a specific command instead of either 1=1 or 1=many > is going to make that easier to implement and understand. Sure, probably. But it can be added later to either schema... ;) Anyway, I did a quick strawpoll with my colleagues (Jonathan Nieder, Jonathan Tan, and Josh Steadmon) and they all like your syntax more. I am ambivalent between the two, personally. So what I will do, hopefully today, maybe not, is the following: - no more hookcmd. (and there was much rejoicing) - Hooks are specified with "[hook "name-of-hook"]" - I do see value in having an explicit .skip field rather than mapping .command to a noop, so "hook.name-of-hook.skip" as described above. Of course the method you described will work regardless, since its mechanism is based on the inherent result of executing /bin/true. - I do see value in allowing "hook.name-of-hook.event" to be defined repeatably, as described above, so I will include that. And left for later work: - teaching HOOK_EVENT_NAME=post-commit (etc) - figuring out what the heck we want to do with allowing hooks to describe whether they allow parallelism Unfortunately I've dropped the design doc from this series, since nobody seemed interested in having it checked in, but I'll try to rework the help doc to make the schema more clear. Thanks, I think it was useful to hash this out. - Emily
> > I would also need twice as many lines to run a script I wrote as a hook > > - that is, the base case which is probably all most people will need. So > > with your proposal I *must* name every hook I want to add. Instead of > > "hook.pre-commit.command = ~/check-for-debug-strings" I need to > > configure "hook.check-debug-strings.event = pre-commit" and > > "hook.check-debug-strings.command = ~/check-for-debug-strings". That's a > > little frustrating, and if I never want to skip that check or move it > > later or something, it will be extra overhead for no gain for me. > > The gain is that "git hook list" becomes a trivial "git config > -show-origin --show-scope --get-regexp" wrapper. I think that in both schemes, "git hook list" is not difficult to implement. > > I do agree that your approach bans the regrettably awkward > > "hookcmd.~/check-for-debug-strings.skip = true", but I'm not sure > > whether or not it's worth it. > > That design choice also means that you can't expand the path using "git > config --get --type=path. I'm not sure what the use of this is, especially when the more important position is in "hook.pre-commit.command" (which can be expanded, I believe). > As noted in the follow-up reply while we don't case normalize the LeVeL" > part of "ThReE.LeVeL.KeY" that's tolower(), which we know isn't a 1=1 > mapping on some > FS's. https://lore.kernel.org/git/87y2ebo16v.fsf@evledraar.gmail.com/ We're comparing strings in the config file, though, not strings against what's on the filesystem. > To reply to all the above, yes, your suggestion comes out ahead in being > less verbose. > > But isn't the real difference not in the differing prefixes, i.e. hook.* > and hookcmd.* (A) v.s. always hook.* (B, which is what I'm mainly > focusing on, i.e. what requires the added complexity. OK, this is a good point - it may be confusing for the user to remember the difference between "hook" and "hookcmd", and your scheme eliminates that. > But that in that in your proposed way of doing it it's: > > hook.<event-or-name>.* > > V.s. my suggestion of: > > hook.<name>.* [snip discussion of how hook.<event-or-name> is bad] I think you were thinking that someone would say "let's unify 'hook' and 'hookcmd' then" in response to you saying "'hook' vs 'hookcmd' is bad", and you're arguing against this new point. I don't think anyone is saying that, though. > That complexity being that you use <event-or-name> and I use <name>, and > you want to in turn support N number of "*.command" for any > "hook.<event-or-name>". I don't know if anyone is wanting to support this. > The way to skip hooks in my proposal is: > > hook.<name>.command = true > > Or whatever noop command you want on the RHS. In practice we wouldn't > invoke "true", just like we don't invoke "cat" for "PAGER=cat". > > But unlike "*.skip" that doesn't require complexity in implementation or > user understanding, it just falls naturally out of the rest of the > model. This is a good point. > Even with I think it's fair to say deep knowledge of your proposal at > this point I still needed to read this a few times to see if that: > > command = reusable-hook > > Is referring to: > > [hookcmd "reusable-hook"] > > I.e. is it going to run: > > command = /long/path/reusable-hook > > Or is it just re-specifying /long/path/reusable-hook but relying on a > $PATH lookup? This is true - seeing "hook.post-commit.command = reusable-hook", we need to look at the rest of the config to see how it is interpreted. Going back to the central idea, though, I think that the main advantage of the scheme Emily proposed is the ability to write, more concisely: [hook.pre-commit] command = /path/to/command-1 command = /path/to/command-2 instead of [hook.command-1] event = pre-commit command = /path/to/command-1 [hook.command-2] event = pre-commit command = /path/to/command-2 But Ævar's scheme gives us the advantage that if need to do anything more complicated (even merely slightly more complicated - for example, skipping a hook or overriding the command of a hook), it would be hard to write (and even to just figure it out) in Emily's scheme. In Ævar's scheme, just following the standard "last config wins" rule (and knowing about /usr/bin/true) can already do a lot. For this reason, I think it's worth considering Ævar's scheme - writing 3 lines instead of 2 is not much more difficult to teach and to do.
On Wed, Aug 04 2021, Emily Shaffer wrote: > On Fri, Jul 23, 2021 at 09:41:35AM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> >> On Thu, Jul 22 2021, Emily Shaffer wrote: >> >> > On Fri, Jul 16, 2021 at 11:13:42AM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> >> >> >> >> On Thu, Jul 15 2021, Emily Shaffer wrote: >> >> >> >> > To enable fine-grained options which apply to a single hook executable, >> >> > and to make it easier for a single executable to be run on multiple hook >> >> > events, teach "hookcmd.<alias>.config". These can be configured as >> >> > follows: >> >> > [...] >> >> > diff --git a/Documentation/config/hook.txt b/Documentation/config/hook.txt >> >> > index a97b980cca..5b35170664 100644 >> >> > --- a/Documentation/config/hook.txt >> >> > +++ b/Documentation/config/hook.txt >> >> > @@ -3,6 +3,11 @@ hook.<command>.command:: >> >> > executable on your device, a oneliner for your shell, or the name of a >> >> > hookcmd. See linkgit:git-hook[1]. >> >> > >> >> > +hookcmd.<name>.command:: >> >> > + A command to execute during a hook for which <name> has been specified >> >> > + as a command. This can be an executable on your device or a oneliner for >> >> > + your shell. See linkgit:git-hook[1]. >> >> > + >> >> > [...] >> >> > +Global config >> >> > +---- >> >> > + [hook "post-commit"] >> >> > + command = "linter" >> >> > + command = "~/typocheck.sh" >> >> > + >> >> > + [hookcmd "linter"] >> >> > + command = "/bin/linter --c" >> >> > +---- >> >> > + >> >> > +Local config >> >> > +---- >> >> > + [hook "prepare-commit-msg"] >> >> > + command = "linter" >> >> > + [hook "post-commit"] >> >> > + command = "python ~/run-test-suite.py" >> >> > +---- >> >> > + >> >> > +With these configs, you'd then run post-commit hooks in this order: >> >> > + >> >> > + /bin/linter --c >> >> > + ~/typocheck.sh >> >> > + python ~/run-test-suite.py >> >> > + .git/hooks/post-commit (if present) >> >> > + >> >> > +and prepare-commit-msg hooks in this order: >> >> > + >> >> > + /bin/linter --c >> >> > + .git/hooks/prepare-commit-msg (if present) >> >> > >> >> >> >> I still have outstanding feedback on the fundamental design >> >> here. I.e. why is this not: >> >> >> >> hook.<name>.event = post-commit >> >> hook.<name>.command = <path> >> >> >> >> See: >> >> >> >> https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/ >> >> >> >> As noted there I don't see why not, and more complexity results from the >> >> design choice of doing it the way you're proposing. I.e. we can't >> >> discover hooks based on config prefixes, and we end up sticking full FS >> >> paths in config keys. >> > >> > Interesting. My gut says that it would make it harder to neatly write a >> > config with the same hook running at the very beginning of one event and >> > the very end of another, but I'm not sure whether that's a case to >> > optimize for. >> > >> > I would also need twice as many lines to run a script I wrote as a hook >> > - that is, the base case which is probably all most people will need. So >> > with your proposal I *must* name every hook I want to add. Instead of >> > "hook.pre-commit.command = ~/check-for-debug-strings" I need to >> > configure "hook.check-debug-strings.event = pre-commit" and >> > "hook.check-debug-strings.command = ~/check-for-debug-strings". That's a >> > little frustrating, and if I never want to skip that check or move it >> > later or something, it will be extra overhead for no gain for me. >> >> The gain is that "git hook list" becomes a trivial "git config >> -show-origin --show-scope --get-regexp" wrapper. > > This isn't a very compelling reason to me, if it makes the user > experience worse. (I'm not convinced that it does - just saying I > disagree with this reasoning.) ...(see below)... >> >> So the series either doesn't need "git hook list" or such a thing >> becomes much less complex, especially given the proposed addition of >> other features in the area like "git hook edit", i.e. (quoting the >> linked E-Mail): > > I still think "git hook list" is useful to have for end-users, compared > to remembering the appropriate "git config" invocation. It futureproofs > us if we do want to change the execution ordering to something besides > config order, it's easier to remember/more discoverable, etc. To clarify: I'm not against there being a "git hook list", I think we should have it. It's useful UI. I.e. we have "run", it makes sense to have the same mechanism spew out its idea of what hooks are where, especially since that's not entirely contained in the config, but a union of config and GIT_DIR/hooks (and then there's core.hooksPath...). So I'm very much for it being there. I've just been commenting on the relative complexity of the config schema, which in one way surfaces as not being able to do the config part of the "git hook list" as a simple shell-out (or equivalent library invocation) of "git config" under the hood, but also means that users need to mentally track a more complex format when e.g. manually editing config... >> [...] >> But isn't the real difference not in the differing prefixes, i.e. hook.* >> and hookcmd.* (A) v.s. always hook.* (B, which is what I'm mainly >> focusing on, i.e. what requires the added complexity. >> >> But that in that in your proposed way of doing it it's: >> >> hook.<event-or-name>.* >> >> V.s. my suggestion of: >> >> hook.<name>.* >> > > No, I think that's a misunderstanding of proposal A. My proposal A is > always hook.<event>.*, and hookcmd.<name>.*, and the way that I read > your proposal B is that it's always hook.<name>.*. Ah, thanks. >> And thus whenever you have a <event-or-name> that just happens to be a >> built-in hook listed in githooks(5) we in (A) implicitly expand config >> like: >> >> hook.post-commit.command = echo foo >> >> >> hook.post-commit.command = echo hi >> hook.post-commit.type = post-commit > > This is where I'm becoming confused. I *think* you're saying that with > your proposal B, if the subsection happens to be the name of a built-in > hook, we can skip the .event field. But it also reads like you're saying > this is how my proposal A works, which isn't the case, because the > subsection is always assumed to be the name of the hook event we're > trying to run. I *think* this is best addressed below, moving on... >> >> But not knowing about "foo" we don't expand: >> >> hook.foo.command = echo foo >> >> >> hook.foo.command = echo hi >> hook.foo.type = foo # This would be an error, or ignored. >> >> But rather leave "dangling" for the user to later supply the "*.event" >> themselves, i.e.: >> >> hook.foo.command = echo hi >> hook.foo.event = post-commit > > If you're talking about proposal B, then no, this isn't how it works. > This config would imply that someone else (like a wrapper) can run "echo > hi" by asking for "git hook run foo". But I'm not sure exactly that > that's what you mean... ...and on (maybe).... >> >> And means that you presumably need to detect this case and error about >> it, but my proposed model does not: >> >> hook.post-commit.command = echo hi >> # User error: "*.type" and <event-or-name>" does not match for >> # "hook.*.command" >> hook.post-commit.type = pre-commit >> >> And furthermore, that if someone in your model were to do: >> >> hook.verify-commit.command = echo hi >> >> It's "dangling" today, but if a later version of git learns about a >> "verify-commit" hook we'll start running it unexpectedly. > > No, that's in fact as designed, with my model B. The user configured > "echo hi" to run on "verify-commit" events; if those events are > initially used by some wrapper, but later we decide they're a great idea > and absorb the verify-commit event into native Git, then this is working > as intended. I think your argument is based on a misunderstanding of the > design. > > Could it be that the way I provided the examples (my schema after A: and > your schema after B:) made it hard to parse? Sorry about that if so. Aren't you assuming that users who specify a verify-commit will be happy because git's usurping of that will 1=1 match what they were using "verify-commit" for. I'm pointing out that we can't know that, and since you want to make "git hook run" a general thing that runs any <name> of script you've configured, and not just what's in githooks(5) that it becomes very likely that if we add a new hook with some obvious name that we'll either break things for users, or subtly change behavior. Which isn't just theoretical, e.g. I tend to run something like a "git log --check @{u}.." before I run git-send-email, with this configurable hook mechanism having a "git hook run sendemail-check" would be a way I might expose that in my own ~/.gitconfig. But if git-send-email learns a "sendemail-check" and the behavior doesn't exactly match mine; E.g. maybe it similar to pre-auto-gc expects me to return a status code to ask me if I want to abort on a failed --check, but mine expects a revision range to run "log --check". In practice that's a non-issue with the current hook mechanism, i.e. nobody's sticking a script into .git/hooks/my-custom-name and expecting it to do anything useful (and if they are, they have only themselves to blame). Whereas we'd now be actively inviting users to squat on the same namespace we ourselves will need for future hooks. >> >> Because your design conflates the slot for known hook type names and >> user-supplied names. >> >> On balance I think it's better just to always supply two lines per-hook, >> but whether we have this proposed shorthand or not is mostly orthogonal >> to everything I mentioned in >> https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/ >> >> I.e. my proposed version could also have it, but thinking about it I >> think it's not worth it, we should always use <name>, not >> <event-or-name> for the reasons noted if you'll read ahead... >> >> > #4 - Skip a script configured more globally >> > A: >> > (original config) >> > [hook "post-commit"] >> > command = /corp/stuff/corp-post-commit >> > (local config) >> > [hookcmd "/corp/stuff/corp-post-commit"] >> > skip = true >> > OR, >> > (original config) >> > [hookcmd "corp-pc"] >> > command = /corp/stuff/corp-post-commit >> > [hook "post-commit"] >> > command = corp-pc >> > (local config) >> > [hookcmd "corp-pc"] >> > skip = true >> > B: >> > (original config) >> > [hook "corp-pc"] >> > event = post-commit >> > command = /corp/stuff/corp-post-commit >> > (local config) >> > [hook "corp-pc"] >> > skip = true >> >> ...which are among other things that no, my proposed version doesn't >> have "skip" at all, see point #3 at >> https://lore.kernel.org/git/87mtv8fww3.fsf@evledraar.gmail.com/ >> >> I.e. I think the "skip" is a thing that falls out of a complexity of >> your design that I'm proposing to do away with. >> >> That complexity being that you use <event-or-name> and I use <name>, and >> you want to in turn support N number of "*.command" for any >> "hook.<event-or-name>". >> >> I'm suggesting we use the typical "last key wins" semantics, if you want >> multiple commands for a given hook type you'll just supply multiple >> "hook.<name>" sections with the same "hook.*.event = type" in all. >> >> The way to skip hooks in my proposal is: >> >> hook.<name>.command = true > > I see - instead of fiddly reorganizing the list, you're just changing > what the specific hook name is redirecting to. > > Then, instead of storing the resolved command in the 'struct hook', we > can store the name, and always dereference the name right before > execution, using last-one-wins semantics. > > The only thing I don't like about this is that it's a little bit > confusing to the user. I personally would rather have more complicated > code and less complicated user experience, so I like 'skip' more because > it's readable and because you don't need to guess "oh, it'll just > execute /bin/true which is a noop". That's easy to understand for > someone familiar with Unix scripting (and therefore for we Git > contributors), but confusing for everybody else. > > Maybe we can take the essential spirit of this implementation but make > it a little bit simpler. In pseudocode: > > populate_list(hookname): > # look for everybody who says "hook.*.event = $hookname" > hooks = get_configs_matching(hook.*.event, hookname) > # the "*" bit is the bit we dereference later > for hook in hooks: > hook_list.add({.name = hook.key.subsection}) > > run_hooks(hook_list): > for hook in hook_list: > # should we even run? > if (get_last_config("hook." + hook.name + ".skip")) > continue > > # dereference the hook name and find out the command > cmd = get_last_config("hook." + hook.name + ".command") > run(cmd) > > The price for this improved readability is an extra config lookup in > each case. But the code is still readable, definitely moreso than > rewalking the config list over and over as the implementation stands > now. I personally think it's worth it. I think the less complicated user experience is for our config system to work as consistently as possible with other config. If we define special rules and what's effectively special syntax with "skip" etc. we're asking users to read how that works, and keep that all in their head just to deal with that one area of our config. > (One could also imagine a more readable placeholder than /bin/true in > the "hook.<name>.command" field, but then we get to decide just how hard > we want to avoid colliding with legitimate user hook names. Is .command > = skip enough? is .command = RESERVED_NAME_SKIP_THIS_HOOK enough? Is > something unrunnable like .command = "skip me" enough, although it could > conceivably collide with a determined user?) All of that's something you'll need to explain in detail to users, which all seems way more complex than a simple: To skip a previously defined hook insert a noop-command, any will do, but setting it to "true" (usually /bin/true) is a handy convention for doing nothing. I.e. by keeping the config field as doing one thing only you avoid any such collisions etc. >> >> Or whatever noop command you want on the RHS. In practice we wouldn't >> invoke "true", just like we don't invoke "cat" for "PAGER=cat". >> >> But unlike "*.skip" that doesn't require complexity in implementation or >> user understanding, it just falls naturally out of the rest of the >> model. >> >> > #5 - Execute one script for multiple events >> > A: >> > [hookcmd "reusable-hook"] >> > command = /long/path/reusable-hook >> > [hook "post-commit"] >> > command = reusable-hook >> > [hook "pre-commit"] >> > command = reusable-hook >> > [hook "prepare-commit-msg"] >> > command = reusable-hook >> > B: >> > [hook "reusable-hook"] >> > command = /long/path/reusable-hook >> > event = post-commit >> > event = pre-commit >> > event = prepare-commit-msg >> >> It's been so long since I wrote the original E-Mail that I'm having to >> skim my own summary there & come up with things as I go along, so maybe >> this conflicts with something I said earlier. >> >> But no, I don't think I meant that *.event should be multi-value, or >> that that magic is worthwhile. I.e. I think we just want: >> >> [hook "not-a-reusable-section-1"] >> command = /long/path/reusable-hook >> event = post-commit >> [hook "not-a-reusable-section-2"] >> command = /long/path/reusable-hook >> event = pre-commit >> [hook "not-a-reusable-section-3"] >> command = /long/path/reusable-hook >> event = prepare-commit-msg >> >> I.e. is wanting to use the same command for N different hooks really >> that common a use-case to avoid a little verbosity, and quite a lot of >> complexity? >> >> How would such a hook even be implemented? >> >> We don't have anything in your hook execution model (and I don't >> remember you adding this) which passes down a "you are this type of >> hook" to hooks. > > No, but it's something I'm interested in passing as an environment > variable. I didn't add it to this series because it seemed to me to > distract from the core feature. We would like to add it to simplify our > invocations of https://github.com/awslabs/git-secrets, so it's on my > radar. Having such an env var as part of the initial series seems like a sensible thing to have. > And even without this, just today I wanted to configure a hook to really > make sure I had config.mak set up in all my Git worktrees (since you > pointed out I pushed code that fails -Wall.... ;) ) and set it up to run > on both pre-commit and post-checkout. So yes, it is feasible now, for > very stupid hooks. Aside: It's nice to use DEVELOPER=1 in config.mak (or as make param) when hacking on git, it'll set -Wall -Werror and other nice warning flags. > [...] >> > [...]Please feel free to chime in with more use cases that you think would >> > be handy which I forgot :) >> >> I couldn't find this at a quick glance but I think we also had a >> disussion at some point about hooks controlling parallelism. AFAICT your >> current implementation just has global: >> >> hook.jobs=N >> >> And we then blacklist certain hooks to always have hook.jobs=1, e.g. the >> commit-msg hook that needs an implicit "lock" on that file (or rather, >> we think that's the most common use-case). >> >> I think my version of always having hook.<name>.{event,command} be one >> value is also better in that case, i.e. we'd then: >> >> [hook "myhook"] >> command = some-command >> event = pre-receive >> parallel = true # the default >> >> [hook "myhook2"] >> command = some-command2 >> event = pre-receive >> parallel = true # the default >> >> [hook "myhook3"] >> command = some-unconcurrent-command >> event = pre-receive >> parallel = false # I'm not OK with concurrency > > I am not sure what it means for a single executable to write "parallel = > true" - it is a single executable. > > Ok, that is me being facetious - I think you are saying we can AND > together all of the 'hook.<thing-with-event-we-care-about>.parallel' to > decide whether or not to run in parallel. Right, the case (whatever the config mechanism) wanting to use several off-the-shelf hooks and accomplish through git some version of this: parallel -j8 pre-receive-parallel-*.sh && parallel -j1 pre-receive-non-parallel-*.sh I.e. since we have N scripts for the "pre-receive" type, and we're expecting to say whether on not parallelism is OK or not, it seems like a natural thing we'll want to declare that differently for some of those than for others. > I would rather not discuss this now, for this series, because regardless > of which config schema we use today, we can figure out "parallel unless > we really don't want it" later on. It is too complex to discuss in the > context of "hey, we should also configure hooks somewhere else". Let's > leave it for future work. The point is that no, we really can't figure it out as easily later on regardless of the config schema. Because with 1=many you can't have 1=many.someAttribute=XYZ without that *.someAttribute=XYZ declaring something for all of 1=many, whereas if it's 1=1 then 1=1.someAttribute.XYZ obviously applies only to that 1=1. >> I haven't thought about it deeply, but have a hunch that having sections >> be a 1=1 mapping with a specific command instead of either 1=1 or 1=many >> is going to make that easier to implement and understand. > > Sure, probably. But it can be added later to either schema... ;) > > > Anyway, I did a quick strawpoll with my colleagues (Jonathan Nieder, > Jonathan Tan, and Josh Steadmon) and they all like your syntax more. I > am ambivalent between the two, personally. > > So what I will do, hopefully today, maybe not, is the following: > > - no more hookcmd. (and there was much rejoicing) > - Hooks are specified with "[hook "name-of-hook"]" Thanks, I look forward to checking that out. As should be obvious from my misunderstanding of some of your config proposal (I snippet out some of those from the reply) I'm still not entirely clear on even what the current proposed behavior is, hopefully something simpler will be easier to grok ... :) > - I do see value in having an explicit .skip field rather than mapping > .command to a noop, so "hook.name-of-hook.skip" as described above. > Of course the method you described will work regardless, since its > mechanism is based on the inherent result of executing /bin/true. I think we've mainly focused on the theoretical aspect of this, but FWIW I'm still entirely unclear on what this feature is even aimed for. All of the rest of our config does not have an explicit "skip" for anything, just last-set-wins. In terms of a real-world use-case wouldn't a user just edit or comment out the config earlier in ~/.gitconfig, and not "skip" it at the end with "git config [...] --add"? I suspect that the use-case is some Googly centrally managed /etc/gitconfig, but that's just speculation... > - I do see value in allowing "hook.name-of-hook.event" to be defined > repeatably, as described above, so I will include that. > [...] > - figuring out what the heck we want to do with allowing hooks to > describe whether they allow parallelism Which just to check if we're on the same page, needs to be figured out because of the complexity of that "defined repeatably", no? > Unfortunately I've dropped the design doc from this series, since nobody > seemed interested in having it checked in, but I'll try to rework the > help doc to make the schema more clear. FWIW I wasn't opposed to it per-se, but remembered commenting on it in earlier rounds, and it being some combination of docs that should really be in manpages (i.e. describing current actual behavior), more general rational and "why is it like this" (which correctly belongs in design docs), and musings about hypothetical future features (e.g. "git hook edit" and the like)'.
On Thu, Aug 05, 2021 at 02:17:29AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> The gain is that "git hook list" becomes a trivial "git config > >> -show-origin --show-scope --get-regexp" wrapper. > > > > This isn't a very compelling reason to me, if it makes the user > > experience worse. (I'm not convinced that it does - just saying I > > disagree with this reasoning.) > > ...(see below)... > > >> > >> So the series either doesn't need "git hook list" or such a thing > >> becomes much less complex, especially given the proposed addition of > >> other features in the area like "git hook edit", i.e. (quoting the > >> linked E-Mail): > > > > I still think "git hook list" is useful to have for end-users, compared > > to remembering the appropriate "git config" invocation. It futureproofs > > us if we do want to change the execution ordering to something besides > > config order, it's easier to remember/more discoverable, etc. > > To clarify: I'm not against there being a "git hook list", I think we > should have it. > > It's useful UI. I.e. we have "run", it makes sense to have the same > mechanism spew out its idea of what hooks are where, especially since > that's not entirely contained in the config, but a union of config and > GIT_DIR/hooks (and then there's core.hooksPath...). > > So I'm very much for it being there. > > I've just been commenting on the relative complexity of the config > schema, which in one way surfaces as not being able to do the config > part of the "git hook list" as a simple shell-out (or equivalent library > invocation) of "git config" under the hood, but also means that users > need to mentally track a more complex format when e.g. manually editing > config... Sure, makes sense. > >> > >> And means that you presumably need to detect this case and error about > >> it, but my proposed model does not: > >> > >> hook.post-commit.command = echo hi > >> # User error: "*.type" and <event-or-name>" does not match for > >> # "hook.*.command" > >> hook.post-commit.type = pre-commit > >> > >> And furthermore, that if someone in your model were to do: > >> > >> hook.verify-commit.command = echo hi > >> > >> It's "dangling" today, but if a later version of git learns about a > >> "verify-commit" hook we'll start running it unexpectedly. > > > > No, that's in fact as designed, with my model B. The user configured > > "echo hi" to run on "verify-commit" events; if those events are > > initially used by some wrapper, but later we decide they're a great idea > > and absorb the verify-commit event into native Git, then this is working > > as intended. I think your argument is based on a misunderstanding of the > > design. > > > > Could it be that the way I provided the examples (my schema after A: and > > your schema after B:) made it hard to parse? Sorry about that if so. > > Aren't you assuming that users who specify a verify-commit will be happy > because git's usurping of that will 1=1 match what they were using > "verify-commit" for. > > I'm pointing out that we can't know that, and since you want to make > "git hook run" a general thing that runs any <name> of script you've > configured, and not just what's in githooks(5) that it becomes very > likely that if we add a new hook with some obvious name that we'll > either break things for users, or subtly change behavior. > > Which isn't just theoretical, e.g. I tend to run something like a "git > log --check @{u}.." before I run git-send-email, with this configurable > hook mechanism having a "git hook run sendemail-check" would be a way I > might expose that in my own ~/.gitconfig. > > But if git-send-email learns a "sendemail-check" and the behavior > doesn't exactly match mine; E.g. maybe it similar to pre-auto-gc expects > me to return a status code to ask me if I want to abort on a failed > --check, but mine expects a revision range to run "log --check". > > In practice that's a non-issue with the current hook mechanism, > i.e. nobody's sticking a script into .git/hooks/my-custom-name and > expecting it to do anything useful (and if they are, they have only > themselves to blame). > > Whereas we'd now be actively inviting users to squat on the same > namespace we ourselves will need for future hooks. Yeah, this is a good point. Seems worth a note in the 'git hook run' doc, making a point that "you can use this for your wrapper to run specific hooks, but be careful about namespace collisions". We're a lot less likely to add a hook named "repotool-verify-commit" than we are to add a hook named "verify-commit", for example. I think it's enough to warn about future namespace collisions and make an "at your own risk" note. > > Maybe we can take the essential spirit of this implementation but make > > it a little bit simpler. In pseudocode: > > > > populate_list(hookname): > > # look for everybody who says "hook.*.event = $hookname" > > hooks = get_configs_matching(hook.*.event, hookname) > > # the "*" bit is the bit we dereference later > > for hook in hooks: > > hook_list.add({.name = hook.key.subsection}) > > > > run_hooks(hook_list): > > for hook in hook_list: > > # should we even run? > > if (get_last_config("hook." + hook.name + ".skip")) > > continue > > > > # dereference the hook name and find out the command > > cmd = get_last_config("hook." + hook.name + ".command") > > run(cmd) > > > > The price for this improved readability is an extra config lookup in > > each case. But the code is still readable, definitely moreso than > > rewalking the config list over and over as the implementation stands > > now. I personally think it's worth it. > > I think the less complicated user experience is for our config system to > work as consistently as possible with other config. If we define special > rules and what's effectively special syntax with "skip" etc. we're > asking users to read how that works, and keep that all in their head > just to deal with that one area of our config. > > > (One could also imagine a more readable placeholder than /bin/true in > > the "hook.<name>.command" field, but then we get to decide just how hard > > we want to avoid colliding with legitimate user hook names. Is .command > > = skip enough? is .command = RESERVED_NAME_SKIP_THIS_HOOK enough? Is > > something unrunnable like .command = "skip me" enough, although it could > > conceivably collide with a determined user?) > > All of that's something you'll need to explain in detail to users, which > all seems way more complex than a simple: > > To skip a previously defined hook insert a noop-command, any will > do, but setting it to "true" (usually /bin/true) is a handy > convention for doing nothing. > > I.e. by keeping the config field as doing one thing only you avoid any > such collisions etc. I disagree fundamentally that "find and run a noop command like /bin/true" is simpler to average users than "skip it by setting a config". Like I said below, by including "skip" both approaches will work. > > No, but it's something I'm interested in passing as an environment > > variable. I didn't add it to this series because it seemed to me to > > distract from the core feature. We would like to add it to simplify our > > invocations of https://github.com/awslabs/git-secrets, so it's on my > > radar. > > Having such an env var as part of the initial series seems like a > sensible thing to have. Eh. To me, it feels like feature creep. It also is something we could add today to the existing hook mechanism (even if it's a little pointless since you can basename, like you say), so it feels orthogonal. I would prefer not to add it in this series. > > > And even without this, just today I wanted to configure a hook to really > > make sure I had config.mak set up in all my Git worktrees (since you > > pointed out I pushed code that fails -Wall.... ;) ) and set it up to run > > on both pre-commit and post-checkout. So yes, it is feasible now, for > > very stupid hooks. > > Aside: It's nice to use DEVELOPER=1 in config.mak (or as make param) > when hacking on git, it'll set -Wall -Werror and other nice warning > flags. This is literally what I was saying I wrote the hook to do, and is also a tip I included when I wrote MyFirstContribution.txt. > > I am not sure what it means for a single executable to write "parallel = > > true" - it is a single executable. > > > > Ok, that is me being facetious - I think you are saying we can AND > > together all of the 'hook.<thing-with-event-we-care-about>.parallel' to > > decide whether or not to run in parallel. > > Right, the case (whatever the config mechanism) wanting to use several > off-the-shelf hooks and accomplish through git some version of this: > > parallel -j8 pre-receive-parallel-*.sh && > parallel -j1 pre-receive-non-parallel-*.sh > > I.e. since we have N scripts for the "pre-receive" type, and we're > expecting to say whether on not parallelism is OK or not, it seems like > a natural thing we'll want to declare that differently for some of those > than for others. > > > I would rather not discuss this now, for this series, because regardless > > of which config schema we use today, we can figure out "parallel unless > > we really don't want it" later on. It is too complex to discuss in the > > context of "hey, we should also configure hooks somewhere else". Let's > > leave it for future work. > > The point is that no, we really can't figure it out as easily later on > regardless of the config schema. > > Because with 1=many you can't have 1=many.someAttribute=XYZ without that > *.someAttribute=XYZ declaring something for all of 1=many, whereas if > it's 1=1 then 1=1.someAttribute.XYZ obviously applies only to that 1=1. I think this is moot, since we are moving to "all config hooks have a name", but my plan previously was to let this be set on a hookcmd. Essentially, your suggestion is to make every hook a hookcmd. My point was that it's easy to extend [object which represents an executable] in the config to include "always run me in series" or "run me in series for this specific event" regardless. That is, one could imagine, discarding entirely the hookcmd junk and going with the schema I sketched in my last email (which lands somewhere between yours and mine): [hook "linter"] command = ~/linter.sh event = pre-commit parallel = false or... [hook "linter"] command = ~/linter.sh event = pre-commit event = commit-msg [hook "linter.commit-msg"] parallel = false Or even... [hook "linter"] command = ~/linter.sh event = pre-commit event = commit-msg parallel = commit-msg The possibilities go on, as far as configuration goes. To me, the harder part of this problem is actually implementing the execution. We had some discussions at length early on in the config-based hook series about ways to do this kind of complex "some stuff needs synchronous execution and some stuff doesn't, in the same event" and decided that it mostly resolved to "you ain't gonna need it" principle. So I would prefer to discuss this when we find out we do actually need it. > > >> I haven't thought about it deeply, but have a hunch that having sections > >> be a 1=1 mapping with a specific command instead of either 1=1 or 1=many > >> is going to make that easier to implement and understand. > > > > Sure, probably. But it can be added later to either schema... ;) > > > > > > Anyway, I did a quick strawpoll with my colleagues (Jonathan Nieder, > > Jonathan Tan, and Josh Steadmon) and they all like your syntax more. I > > am ambivalent between the two, personally. > > > > So what I will do, hopefully today, maybe not, is the following: > > > > - no more hookcmd. (and there was much rejoicing) > > - Hooks are specified with "[hook "name-of-hook"]" > > Thanks, I look forward to checking that out. As should be obvious from > my misunderstanding of some of your config proposal (I snippet out some > of those from the reply) I'm still not entirely clear on even what the > current proposed behavior is, hopefully something simpler will be easier > to grok ... :) Fingers crossed ;) > > > - I do see value in having an explicit .skip field rather than mapping > > .command to a noop, so "hook.name-of-hook.skip" as described above. > > Of course the method you described will work regardless, since its > > mechanism is based on the inherent result of executing /bin/true. > > I think we've mainly focused on the theoretical aspect of this, but FWIW > I'm still entirely unclear on what this feature is even aimed for. > > All of the rest of our config does not have an explicit "skip" for > anything, just last-set-wins. In terms of a real-world use-case wouldn't > a user just edit or comment out the config earlier in ~/.gitconfig, and > not "skip" it at the end with "git config [...] --add"? > > I suspect that the use-case is some Googly centrally managed > /etc/gitconfig, but that's just speculation... Yep, this is exactly why. We've talked often on-list about how we ship and configure Git for Googlers, but the upshot is "we pack up 'next' and also ship an '/etc/gitconfig'". But I can also think of one really basic scenario when I'd want to skip a hook in one repo without just commenting out my ~/.gitconfig: the Gerrit Change-Id hook. Gerrit requires all commit messages to contain this Change-Id: abc123 footer. It adds the footer by way of a commit-msg hook. That hook works the same regardless of what your Gerrit remote is, so you can run the same script on any project that uses Gerrit for code review. If, as I have in the past, the vast majority of my projects use Gerrit, but I have one project which does not, then I would love to configure the Gerrit Change-Id hook globally and un-configure it for my one non-Gerrit project. (At that time, I maintained a subsystem in a project based on Yocto, so I needed to regularly contribute to 5-10 projects, all but one of which used Gerrit. The one non-Gerrit one used a mailing list. I also had a hobby project and my dotfiles, neither of which used Gerrit. This is not an uncommon use case.) > > > - I do see value in allowing "hook.name-of-hook.event" to be defined > > repeatably, as described above, so I will include that. > > [...] > > - figuring out what the heck we want to do with allowing hooks to > > describe whether they allow parallelism > > Which just to check if we're on the same page, needs to be figured out > because of the complexity of that "defined repeatably", no? As I wrote above, I think this falls under YAGNI, to be honest, and I don't think we've painted ourselves into a corner if we do end up needing it. I think the semantical complexity is a harder problem than the syntactical complexity, and I'd rather just punt it ;) > > > Unfortunately I've dropped the design doc from this series, since nobody > > seemed interested in having it checked in, but I'll try to rework the > > help doc to make the schema more clear. > > FWIW I wasn't opposed to it per-se, but remembered commenting on it in > earlier rounds, and it being some combination of docs that should really > be in manpages (i.e. describing current actual behavior), more general > rational and "why is it like this" (which correctly belongs in design > docs), and musings about hypothetical future features (e.g. "git hook > edit" and the like)'. Yeah, that's pretty typical of design docs at places I've worked now and in the past. I don't have a big problem with not checking it in - those kinds of docs are most valuable during the early design stage, and so I think that doc has now served its purpose. - Emily
On Thu, Aug 05 2021, Emily Shaffer wrote: > On Thu, Aug 05, 2021 at 02:17:29AM +0200, Ævar Arnfjörð Bjarmason wrote: >> [...] >> > No, that's in fact as designed, with my model B. The user configured >> > "echo hi" to run on "verify-commit" events; if those events are >> > initially used by some wrapper, but later we decide they're a great idea >> > and absorb the verify-commit event into native Git, then this is working >> > as intended. I think your argument is based on a misunderstanding of the >> > design. >> > >> > Could it be that the way I provided the examples (my schema after A: and >> > your schema after B:) made it hard to parse? Sorry about that if so. >> >> Aren't you assuming that users who specify a verify-commit will be happy >> because git's usurping of that will 1=1 match what they were using >> "verify-commit" for. >> >> I'm pointing out that we can't know that, and since you want to make >> "git hook run" a general thing that runs any <name> of script you've >> configured, and not just what's in githooks(5) that it becomes very >> likely that if we add a new hook with some obvious name that we'll >> either break things for users, or subtly change behavior. >> >> Which isn't just theoretical, e.g. I tend to run something like a "git >> log --check @{u}.." before I run git-send-email, with this configurable >> hook mechanism having a "git hook run sendemail-check" would be a way I >> might expose that in my own ~/.gitconfig. >> >> But if git-send-email learns a "sendemail-check" and the behavior >> doesn't exactly match mine; E.g. maybe it similar to pre-auto-gc expects >> me to return a status code to ask me if I want to abort on a failed >> --check, but mine expects a revision range to run "log --check". >> >> In practice that's a non-issue with the current hook mechanism, >> i.e. nobody's sticking a script into .git/hooks/my-custom-name and >> expecting it to do anything useful (and if they are, they have only >> themselves to blame). >> >> Whereas we'd now be actively inviting users to squat on the same >> namespace we ourselves will need for future hooks. > > Yeah, this is a good point. Seems worth a note in the 'git hook run' > doc, making a point that "you can use this for your wrapper to run > specific hooks, but be careful about namespace collisions". We're a lot > less likely to add a hook named "repotool-verify-commit" than we are to > add a hook named "verify-commit", for example. > > I think it's enough to warn about future namespace collisions and make > an "at your own risk" note. I might have lost track at this point, but later examples in this E-Mail you show don't seem to require such a note. I.e. it's only an issue if we conflate a semantically meaningful slot like "pre-commit" in the config with one that can also have the meaning of simply defining an arbitrary user-decided name. There's no such collision if the config uses e.g. hook.mycheck.event=pre-commit & hook.mycheck.command=mycmd, as opposed to hook.pre-commit.command=mycmd. On the specifics of that example: I don't really care about the bikeshedding of the config key naming specifics, just the semantics of not putting user defined names and hook type names in the same slot if we can avoid it. >> > No, but it's something I'm interested in passing as an environment >> > variable. I didn't add it to this series because it seemed to me to >> > distract from the core feature. We would like to add it to simplify our >> > invocations of https://github.com/awslabs/git-secrets, so it's on my >> > radar. >> >> Having such an env var as part of the initial series seems like a >> sensible thing to have. > > Eh. To me, it feels like feature creep. It also is something we could > add today to the existing hook mechanism (even if it's a little > pointless since you can basename, like you say), so it feels orthogonal. > I would prefer not to add it in this series. Sure, I guess you can add two hook sections to replace e.g. your {pre,post}-receive hook (which are commonly routed to the same script with file-based hooks). Having a single setenv() seems easy enough, and I'd bet a way more common use-case than wanting to skip earlier defined hooks... >> > I am not sure what it means for a single executable to write "parallel = >> > true" - it is a single executable. >> > >> > Ok, that is me being facetious - I think you are saying we can AND >> > together all of the 'hook.<thing-with-event-we-care-about>.parallel' to >> > decide whether or not to run in parallel. >> >> Right, the case (whatever the config mechanism) wanting to use several >> off-the-shelf hooks and accomplish through git some version of this: >> >> parallel -j8 pre-receive-parallel-*.sh && >> parallel -j1 pre-receive-non-parallel-*.sh >> >> I.e. since we have N scripts for the "pre-receive" type, and we're >> expecting to say whether on not parallelism is OK or not, it seems like >> a natural thing we'll want to declare that differently for some of those >> than for others. >> >> > I would rather not discuss this now, for this series, because regardless >> > of which config schema we use today, we can figure out "parallel unless >> > we really don't want it" later on. It is too complex to discuss in the >> > context of "hey, we should also configure hooks somewhere else". Let's >> > leave it for future work. >> >> The point is that no, we really can't figure it out as easily later on >> regardless of the config schema. >> >> Because with 1=many you can't have 1=many.someAttribute=XYZ without that >> *.someAttribute=XYZ declaring something for all of 1=many, whereas if >> it's 1=1 then 1=1.someAttribute.XYZ obviously applies only to that 1=1. > > I think this is moot, since we are moving to "all config hooks have a > name", but my plan previously was to let this be set on a hookcmd. > Essentially, your suggestion is to make every hook a hookcmd. My point > was that it's easy to extend [object which represents an executable] in > the config to include "always run me in series" or "run me in series for > this specific event" regardless. That is, one could imagine, discarding > entirely the hookcmd junk and going with the schema I sketched in my > last email (which lands somewhere between yours and mine): Just to be clear, I don't have any concrete suggestion in mind right now (actually as I write this I can only vaguely recall what I suggested before). What I have been suggesting is not any specific implementation, but that we have a bias for the simple over the complex for an initial implementation. Complexity can always be added later, whereas coming up with a config schema that's irregular compared to other existing config in git is something we might regret sooner than later. > [hook "linter"] > command = ~/linter.sh > event = pre-commit > parallel = false > > or... > > [hook "linter"] > command = ~/linter.sh > event = pre-commit > event = commit-msg > > [hook "linter.commit-msg"] > parallel = false > > Or even... > > [hook "linter"] > command = ~/linter.sh > event = pre-commit > event = commit-msg > parallel = commit-msg > > The possibilities go on, as far as configuration goes. > > To me, the harder part of this problem is actually implementing the > execution. We had some discussions at length early on in the > config-based hook series about ways to do this kind of complex "some > stuff needs synchronous execution and some stuff doesn't, in the same > event" and decided that it mostly resolved to "you ain't gonna need it" > principle. So I would prefer to discuss this when we find out we do > actually need it. What I was mainly going for with "we really can't figure it out as easily later" above was not that this tweaking of "jobs" or parallelism was essential per-hook. But that it was a handy shorthand for a config attribute you might want to define for hooks, and having what are effectively groups of hooks, with N "command" or "event" in one section might make things more complex once you'd want to define optional attributes for one of those commands or events. >> [...] >> > - I do see value in having an explicit .skip field rather than mapping >> > .command to a noop, so "hook.name-of-hook.skip" as described above. >> > Of course the method you described will work regardless, since its >> > mechanism is based on the inherent result of executing /bin/true. >> >> I think we've mainly focused on the theoretical aspect of this, but FWIW >> I'm still entirely unclear on what this feature is even aimed for. >> >> All of the rest of our config does not have an explicit "skip" for >> anything, just last-set-wins. In terms of a real-world use-case wouldn't >> a user just edit or comment out the config earlier in ~/.gitconfig, and >> not "skip" it at the end with "git config [...] --add"? >> >> I suspect that the use-case is some Googly centrally managed >> /etc/gitconfig, but that's just speculation... > > Yep, this is exactly why. We've talked often on-list about how we ship > and configure Git for Googlers, but the upshot is "we pack up 'next' and > also ship an '/etc/gitconfig'". > > But I can also think of one really basic scenario when I'd want to skip > a hook in one repo without just commenting out my ~/.gitconfig: the > Gerrit Change-Id hook. > > Gerrit requires all commit messages to contain this Change-Id: abc123 > footer. It adds the footer by way of a commit-msg hook. That hook works > the same regardless of what your Gerrit remote is, so you can run the > same script on any project that uses Gerrit for code review. If, as I have in > the past, the vast majority of my projects use Gerrit, but I have one > project which does not, then I would love to configure the Gerrit > Change-Id hook globally and un-configure it for my one non-Gerrit > project. > > (At that time, I maintained a subsystem in a project based on Yocto, so > I needed to regularly contribute to 5-10 projects, all but one of which > used Gerrit. The one non-Gerrit one used a mailing list. I also had a > hobby project and my dotfiles, neither of which used Gerrit. This is not > an uncommon use case.) > I disagree fundamentally that "find and run a noop command like > /bin/true" is simpler to average users than "skip it by setting a > config". Like I said below, by including "skip" both approaches will > work. In reply to this, and moving things around a bit in the reply: >> All of that's something you'll need to explain in detail to users, which >> all seems way more complex than a simple: >> >> To skip a previously defined hook insert a noop-command, any will >> do, but setting it to "true" (usually /bin/true) is a handy >> convention for doing nothing. >> >> I.e. by keeping the config field as doing one thing only you avoid any >> such collisions etc. > > I disagree fundamentally that "find and run a noop command like > /bin/true" is simpler to average users than "skip it by setting a > config". Like I said below, by including "skip" both approaches will > work. To clarify, I haven't been advocating for that "skip = true" convention because I think it's a sensible thing per-se, but that I think this use-case is something that an individual configurable feature in git doesn't need stateful syntax to deal with. We have any number of multi-value and single-value config within git. I just don't see why on balance hooks need a special syntax to skip earlier set config for hooks specifically. E.g. this gerrit example would also be true of someone in a corporate setting using git-send-email, and wanting a list of sendemail.cc on all but their dotfiles project, or one other non-work project. Does that mean we need a sendemail.skipCC and special handling for it in git-send-email.perl? No, I think we'd generally advice users to just put those projects under ~/work or whatever, and then use config includes to set config for that group of projects based on the path: [includeIf "gitdir:~/work/"] path = ~/.gitconfig.d/work Or, if a hook is really so special that it's needed everywhere define it in /etc/gitconfig, and then just make the hook itself do: if test "$(git config --bool googleHooks.disableOurGlobalHook)" = "true" then exit 0 fi Which is pretty much (with the hook.* config prefix) how we've adviced users to do this since approximately forever with the sample hooks we ship. The advantage of using includes in that way is e.g. that you can easily see how your hook came to be configured with: git config --list --show-origin I.e. that (by convention) it comes via a conditionally included ~/.gitconfig.d/gerrit file. If it's a multi-value like sendemail.cc the semantics are also clear, e.g. you can get all values we'll use with "git config --get-all". Whereas choosing to implement this with something that *looks* like a config keyword, but really isn't is just confusing. We need to explain in one way how users might arrange for the likes of sendemail.cc to be defined for some, but not all of their repos, and explain it differently when it comes to hooks. There's inherent value in that explanation being the same for both.
On Fri, Aug 06, 2021 at 12:26:02AM +0200, Ævar Arnfjörð Bjarmason wrote: > > > On Thu, Aug 05 2021, Emily Shaffer wrote: > > > On Thu, Aug 05, 2021 at 02:17:29AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> [...] > >> > No, that's in fact as designed, with my model B. The user configured > >> > "echo hi" to run on "verify-commit" events; if those events are > >> > initially used by some wrapper, but later we decide they're a great idea > >> > and absorb the verify-commit event into native Git, then this is working > >> > as intended. I think your argument is based on a misunderstanding of the > >> > design. > >> > > >> > Could it be that the way I provided the examples (my schema after A: and > >> > your schema after B:) made it hard to parse? Sorry about that if so. > >> > >> Aren't you assuming that users who specify a verify-commit will be happy > >> because git's usurping of that will 1=1 match what they were using > >> "verify-commit" for. > >> > >> I'm pointing out that we can't know that, and since you want to make > >> "git hook run" a general thing that runs any <name> of script you've > >> configured, and not just what's in githooks(5) that it becomes very > >> likely that if we add a new hook with some obvious name that we'll > >> either break things for users, or subtly change behavior. > >> > >> Which isn't just theoretical, e.g. I tend to run something like a "git > >> log --check @{u}.." before I run git-send-email, with this configurable > >> hook mechanism having a "git hook run sendemail-check" would be a way I > >> might expose that in my own ~/.gitconfig. > >> > >> But if git-send-email learns a "sendemail-check" and the behavior > >> doesn't exactly match mine; E.g. maybe it similar to pre-auto-gc expects > >> me to return a status code to ask me if I want to abort on a failed > >> --check, but mine expects a revision range to run "log --check". > >> > >> In practice that's a non-issue with the current hook mechanism, > >> i.e. nobody's sticking a script into .git/hooks/my-custom-name and > >> expecting it to do anything useful (and if they are, they have only > >> themselves to blame). > >> > >> Whereas we'd now be actively inviting users to squat on the same > >> namespace we ourselves will need for future hooks. > > > > Yeah, this is a good point. Seems worth a note in the 'git hook run' > > doc, making a point that "you can use this for your wrapper to run > > specific hooks, but be careful about namespace collisions". We're a lot > > less likely to add a hook named "repotool-verify-commit" than we are to > > add a hook named "verify-commit", for example. > > > > I think it's enough to warn about future namespace collisions and make > > an "at your own risk" note. > > I might have lost track at this point, but later examples in this E-Mail > you show don't seem to require such a note. > > I.e. it's only an issue if we conflate a semantically meaningful slot > like "pre-commit" in the config with one that can also have the meaning > of simply defining an arbitrary user-decided name. No, I don't think that's the case. The examples I used later in the mail don't have anything to do with this "wrapper wants to use 'git hooks run'" case. Let's imagine a current commonly-used wrapper, 'repo' (https://gerrit.googlesource.com/git-repo/+/refs/heads/master/README.md). This wrapper is used to manage multiple Git repos into something semantically similar to submodules. Let's say that 'repo' decides that instead of rolling its own hook system, it would like to use Git's native hook system. Let's say that it wants to call user-defined hooks any time someone has just completed running 'repo sync'. It could ask users to define a config like so: [hook "myrepohook"] event = post-sync command = ~/some/downloaded/myrepohook --with --args Then 'repo' itself can, at the end of its 'repo sync' implementation, call 'git hook run post-sync' (instead of using some other hook specification schema). But Git itself has a 'git submodule sync' command, and let's imagine that we decided we want to run some user-defined hook at the end of 'git submodule sync'. Naturally, we can call it 'post-sync', and then we will accidentally invoke the hooks which someone configured for 'repo' - as you pointed out would be a concern with my design, this will *also* be a concern with your design. This specific use case is what I think we can get around with some "at your own risk" documentation, because we are much less likely to collide with someone asking their users to configure like so: [hook "myrepohook"] event = repo-tool-post-sync command = ~/some/downloaded/myrepohook --with-args > >> > No, but it's something I'm interested in passing as an environment > >> > variable. I didn't add it to this series because it seemed to me to > >> > distract from the core feature. We would like to add it to simplify our > >> > invocations of https://github.com/awslabs/git-secrets, so it's on my > >> > radar. > >> > >> Having such an env var as part of the initial series seems like a > >> sensible thing to have. > > > > Eh. To me, it feels like feature creep. It also is something we could > > add today to the existing hook mechanism (even if it's a little > > pointless since you can basename, like you say), so it feels orthogonal. > > I would prefer not to add it in this series. > > Sure, I guess you can add two hook sections to replace e.g. your > {pre,post}-receive hook (which are commonly routed to the same script > with file-based hooks). Having a single setenv() seems easy enough, and > I'd bet a way more common use-case than wanting to skip earlier defined > hooks... More than anything else, I think this comment convinced me that the skip config also falls under YAGNI and I can drop it (and we can add it later if "hook.myhook.command = true" isn't clear enough). I snipped the rest because I don't disagree or have anything further to say. Look for a reroll this weekend, we hope. - Emily
diff --git a/Documentation/config/hook.txt b/Documentation/config/hook.txt index a97b980cca..5b35170664 100644 --- a/Documentation/config/hook.txt +++ b/Documentation/config/hook.txt @@ -3,6 +3,11 @@ hook.<command>.command:: executable on your device, a oneliner for your shell, or the name of a hookcmd. See linkgit:git-hook[1]. +hookcmd.<name>.command:: + A command to execute during a hook for which <name> has been specified + as a command. This can be an executable on your device or a oneliner for + your shell. See linkgit:git-hook[1]. + hook.jobs:: Specifies how many hooks can be run simultaneously during parallelized hook execution. If unspecified, defaults to the number of processors on diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt index 8e2ab724e8..1a4d22fd90 100644 --- a/Documentation/git-hook.txt +++ b/Documentation/git-hook.txt @@ -18,10 +18,44 @@ This command is an interface to git hooks (see linkgit:githooks[5]). Currently it only provides a convenience wrapper for running hooks for use by git itself. In the future it might gain other functionality. -This command parses the default configuration files for sections like `hook -"<hook name>"`. `hook` is used to describe the commands which will be run during -a particular hook event; commands are run in the order Git encounters them -during the configuration parse (see linkgit:git-config[1]). +This command parses the default configuration files for sections `hook` and +`hookcmd`. `hook` is used to describe the commands which will be run during a +particular hook event; commands are run in the order Git encounters them during +the configuration parse (see linkgit:git-config[1]). `hookcmd` is used to +describe attributes of a specific command. If additional attributes don't need +to be specified, a command to run can be specified directly in the `hook` +section; if a `hookcmd` by that name isn't found, Git will attempt to run the +provided value directly. For example: + +Global config +---- + [hook "post-commit"] + command = "linter" + command = "~/typocheck.sh" + + [hookcmd "linter"] + command = "/bin/linter --c" +---- + +Local config +---- + [hook "prepare-commit-msg"] + command = "linter" + [hook "post-commit"] + command = "python ~/run-test-suite.py" +---- + +With these configs, you'd then run post-commit hooks in this order: + + /bin/linter --c + ~/typocheck.sh + python ~/run-test-suite.py + .git/hooks/post-commit (if present) + +and prepare-commit-msg hooks in this order: + + /bin/linter --c + .git/hooks/prepare-commit-msg (if present) In general, when instructions suggest adding a script to `.git/hooks/<something>`, you can specify it in the config instead by running diff --git a/hook.c b/hook.c index b08b876d5d..21904d90f6 100644 --- a/hook.c +++ b/hook.c @@ -184,7 +184,14 @@ static int hook_config_lookup(const char *key, const char *value, void *cb_data) /* TODO: implement skipping hooks */ - /* TODO: immplement hook aliases */ + /* + * Check if a hookcmd with that name exists. If it doesn't, + * 'git_config_get_value()' is documented not to touch &command, + * so we don't need to do anything. + */ + strbuf_reset(&hookcmd_name); + strbuf_addf(&hookcmd_name, "hookcmd.%s.command", command); + git_config_get_value(hookcmd_name.buf, &command); /* * TODO: implement an option-getting callback, e.g. diff --git a/t/t1360-config-based-hooks.sh b/t/t1360-config-based-hooks.sh index e4a7b06ad1..50ee824f05 100755 --- a/t/t1360-config-based-hooks.sh +++ b/t/t1360-config-based-hooks.sh @@ -18,6 +18,11 @@ setup_hooks () { test_config_global hook.pre-commit.command "/path/def" --add } +setup_hookcmd () { + test_config hook.pre-commit.command "abc" --add + test_config_global hookcmd.abc.command "/path/abc" --add +} + setup_hookdir () { mkdir .git/hooks write_script .git/hooks/pre-commit <<-EOF @@ -59,6 +64,20 @@ test_expect_success 'git hook list orders by config order' ' test_cmp expected actual ' +test_expect_success 'git hook list dereferences a hookcmd' ' + setup_hooks && + setup_hookcmd && + + cat >expected <<-EOF && + $ROOT/path/def + $ROOT/path/ghi + $ROOT/path/abc + EOF + + git hook list pre-commit >actual && + test_cmp expected actual +' + test_expect_success 'git hook list reorders on duplicate commands' ' setup_hooks &&
To enable fine-grained options which apply to a single hook executable, and to make it easier for a single executable to be run on multiple hook events, teach "hookcmd.<alias>.config". These can be configured as follows: [hookcmd.my-linter] command = ~/my-linter.sh [hook.pre-commit] command = my-linter During the config parse, we can attempt to dereference the 'hook.pre-commit.command' string 'my-linter' and check if it matches any hookcmd names; if so, we can run the command associated with that hookcmd alias instead. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> --- Documentation/config/hook.txt | 5 +++++ Documentation/git-hook.txt | 42 +++++++++++++++++++++++++++++++---- hook.c | 9 +++++++- t/t1360-config-based-hooks.sh | 19 ++++++++++++++++ 4 files changed, 70 insertions(+), 5 deletions(-)