[v2,0/7] Multiple hook support

Message ID: 20190514002332.121089-1-sandals@crustytoothpaste.net
From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: <git@vger.kernel.org>
Cc: Jeff King <peff@peff.net>, Duy Nguyen <pclouds@gmail.com>, Johannes Schindelin <Johannes.Schindelin@gmx.de>, Junio C Hamano <gitster@pobox.com>, Johannes Sixt <j6t@kdbg.org>, Ævar Arnfjörð Bjarmason <avarab@gmail.com>, Phillip Wood <phillip.wood123@gmail.com>, Jonathan Nieder <jrnieder@gmail.com>
Subject: [PATCH v2 0/7] Multiple hook support
Date: Tue, 14 May 2019 00:23:24 +0000

brian m. carlson May 14, 2019, 12:23 a.m. UTC

This series introduces multiple hook support.

I've thought a lot about the discussion over whether this series should
use the configuration as the source for multiple hooks. Ultimately, I've
come to the decision that it's not a good idea. Even adopting the empty
entry as a reset marker, the fact that inheritance in the configuration
is in-order and can't be easily modified means that it's not likely to
be very useful, but it is likely to be quite surprising for the average
user. I think a solution that sticks with the existing model and builds
off a design used by other systems people are familiar with, like cron
and run-parts, is going to be a better choice. Moreover, this is the
design that people have already built with outside tooling, which is a
further argument in favor of it.

I have adopted one configuration-based option, which is the per-hook
errorBehavior option that Peff suggested. I think this reduces concerns
over what the best error handling strategy is and is a good thing to
have as part of a minimum viable product. I picked the names that Peff
chose, but if people like different names better, they can be changed.

Just as a preview of what's coming down the line, I plan to build on
this series to notify hooks when --quiet and --dry-run options have been
specified to commands so that they may honor them if they choose.

Changes from v1:
* Adopted several improvements from Duy's series, including an improved
  find_hooks prototype and a helper function.
* Switched to existence checks instead of executability checks for
  determining whether to invoke multiple hooks.
* Adjusted the commit message for patch 3.
* Added error behavior control using the names Peff provided in his
  comment.
* Added documentation.

brian m. carlson (7):
  run-command: add preliminary support for multiple hooks
  builtin/receive-pack: add support for multiple hooks
  rebase: add support for multiple hooks
  builtin/worktree: add support for multiple post-checkout hooks
  transport: add support for multiple pre-push hooks
  config: allow configuration of multiple hook error behavior
  docs: document multiple hooks

 Documentation/config.txt           |   2 +
 Documentation/config/hook.txt      |  19 ++
 Documentation/githooks.txt         |   9 +
 builtin/am.c                       |  20 +--
 builtin/commit.c                   |   2 +-
 builtin/receive-pack.c             |  78 ++++----
 builtin/worktree.c                 |  44 +++--
 config.c                           |  27 +++
 run-command.c                      | 175 +++++++++++++++---
 run-command.h                      |  22 ++-
 sequencer.c                        |  59 ++++---
 sequencer.h                        |   2 +
 t/lib-hooks.sh                     | 274 +++++++++++++++++++++++++++++
 t/t5403-post-checkout-hook.sh      |   8 +
 t/t5407-post-rewrite-hook.sh       |  15 ++
 t/t5516-fetch-push.sh              |  30 ++++
 t/t5571-pre-push-hook.sh           |  19 ++
 t/t7503-pre-commit-hook.sh         |  15 ++
 t/t7505-prepare-commit-msg-hook.sh |   9 +
 transport.c                        |  29 +--
 20 files changed, 730 insertions(+), 128 deletions(-)
 create mode 100644 Documentation/config/hook.txt
 create mode 100644 t/lib-hooks.sh

Jonathan Nieder May 14, 2019, 12:51 a.m. UTC | #1

Hi,

brian m. carlson wrote:

> I've thought a lot about the discussion over whether this series should
> use the configuration as the source for multiple hooks. Ultimately, I've
> come to the decision that it's not a good idea. Even adopting the empty
> entry as a reset marker, the fact that inheritance in the configuration
> is in-order and can't be easily modified means that it's not likely to
> be very useful, but it is likely to be quite surprising for the average
> user.

Can we discuss this some more?  What would it take to make it likely
to be useful in your view?

> I think a solution that sticks with the existing model and builds
> off a design used by other systems people are familiar with, like cron
> and run-parts, is going to be a better choice. Moreover, this is the
> design that people have already built with outside tooling, which is a
> further argument in favor of it.

To be clear, the main advantage I see for config versus the .git/hooks
model is that with the config model, a user doesn't have to search
throughout the filesystem for .git/hooks directories to update when a
hook is broken.

There are other models that similarly fix that --- e.g. putting hooks
in /etc.  But as long as they're in .git/hooks, I think we're digging
ourselves deeper into the same hole.

Thanks,
Jonathan

brian m. carlson May 14, 2019, 1:59 a.m. UTC | #2

On Mon, May 13, 2019 at 05:51:01PM -0700, Jonathan Nieder wrote:
> Hi,
> 
> brian m. carlson wrote:
> 
> > I've thought a lot about the discussion over whether this series should
> > use the configuration as the source for multiple hooks. Ultimately, I've
> > come to the decision that it's not a good idea. Even adopting the empty
> > entry as a reset marker, the fact that inheritance in the configuration
> > is in-order and can't be easily modified means that it's not likely to
> > be very useful, but it is likely to be quite surprising for the average
> > user.
> 
> Can we discuss this some more?  What would it take to make it likely
> to be useful in your view?

There are two aspects here which I think are worth discussing. Let's
discuss the inheritance issue first.

Order with multiple hooks matters. The best hook as an example for this
is prepare-commit-msg. If I have a hook which I want to run on every
repository, such as a hook that inserts some sort of ID (bug tracker,
Gerrit, etc.), that hook, due to inheritance, *has* to be first, before
any other prepare-commit-msg hooks. If I want a hook that runs before
it, perhaps because a particular repository needs a different
configuration, I have to wipe the list and insert both hooks. I'm now
maintaining two separate locations for the command lines instead of just
inserting a symlink to the global hook and dropping a new hook before
it.
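
To make the ordering constraint above concrete, here is a minimal Python
sketch of how a multi-valued config key might accumulate across scopes. It
assumes values are read in system, global, then repository order, and that an
empty value clears everything collected so far (the "empty entry as a reset
marker" idea). The function name and the hook commands are hypothetical and
are not part of Git.

```python
def effective_hooks(*scopes):
    """Flatten multi-valued config entries read in scope order.

    Each scope is a list of values in file order. An empty string is
    treated as a reset marker that discards everything seen so far
    (an assumption modelled on the reset idea discussed in the thread).
    """
    hooks = []
    for values in scopes:
        for value in values:
            if value == "":
                hooks = []          # reset marker: drop inherited entries
            else:
                hooks.append(value)
    return hooks


# Hypothetical commands; none of these names come from Git itself.
global_cfg = ["add-bug-id"]

# Appending locally can only place a hook *after* the inherited one.
print(effective_hooks(global_cfg, ["repo-template"]))

# Running something first means resetting and repeating the global entry.
print(effective_hooks(global_cfg, ["", "repo-template", "add-bug-id"]))
```

In the second call the repository config has to name the global hook again,
which is the duplicated maintenance described above.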

I don't think there's a good way to make it easier unless we radically
customize the way configuration is done. I don't doubt that there are a
small number of configurations where the inheritance behavior is fine—I
believe GitHub's backend is one of them—but overall I think it's hard to
reason about and customize.

The second issue here is that it's surprising. Users don't know how to
reset a configuration option because we don't have a consistent way to
do that. Users will not expect for there to be multiple ways to set
hooks. Users will also not expect that their hooks in their
configuration aren't run if there are hooks in .git/hooks. Tooling that
has so far used .git/hooks will compete with users' global configuration
options, which I guarantee you will be a surprise for users using older
versions of tools.

The new behavior, which puts everything in the same directory
(.git/hooks) is much easier to reason about. I think we're obligated to
consider the experience for the average end user, who may not be
intimately familiar with how Git works but still needs to use it to get
work done.

It also provides a convenient place for hooks to live, which a
config-based option doesn't. We'll need to invoke things using /bin/sh,
so will they all have to live in PATH? What about one-offs that don't
really belong in PATH?

> > I think a solution that sticks with the existing model and builds
> > off a design used by other systems people are familiar with, like cron
> > and run-parts, is going to be a better choice. Moreover, this is the
> > design that people have already built with outside tooling, which is a
> > further argument in favor of it.
> 
> To be clear, the main advantage I see for config versus the .git/hooks
> model is that with the config model, a user doesn't have to search
> throughout the filesystem for .git/hooks directories to update when a
> hook is broken.

I agree this is an advantage if they don't hit the ordering issue. I
think a lot of the common use cases where this approach has benefits can
be handled well with core.hooksPath and hooks that can turn themselves
on or off depending on the repository config.

What might be an interesting approach that would address these concerns
is a core.globalHooks[0] option that points to a set (or sets,
depending) of multiple-hook directories. We then enumerate hooks in sort
order, considering both the global and the local directories as one
unit, perhaps with some way of disabling hooks. I'm not planning on
working on this myself, but I wouldn't be opposed to seeing someone else
work on it.

[0] Better name suggestions are, of course, welcome.
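
As a rough illustration of the "one unit" enumeration floated above, the
following Python sketch merges several hook directories and sorts purely by
file name. The function name, the `disabled` parameter, and the tie-breaking
rule are assumptions made for the example; nothing here reflects an existing
Git option.

```python
import os


def merged_hooks(directories, disabled=()):
    """Return hook paths from several directories, sorted by file name.

    All directories are treated as one unit: ordering depends on the
    base name, not on which directory a hook came from. Ties are broken
    by the position of the directory in `directories`.
    """
    found = []
    for rank, directory in enumerate(directories):
        if not os.path.isdir(directory):
            continue
        for name in os.listdir(directory):
            if name in disabled:
                continue            # hypothetical way of disabling a hook
            found.append((name, rank, os.path.join(directory, name)))
    return [path for _name, _rank, path in sorted(found)]
```

With this shape, a repository-local `10-template` would run before a global
`50-bug-id` without either location having to list the other's hooks.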

Jonathan Nieder May 14, 2019, 2:26 a.m. UTC | #3

Hi,

brian m. carlson wrote:
> On Mon, May 13, 2019 at 05:51:01PM -0700, Jonathan Nieder wrote:
>> brian m. carlson wrote:

>>>                          the fact that inheritance in the configuration
>>> is in-order and can't be easily modified means that it's not likely to
>>> be very useful, but it is likely to be quite surprising for the average
>>> user.
>>
>> Can we discuss this some more?  What would it take to make it likely
>> to be useful in your view?
>
> There are two aspects here which I think are worth discussing. Let's
> discuss the inheritance issue first.
>
> Order with multiple hooks matters. The best hook as an example for this
> is prepare-commit-msg. If I have a hook which I want to run on every
> repository, such as a hook that inserts some sort of ID (bug tracker,
> Gerrit, etc.), that hook, due to inheritance, *has* to be first, before
> any other prepare-commit-msg hooks. If I want a hook that runs before
> it, perhaps because a particular repository needs a different
> configuration, I have to wipe the list and insert both hooks. I'm now
> maintaining two separate locations for the command lines instead of just
> inserting a symlink to the global hook and dropping a new hook before
> it.
>
> I don't think there's a good way to make it easier unless we radically
> customize the way configuration is done.

Wouldn't a separate config item e.g. to reverse order (or to perform
whatever other customization seems appropriate) cover this?

In other words, use the standard config convention for the set of
hooks, and treat the order in which they are invoked as a separate
question.  You could even use the hooks.d style alphabetical order
convention.

[...]
> The second issue here is that it's surprising. Users don't know how to
> reset a configuration option because we don't have a consistent way to
> do that.

I agree that it's underdocumented and underimplemented.  But I'm not
aware of any other method that Git provides to reset a configuration
item.  What is it inconsistent with?

>           Users will not expect for there to be multiple ways to set
> hooks. Users will also not expect that their hooks in their
> configuration aren't run if there are hooks in .git/hooks. Tooling that
> has so far used .git/hooks will compete with users' global configuration
> options, which I guarantee you will be a surprise for users using older
> versions of tools.

Indeed, in the long term I think we should remove the .git/hooks/
mechanism entirely.

In the shorter term, I think the kind of inconsistency you're referring
to applies to hooks.d as well.

> The new behavior, which puts everything in the same directory
> (.git/hooks) is much easier to reason about.

That's a good point: a .git/hooks/README sounds like it would be
helpful here.

[...]
> It also provides a convenient place for hooks to live, which a
> config-based option doesn't. We'll need to invoke things using /bin/sh,
> so will they all have to live in PATH? What about one-offs that don't
> really belong in PATH?

This hasn't been a problem for remote helpers, merge drivers, etc in
the past.  Why are hooks different?

To be clear, I think it's a reasonable problem to solve, and I've
actually been surprised that it hasn't been a problem for people.

[...]
> I agree this is an advantage if they don't hit the ordering issue.

Wonderful.  Sounds like if I do some work on the ordering issue, then
we have a path forward.

>                                                                    I
> think a lot of the common use cases where this approach has benefits can
> be handled well with core.hooksPath and hooks that can turn themselves
> on or off depending on the repository config.

I think core.hooksPath attempted to solve this problem, but it has
several deficiencies:

1. It assumes a single, centrally managed hooks directory, and there's
   no standard for where that directory lives.  This means that it
   can't be counted on by tools like "git secrets" --- instead, each
   particular installation has to set up a custom hooks directory for
   themselves.

2. Since it assumes a single, centrally managed hooks directory,
   customizations in a single repository (e.g. to enable or disable a
   single hook) require duplicating the entire directory.

3. It had no migration path defined to becoming the default, so it
   doesn't end up being discoverable.  core.hooksPath is designed as
   a special case, making it hard to debug, instead of being a
   mainstream setting that can be recommended as a future default.

> What might be an interesting approach that would address these concerns
> is a core.globalHooks[0] option that points to a set (or sets,
> depending) of multiple-hook directories. We then enumerate hooks in sort
> order, considering both the global and the local directories as one
> unit, perhaps with some way of disabling hooks. I'm not planning on
> working on this myself, but I wouldn't be opposed to seeing someone else
> work on it.

This sounds overflexible to me.  Because of that, I don't think it
would end up as a default, so we wouldn't have a path to improving our
security stature.

If I implement a config based multiple hooks feature with name based
ordering, would that be useful to you?

Thanks,
Jonathan

> [0] Better name suggestions are, of course, welcome.

Duy Nguyen May 14, 2019, 1:30 p.m. UTC | #4

On Tue, May 14, 2019 at 7:23 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> This series introduces multiple hook support.
>
> I've thought a lot about the discussion over whether this series should
> use the configuration as the source for multiple hooks. Ultimately, I've
> come to the decision that it's not a good idea. Even adopting the empty
> entry as a reset marker, the fact that inheritance in the configuration
> is in-order and can't be easily modified means that it's not likely to
> be very useful, but it is likely to be quite surprising for the average
> user.

Can we just do like we do with hooks-in-directory? Ignore the config
variable order. Sort them all alphabetically. The user then has to
prefix a number or something to control order. Easier transition from
hooks-in-dir to hooks-in-config too.

> I think a solution that sticks with the existing model and builds
> off a design used by other systems people are familiar with, like cron
> and run-parts, is going to be a better choice. Moreover, this is the
> design that people have already built with outside tooling, which is a
> further argument in favor of it.

brian m. carlson May 16, 2019, 12:42 a.m. UTC | #5

On Mon, May 13, 2019 at 07:26:53PM -0700, Jonathan Nieder wrote:
> brian m. carlson wrote:
> In other words, use the standard config convention for the set of
> hooks, and treat the order in which they are invoked as a separate
> question.  You could even use the hooks.d style alphabetical order
> convention.

If we're going to have named items in PATH, we do not want to have those
items numbered.

> [...]
> > The second issue here is that it's surprising. Users don't know how to
> > reset a configuration option because we don't have a consistent way to
> > do that.
> 
> I agree that it's underdocumented and underimplemented.  But I'm not
> aware of any other method that Git provides to reset a configuration
> item.  What is it inconsistent with?

Some options support it, and some don't.

> >           Users will not expect for there to be multiple ways to set
> > hooks. Users will also not expect that their hooks in their
> > configuration aren't run if there are hooks in .git/hooks. Tooling that
> > has so far used .git/hooks will compete with users' global configuration
> > options, which I guarantee you will be a surprise for users using older
> > versions of tools.
> 
> Indeed, in the long term I think we should remove the .git/hooks/
> mechanism entirely.

See, I don't. I like the current mechanism and don't want to get rid of
it.

> In the shorter term, I think the kind of inconsistency you're referring
> to applies to hooks.d as well.

I think it's a lot easier to reason about adding a directory to the
existing mechanism than adopting a dramatically new one. Users can
easily understand that if there's a file, that the directory is ignored.
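
The file-versus-directory rule can be sketched in a few lines of Python. This
is only an illustration of the rule as described in this thread, not the C
code from the series; the function name is hypothetical and the ".d" suffix
for the multiple-hook directory is an assumption borrowed from run-parts
conventions.

```python
import os


def hooks_to_run(hooks_dir, name):
    """Return the hook paths to run for `name`, in order.

    Assumed layout: a single hook lives at <hooks_dir>/<name>, and
    multiple hooks live in <hooks_dir>/<name>.d/.
    """
    single = os.path.join(hooks_dir, name)
    if os.path.exists(single):
        # Existence, not executability, decides: the file wins and the
        # directory is ignored.
        return [single]
    multi = single + ".d"
    if os.path.isdir(multi):
        # Run the directory's entries in sorted (alphabetical) order.
        return [os.path.join(multi, entry) for entry in sorted(os.listdir(multi))]
    return []
```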

Also, this is the way that most other programs on Unix do this behavior,
and I think that is a compelling argument for this design in and of
itself. I think Unix has generally made the best decisions in operating
system design, and I aim to emulate it as much as possible.

> > It also provides a convenient place for hooks to live, which a
> > config-based option doesn't. We'll need to invoke things using /bin/sh,
> > so will they all have to live in PATH? What about one-offs that don't
> > really belong in PATH?
> 
> This hasn't been a problem for remote helpers, merge drivers, etc in
> the past.  Why are hooks different?

Because they're often customized or installed via the repository itself.
Some projects provide pre-commit hooks to run tests or such in a
repository-specific way. Some people have one-off scripts that are
appropriate for a repo but they don't want in tab-completion. Also, they
may wrap a system command with helpful information, but that doesn't
belong in PATH.

Git LFS, for example, installs hooks that warn the user if the command
isn't installed, since it's a common misconfiguration and not pushing
the LFS objects (via the pre-push hook) along with the Git objects is a
common source of data loss. Uninstalling the tool (or not installing it
if it's a shared repository) doesn't mean the hook still shouldn't be
run.

> > What might be an interesting approach that would address these concerns
> > is a core.globalHooks[0] option that points to a set (or sets,
> > depending) of multiple-hook directories. We then enumerate hooks in sort
> > order, considering both the global and the local directories as one
> > unit, perhaps with some way of disabling hooks. I'm not planning on
> > working on this myself, but I wouldn't be opposed to seeing someone else
> > work on it.
> 
> This sounds overflexible to me.  Because of that, I don't think it
> would end up as a default, so we wouldn't have a path to improving our
> security stature.
> 
> If I implement a config based multiple hooks feature with name based
> ordering, would that be useful to you?

One of my primary motivations for writing this series is because it's a
requested feature in conjunction with Git LFS. We often get requests for
features that are properly directed to Git itself, but because the issue
comes up in conjunction with the use of Git LFS, folks report the issues
to us. (Also, people don't like reporting issues to mailing lists.) I
think it's a useful feature for others, though, judging by the multiple
times it's come up.

I see the series that I'm proposing as easy to implement and work with
using the existing tools and with other tools that use the existing
hooks. It's a simple change of path. I anticipate there being more work
to implement a config-based multiple hooks solution in various tools,
and I expect it will be less intuitive and less discoverable. I can't
provide definitive evidence for this, but my experience answering a lot
of user questions on a daily basis leads me to believe that it's not the
right way forward.

I'm not opposed to extending the config system to implement multiple
hooks directories or add support for inheriting hooks, because that's a
common thing that people want. I just don't think our config system is
the right tool for specifying what commands to run, for the reasons I've
specified.

I can't prevent you from writing a series that implements a config-based
option, and if it's the solution the list wants, I'll go along with it,
but it's not the solution I want to see personally or as a tooling
implementer.

Jonathan Nieder May 16, 2019, 12:51 a.m. UTC | #6

brian m. carlson wrote:

> Also, this is the way that most other programs on Unix do this behavior,
> and I think that is a compelling argument for this design in and of
> itself. I think Unix has generally made the best decisions in operating
> system design, and I aim to emulate it as much as possible.

Do a lot of other programs run commands from a specially named
subdirectory of the current directory?

If you were talking about a hooks dir in /etc, I would completely
agree.  But we are talking about .git/hooks/, which has been a
constant source of real compromises.

Anyway, I think we've gone back and forth enough times to discover
we're not going to agree on this.

[...]
>> This hasn't been a problem for remote helpers, merge drivers, etc in
>> the past.  Why are hooks different?
[...]
> Git LFS, for example, installs hooks that warn the user if the command
> isn't installed, since it's a common misconfiguration and not pushing
> the LFS objects (via the pre-push hook) along with the Git objects is a
> common source of data loss. Uninstalling the tool (or not installing it
> if it's a shared repository) doesn't mean the hook still shouldn't be
> run.

I don't understand this example.  If the repository is configured to
use the Git LFS hooks, why wouldn't it print a friendly message?

[...]
> I'm not opposed to extending the config system to implement multiple
> hooks directories or add support for inheriting hooks, because that's a
> common thing that people want. I just don't think our config system is
> the right tool for specifying what commands to run, for the reasons I've
> specified.
>
> I can't prevent you from writing a series that implements a config-based
> option, and if it's the solution the list wants, I'll go along with it,
> but it's not the solution I want to see personally or as a tooling
> implementer.

I think you're answering a different question than I asked.

One thing I've been talking about is having a path to eventually
getting rid of support for .git/hooks/, on a user controlled timeline,
with a smooth migration.  I proposed one possible way to do that, and
I was asking whether it would work okay for your use case, or whether
there are problems with it (which would give me something to work with
on iterating toward something that would work for you).

Your answer is "I can't prevent you", which means you don't like the
proposal, but doesn't tell me what about it would not work.

Thanks,
Jonathan

Jeff King May 16, 2019, 4:51 a.m. UTC | #7

On Tue, May 14, 2019 at 01:59:28AM +0000, brian m. carlson wrote:

> There are two aspects here which I think are worth discussing. Let's
> discuss the inheritance issue first.
> 
> Order with multiple hooks matters. The best hook as an example for this
> is prepare-commit-msg. If I have a hook which I want to run on every
> repository, such as a hook that inserts some sort of ID (bug tracker,
> Gerrit, etc.), that hook, due to inheritance, *has* to be first, before
> any other prepare-commit-msg hooks. If I want a hook that runs before
> it, perhaps because a particular repository needs a different
> configuration, I have to wipe the list and insert both hooks. I'm now
> maintaining two separate locations for the command lines instead of just
> inserting a symlink to the global hook and dropping a new hook before
> it.

This part confuses me. In the config based scheme you describe, you have
to mention the original hook again. But isn't that exactly what creating
a symlink in your hooks directory is doing?

And in fact, I think a config based scheme can be a lot more flexible in
the end, simply because the parser _does_ see all of the hooks (whereas
in the hook directory scheme, we only ever look in one directory). So we
can have an option to sort them before running: alphabetically,
system-to-repo order, repo-to-system order, etc.
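
A small Python sketch of what such a sort option could look like, assuming
the parser hands back (scope, command) pairs in the order it read them. The
strategy names, the scope labels, and the commands are illustrative only and
are not options that Git defines.

```python
SCOPE_RANK = {"system": 0, "global": 1, "repo": 2}


def order_hooks(entries, strategy="system-to-repo"):
    """Order (scope, command) pairs gathered from every config file.

    `entries` is in the order the parser saw them. Python's sort is
    stable, so entries within one scope keep their file order.
    """
    if strategy == "alphabetical":
        return sorted(entries, key=lambda e: e[1])
    if strategy == "system-to-repo":
        return sorted(entries, key=lambda e: SCOPE_RANK[e[0]])
    if strategy == "repo-to-system":
        return sorted(entries, key=lambda e: -SCOPE_RANK[e[0]])
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical commands for illustration.
entries = [("system", "zz-audit"), ("global", "add-bug-id"), ("repo", "mm-template")]
print(order_hooks(entries, "repo-to-system"))
```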

> The second issue here is that it's surprising. Users don't know how to
> reset a configuration option because we don't have a consistent way to
> do that. Users will not expect for there to be multiple ways to set
> hooks. Users will also not expect that their hooks in their
> configuration aren't run if there are hooks in .git/hooks. Tooling that
> has so far used .git/hooks will compete with users' global configuration
> options, which I guarantee you will be a surprise for users using older
> versions of tools.

I don't agree here. If the rule is "config takes precedence", and "if
config is absent, behave as if .git/hooks/whatever was in the config",
then the transition is easy to explain. And nobody sees any change at
all until they decide to set the config. It would probably also be
prudent to issue a warning if there's config _and_ an on-disk hook file
(or even run them both, though then you open up the question of
ordering).
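
The precedence rule could be sketched like this in Python. It is a minimal
illustration under the assumptions stated in the paragraph above (config
wins, the on-disk hook is the fallback, and a warning is issued when both
exist); the function name and the warning text are hypothetical.

```python
import os


def resolve_hooks(configured, hooks_dir, name):
    """Apply a "config takes precedence" rule for one hook name.

    `configured` is the list of commands from config (possibly empty).
    Returns (commands, warning); warning is None when nothing is odd.
    """
    legacy = os.path.join(hooks_dir, name)
    has_legacy = os.path.exists(legacy)
    if configured:
        warning = None
        if has_legacy:
            warning = f"{legacy} is ignored because config sets hooks for {name}"
        return list(configured), warning
    # No config: behave as if the on-disk hook had been configured.
    return ([legacy] if has_legacy else []), None
```

Until someone sets the config, the function returns exactly what the on-disk
layout would have run, which is the "nobody sees any change" property.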

Which isn't to say it's impossible for a person to get confused (new
tool sets the config, disabling their old hook?). But I think any
transition to multi-hook support (including your directory scheme) is
going to have some possibility that tools automatically setting things
up behind the scenes are going to confuse a user.

> It also provides a convenient place for hooks to live, which a
> config-based option doesn't. We'll need to invoke things using /bin/sh,
> so will they all have to live in PATH? What about one-offs that don't
> really belong in PATH?

Actually, my biggest beef with the current hooks mechanism is that it's
an _inconvenient_ place for them to live.

  - it's not version controlled, and putting a repository inside another
    repository is awkward (though it at least works for the bare case)

  - touching the filesystem is awkward and expensive if you have a large
    number of repositories whose hooks you want to update

  - it requires a manual step to modify the filesystem after a fresh
    clone (as opposed to putting things in ~/.gitconfig). Technically
    one can solve this by modifying .../share/git/templates, but that
    has its own issues.

In my world-view config-based hooks would just be run with SHELL_PATH,
just like our other config options. If you don't want one-offs in your
PATH, then use absolute paths (just like you would have to for
symlinks). If you really don't know where to put one-off hooks, perhaps
we could document a base directory from which the hook script is run,
and you could put it in a directory in .git and use a relative path? You
could even call that directory "hooks", but I suspect that would be too
confusing. :)

-Peff
