Message ID | 20190514002332.121089-1-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
Series | Multiple hook support |
Hi,

brian m. carlson wrote:

> I've thought a lot about the discussion over whether this series should
> use the configuration as the source for multiple hooks. Ultimately, I've
> come to the decision that it's not a good idea. Even adopting the empty
> entry as a reset marker, the fact that inheritance in the configuration
> is in-order and can't be easily modified means that it's not likely to
> be very useful, but it is likely to be quite surprising for the average
> user.

Can we discuss this some more? What would it take to make it likely
to be useful in your view?

> I think a solution that sticks with the existing model and builds
> off a design used by other systems people are familiar with, like cron
> and run-parts, is going to be a better choice. Moreover, this is the
> design that people have already built with outside tooling, which is a
> further argument in favor of it.

To be clear, the main advantage I see for config versus the .git/hooks
model is that with the config model, a user doesn't have to search
throughout the filesystem for .git/hooks directories to update when a
hook is broken.

There are other models that similarly fix that --- e.g. putting hooks
in /etc. But as long as they're in .git/hooks, I think we're digging
ourselves deeper into the same hole.

Thanks,
Jonathan
On Mon, May 13, 2019 at 05:51:01PM -0700, Jonathan Nieder wrote: > Hi, > > brian m. carlson wrote: > > > I've thought a lot about the discussion over whether this series should > > use the configuration as the source for multiple hooks. Ultimately, I've > > come to the decision that it's not a good idea. Even adopting the empty > > entry as a reset marker, the fact that inheritance in the configuration > > is in-order and can't be easily modified means that it's not likely to > > be very useful, but it is likely to be quite surprising for the average > > user. > > Can we discuss this some more? What would it take to make it likely > to be useful in your view? There are two aspects here which I think are worth discussing. Let's discuss the inheritance issue first. Order with multiple hooks matters. The best hook as an example for this is prepare-commit-msg. If I have a hook which I want to run on every repository, such as a hook that inserts some sort of ID (bug tracker, Gerrit, etc.), that hook, due to inheritance, *has* to be first, before any other prepare-commit-msg hooks. If I want a hook that runs before it, perhaps because a particular repository needs a different configuration, I have to wipe the list and insert both hooks. I'm now maintaining two separate locations for the command lines instead of just inserting a symlink to the global hook and dropping a new hook before it. I don't think there's a good way to make it easier unless we radically customize the way configuration is done. I don't doubt that there are a small number of configurations where the inheritance behavior is fine—I believe GitHub's backend is one of them—but overall I think it's hard to reason about and customize. The second issue here is that it's surprising. Users don't know how to reset a configuration option because we don't have a consistent way to do that. Users will not expect for there to be multiple ways to set hooks. 
Users will also not expect that their hooks in their configuration aren't run if there are hooks in .git/hooks. Tooling that has so far used .git/hooks will compete with users' global configuration options, which I guarantee you will be a surprise for users using older versions of tools. The new behavior, which puts everything in the same directory (.git/hooks) is much easier to reason about. I think we're obligated to consider the experience for the average end user, who may not be intimately familiar with how Git works but still needs to use it to get work done. It also provides a convenient place for hooks to live, which a config-based option doesn't. We'll need to invoke things using /bin/sh, so will they all have to live in PATH? What about one-offs that don't really belong in PATH? > > I think a solution that sticks with the existing model and builds > > off a design used by other systems people are familiar with, like cron > > and run-parts, is going to be a better choice. Moreover, this is the > > design that people have already built with outside tooling, which is a > > further argument in favor of it. > > To be clear, the main advantage I see for config versus the .git/hooks > model is that with the config model, a user doesn't have to search > throughout the filesystem for .git/hooks directories to update when a > hook is broken. I agree this is an advantage if they don't hit the ordering issue. I think a lot of the common use cases where this approach has benefits can be handled well with core.hooksPath and hooks that can turn themselves on or off depending on the repository config. What might be an interesting approach that would address these concerns is a core.globalHooks[0] option that points to a set (or sets, depending) of multiple-hook directories. We then enumerate hooks in sort order, considering both the global and the local directories as one unit, perhaps with some way of disabling hooks. 
I'm not planning on working on this myself, but I wouldn't be opposed to seeing someone else work on it.

[0] Better name suggestions are, of course, welcome.
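
The merged enumeration suggested at the end of this message can be sketched roughly as follows. This is only an illustration under stated assumptions: core.globalHooks is a proposed name and not an existing Git option, and the `<hook>.d` directory layout, the function name `enumerate_hooks`, and the `disabled` set are hypothetical.

```python
import os


def enumerate_hooks(hook_name, directories, disabled=frozenset()):
    """Return hook scripts from every directory, merged and sorted by file name.

    `directories` stands in for the global hook directories plus the
    repository's own one (hypothetical layout: <dir>/<hook_name>.d/<script>).
    `disabled` is a hypothetical set of script names to skip.
    """
    found = []
    for directory in directories:
        hook_dir = os.path.join(directory, hook_name + ".d")
        if not os.path.isdir(hook_dir):
            continue
        for entry in os.listdir(hook_dir):
            path = os.path.join(hook_dir, entry)
            if entry in disabled:
                continue
            # Only regular, executable files count as hooks.
            if not os.path.isfile(path) or not os.access(path, os.X_OK):
                continue
            found.append((entry, path))
    # Sort on the file name alone, so global and local hooks interleave
    # as if they lived in one directory.
    return [path for _, path in sorted(found)]
```

With a global `50-issue-id` script and local `10-template` and `90-lint` scripts, the local `10-template` would sort ahead of the global hook without the global hook having to be listed again anywhere.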
Hi, brian m. carlson wrote: > On Mon, May 13, 2019 at 05:51:01PM -0700, Jonathan Nieder wrote: >> brian m. carlson wrote: >>> the fact that inheritance in the configuration >>> is in-order and can't be easily modified means that it's not likely to >>> be very useful, but it is likely to be quite surprising for the average >>> user. >> >> Can we discuss this some more? What would it take to make it likely >> to be useful in your view? > > There are two aspects here which I think are worth discussing. Let's > discuss the inheritance issue first. > > Order with multiple hooks matters. The best hook as an example for this > is prepare-commit-msg. If I have a hook which I want to run on every > repository, such as a hook that inserts some sort of ID (bug tracker, > Gerrit, etc.), that hook, due to inheritance, *has* to be first, before > any other prepare-commit-msg hooks. If I want a hook that runs before > it, perhaps because a particular repository needs a different > configuration, I have to wipe the list and insert both hooks. I'm now > maintaining two separate locations for the command lines instead of just > inserting a symlink to the global hook and dropping a new hook before > it. > > I don't think there's a good way to make it easier unless we radically > customize the way configuration is done. Wouldn't a separate config item e.g. to reverse order (or to perform whatever other customization seems appropriate) cover this? In other words, use the standard config convention for the set of hooks, and treat the order in which they are invoked as a separate question. You could even use the hooks.d style alphabetical order convention. [...] > The second issue here is that it's surprising. Users don't know how to > reset a configuration option because we don't have a consistent way to > do that. I agree that it's underdocumented and underimplemented. But I'm not aware of any other method that Git provides to reset a configuration item. What is it inconsistent with? 
> Users will not expect for there to be multiple ways to set
> hooks. Users will also not expect that their hooks in their
> configuration aren't run if there are hooks in .git/hooks. Tooling that
> has so far used .git/hooks will compete with users' global configuration
> options, which I guarantee you will be a surprise for users using older
> versions of tools.

Indeed, in the long term I think we should remove the .git/hooks/
mechanism entirely.

In the shorter term, I think the kind of inconsistency you're referring
to applies to hooks.d as well.

> The new behavior, which puts everything in the same directory
> (.git/hooks) is much easier to reason about.

That's a good point: a .git/hooks/README sounds like it would be
helpful here.

[...]
> It also provides a convenient place for hooks to live, which a
> config-based option doesn't. We'll need to invoke things using /bin/sh,
> so will they all have to live in PATH? What about one-offs that don't
> really belong in PATH?

This hasn't been a problem for remote helpers, merge drivers, etc in
the past. Why are hooks different?

To be clear, I think it's a reasonable problem to solve, and I've
actually been surprised that it hasn't been a problem for people.

[...]
> I agree this is an advantage if they don't hit the ordering issue.

Wonderful. Sounds like if I do some work on the ordering issue, then
we have a path forward.

> I
> think a lot of the common use cases where this approach has benefits can
> be handled well with core.hooksPath and hooks that can turn themselves
> on or off depending on the repository config.

I think core.hooksPath attempted to solve this problem, but it has
several deficiencies:

 1. It assumes a single, centrally managed hooks directory, and there's
    no standard for where that directory lives. This means that it
    can't be counted on by tools like "git secrets" --- instead, each
    particular installation has to set up a custom hooks directory for
    themselves.

 2. Since it assumes a single, centrally managed hooks directory,
    customizations in a single repository (e.g. to enable or disable a
    single hook) require duplicating the entire directory.

 3. It had no migration path defined to becoming the default, so it
    doesn't end up being discoverable. core.hooksPath is designed as a
    special case, making it hard to debug, instead of being a
    mainstream setting that can be recommended as a future default.

> What might be an interesting approach that would address these concerns
> is a core.globalHooks[0] option that points to a set (or sets,
> depending) of multiple-hook directories. We then enumerate hooks in sort
> order, considering both the global and the local directories as one
> unit, perhaps with some way of disabling hooks. I'm not planning on
> working on this myself, but I wouldn't be opposed to seeing someone else
> work on it.

This sounds overflexible to me. Because of that, I don't think it
would end up as a default, so we wouldn't have a path to improving our
security stature.

If I implement a config based multiple hooks feature with name based
ordering, would that be useful to you?

Thanks,
Jonathan

> [0] Better name suggestions are, of course, welcome.
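
The two positions in this exchange can be made concrete with a small model. It is a sketch under assumptions, not how Git behaves: multi-valued hook entries are read in system, global, local order, an empty string is treated as the reset marker discussed in the thread, and the `order` setting is a hypothetical stand-in for the separate ordering option suggested above. The function names and paths are invented for illustration.

```python
import posixpath


def effective_hooks(scopes):
    """Flatten per-scope value lists, read in system, global, local order.

    An empty string is the (assumed) reset marker: it discards every
    entry inherited so far.
    """
    hooks = []
    for values in scopes:
        for value in values:
            if value == "":
                hooks = []
            else:
                hooks.append(value)
    return hooks


def run_order(hooks, order="config"):
    """Apply a hypothetical ordering setting to the effective hook list."""
    if order == "config":
        return list(hooks)
    if order == "reverse":
        return hooks[::-1]
    if order == "name":
        # hooks.d-style ordering: sort on the script's file name.
        return sorted(hooks, key=posixpath.basename)
    raise ValueError(f"unknown order: {order!r}")


if __name__ == "__main__":
    global_cfg = ["/usr/local/hooks/50-issue-id"]
    local_cfg = ["/repo/hooks/10-template"]
    inherited = effective_hooks([[], global_cfg, local_cfg])
    print(run_order(inherited, "config"))  # global hook is forced first
    print(run_order(inherited, "name"))    # file name decides instead
```

In pure config order, running the local hook first requires the local scope to reset the list and repeat the global entry, which is the duplication objected to earlier in the thread. With name-based ordering the local scope only adds its own entry.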
On Tue, May 14, 2019 at 7:23 AM brian m. carlson <sandals@crustytoothpaste.net> wrote:

>
> This series introduces multiple hook support.
>
> I've thought a lot about the discussion over whether this series should
> use the configuration as the source for multiple hooks. Ultimately, I've
> come to the decision that it's not a good idea. Even adopting the empty
> entry as a reset marker, the fact that inheritance in the configuration
> is in-order and can't be easily modified means that it's not likely to
> be very useful, but it is likely to be quite surprising for the average
> user.

Can we just do like we do with hooks-in-directory? Ignore the config
variable order. Sort them all alphabetically. The user then has to
prefix a number or something to control order. Easier transition from
hooks-in-dir to hooks-in-config too.

> I think a solution that sticks with the existing model and builds
> off a design used by other systems people are familiar with, like cron
> and run-parts, is going to be a better choice. Moreover, this is the
> design that people have already built with outside tooling, which is a
> further argument in favor of it.
On Mon, May 13, 2019 at 07:26:53PM -0700, Jonathan Nieder wrote: > brian m. carlson wrote: > In other words, use the standard config convention for the set of > hooks, and treat the order in which they are invoked as a separate > question. You could even use the hooks.d style alphabetical order > convention. If we're going to have named items in PATH, we do not want to have those items numbered. > [...] > > The second issue here is that it's surprising. Users don't know how to > > reset a configuration option because we don't have a consistent way to > > do that. > > I agree that it's underdocumented and underimplemented. But I'm not > aware of any other method that Git provides to reset a configuration > item. What is it inconsistent with? Some options support it, and some don't. > > Users will not expect for there to be multiple ways to set > > hooks. Users will also not expect that their hooks in their > > configuration aren't run if there are hooks in .git/hooks. Tooling that > > has so far used .git/hooks will compete with users' global configuration > > options, which I guarantee you will be a surprise for users using older > > versions of tools. > > Indeed, in the long term I think we should remove the .git/hooks/ > mechanism entirely. See, I don't. I like the current mechanism and don't want to get rid of it. > In the shorter term, I think the kind of inconsistency you're referring > to applies to hooks.d as well. I think it's a lot easier to reason about adding a directory to the existing mechanism than adopting a dramatically new one. Users can easily understand that if there's a file, that the directory is ignored. Also, this is the way that most other programs on Unix do this behavior, and I think that is a compelling argument for this design in and of itself. I think Unix has generally made the best decisions in operating system design, and I aim to emulate it as much as possible. 
> > It also provides a convenient place for hooks to live, which a > > config-based option doesn't. We'll need to invoke things using /bin/sh, > > so will they all have to live in PATH? What about one-offs that don't > > really belong in PATH? > > This hasn't been a problem for remote helpers, merge drivers, etc in > the past. Why are hooks different? Because they're often customized or installed via the repository itself. Some projects provide pre-commit hooks to run tests or such in a repository-specific way. Some people have one-off scripts that are appropriate for a repo but they don't want in tab-completion. Also, they may wrap a system command with helpful information, but that doesn't belong in PATH. Git LFS, for example, installs hooks that warn the user if the command isn't installed, since it's a common misconfiguration and not pushing the LFS objects (via the pre-push hook) along with the Git objects is a common source of data loss. Uninstalling the tool (or not installing it if it's a shared repository) doesn't mean the hook still shouldn't be run. > > What might be an interesting approach that would address these concerns > > is a core.globalHooks[0] option that points to a set (or sets, > > depending) of multiple-hook directories. We then enumerate hooks in sort > > order, considering both the global and the local directories as one > > unit, perhaps with some way of disabling hooks. I'm not planning on > > working on this myself, but I wouldn't be opposed to seeing someone else > > work on it. > > This sounds overflexible to me. Because of that, I don't think it > would end up as a default, so we wouldn't have a path to improving our > security stature. > > If I implement a config based multiple hooks feature with name based > ordering, would that be useful to you? One of my primary motivations for writing this series is because it's a requested feature in conjunction with Git LFS. 
We often get requests for features that are properly directed to Git itself, but because the issue comes up in conjunction with the use of Git LFS, folks report the issues to us. (Also, people don't like reporting issues to mailing lists.) I think it's a useful feature for others, though, judging by the multiple times it's come up. I see the series that I'm proposing as easy to implement and work with using the existing tools and with other tools that use the existing hooks. It's a simple change of path. I anticipate there being more work to implement a config-based multiple hooks solution in various tools, and I expect it will be less intuitive and less discoverable. I can't provide definitive evidence for this, but my experience answering a lot of user questions on a daily basis leads me to believe that it's not the right way forward. I'm not opposed to extending the config system to implement multiple hooks directories or add support for inheriting hooks, because that's a common thing that people want. I just don't think our config system is the right tool for specifying what commands to run, for the reasons I've specified. I can't prevent you from writing a series that implements a config-based option, and if it's the solution the list wants, I'll go along with it, but it's not the solution I want to see personally or as a tooling implementer.
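
The rule described in this message ("if there's a file, the directory is ignored") can be sketched as below. The sketch assumes a `<hook>.d` directory next to the single hook file, in the style of run-parts. The exact directory name and the helper `hooks_to_run` are assumptions for illustration, not taken from the series itself.

```python
import os


def hooks_to_run(hooks_dir, name):
    """Pick the hooks for `name` under the file-beats-directory rule.

    If hooks_dir/<name> exists as a file, it is the only hook run and the
    directory is ignored. Otherwise every entry of hooks_dir/<name>.d
    (assumed layout) is run in sorted order.
    """
    single = os.path.join(hooks_dir, name)
    if os.path.isfile(single):
        return [single]
    multi = os.path.join(hooks_dir, name + ".d")
    if os.path.isdir(multi):
        return [os.path.join(multi, entry) for entry in sorted(os.listdir(multi))]
    return []
```

Existing tools that install a single `.git/hooks/<name>` file keep working unchanged under this rule, which is the "simple change of path" argument made above.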
brian m. carlson wrote: > Also, this is the way that most other programs on Unix do this behavior, > and I think that is a compelling argument for this design in and of > itself. I think Unix has generally made the best decisions in operating > system design, and I aim to emulate it as much as possible. Do a lot of other programs run commands from a specially named subdirectory of the current directory? If you were talking about a hooks dir in /etc, I would completely agree. But we are talking about .git/hooks/, which has been a constant source of real compromises. Anyway, I think we've gone back and forth enough times to discover we're not going to agree on this. [...] >> This hasn't been a problem for remote helpers, merge drivers, etc in >> the past. Why are hooks different? [...] > Git LFS, for example, installs hooks that warn the user if the command > isn't installed, since it's a common misconfiguration and not pushing > the LFS objects (via the pre-push hook) along with the Git objects is a > common source of data loss. Uninstalling the tool (or not installing it > if it's a shared repository) doesn't mean the hook still shouldn't be > run. I don't understand this example. If the repository is configured to use the Git LFS hooks, why wouldn't it print a friendly message? [...] > I'm not opposed to extending the config system to implement multiple > hooks directories or add support for inheriting hooks, because that's a > common thing that people want. I just don't think our config system is > the right tool for specifying what commands to run, for the reasons I've > specified. > > I can't prevent you from writing a series that implements a config-based > option, and if it's the solution the list wants, I'll go along with it, > but it's not the solution I want to see personally or as a tooling > implementer. I think you're answering a different question than I asked. 
One thing I've been talking about is having a path to eventually getting rid of support for .git/hooks/, on a user controlled timeline, with a smooth migration. I proposed one possible way to do that, and I was asking whether it would work okay for your use case, or whether there are problems with it (which would give me something to work with on iterating toward something that would work for you). Your answer is "I can't prevent you", which means you don't like the proposal, but doesn't tell me what about it would not work. Thanks, Jonathan
On Tue, May 14, 2019 at 01:59:28AM +0000, brian m. carlson wrote: > There are two aspects here which I think are worth discussing. Let's > discuss the inheritance issue first. > > Order with multiple hooks matters. The best hook as an example for this > is prepare-commit-msg. If I have a hook which I want to run on every > repository, such as a hook that inserts some sort of ID (bug tracker, > Gerrit, etc.), that hook, due to inheritance, *has* to be first, before > any other prepare-commit-msg hooks. If I want a hook that runs before > it, perhaps because a particular repository needs a different > configuration, I have to wipe the list and insert both hooks. I'm now > maintaining two separate locations for the command lines instead of just > inserting a symlink to the global hook and dropping a new hook before > it. This part confuses me. In the config based scheme you describe, you have to mention the original hook again. But isn't that exactly what creating a symlink in your hooks directory is doing? And in fact, I think a config based scheme can be a lot more flexible in the end, simply because the parser _does_ see all of the hooks (whereas in the hook directory scheme, we only ever look in one directory). So we can have an option to sort them before running: alphabetically, system-to-repo order, repo-to-system order, etc. > The second issue here is that it's surprising. Users don't know how to > reset a configuration option because we don't have a consistent way to > do that. Users will not expect for there to be multiple ways to set > hooks. Users will also not expect that their hooks in their > configuration aren't run if there are hooks in .git/hooks. Tooling that > has so far used .git/hooks will compete with users' global configuration > options, which I guarantee you will be a surprise for users using older > versions of tools. I don't agree here. 
If the rule is "config takes precedence", and "if config is absent, behave as if .git/hooks/whatever was in the config", then the transition is easy to explain. And nobody sees any change at all until they decide to set the config. It would probably also be prudent to issue a warning if there's config _and_ an on-disk hook file (or even run them both, though then you open up the question of ordering).

Which isn't to say it's impossible for a person to get confused (new
tool sets the config, disabling their old hook?). But I think any
transition to multi-hook support (including your directory scheme) is
going to have some possibility that tools automatically setting things
up behind the scenes is going to confuse a user.

> It also provides a convenient place for hooks to live, which a
> config-based option doesn't. We'll need to invoke things using /bin/sh,
> so will they all have to live in PATH? What about one-offs that don't
> really belong in PATH?

Actually, my biggest beef with the current hooks mechanism is that it's
an _inconvenient_ place for them to live.

  - it's not version controlled, and putting a repository inside another
    repository is awkward (though it at least works for the bare case)

  - touching the filesystem is awkward and expensive if you have a large
    number of repositories whose hooks you want to update

  - it requires a manual step to modify the filesystem after a fresh
    clone (as opposed to putting things in ~/.gitconfig). Technically
    one can solve this by modifying .../share/git/templates, but that
    has its own issues.

In my world-view config-based hooks would just be run with SHELL_PATH,
just like our other config options. If you don't want one-offs in your
PATH, then use absolute paths (just like you would have to for
symlinks).

If you really don't know where to put one-off hooks, perhaps we could
document a base directory from which the hook script is run, and you
could put it in a directory in .git and use a relative path? You could
even call that directory "hooks", but I suspect that would be too
confusing. :)

-Peff
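
The transition rule and the ordering options proposed in this message could look roughly like the following. This is a sketch only: the `(scope, command)` representation, the `plan_hooks` name, the order names, and the warning text are assumptions chosen for illustration, and the variant that runs both config and on-disk hooks is left out.

```python
import posixpath

# Rank of each config scope, from least to most specific.
SCOPE_RANK = {"system": 0, "global": 1, "local": 2}


def plan_hooks(config_entries, legacy_hook=None, order="system-to-repo"):
    """Return (commands, warnings) under a "config takes precedence" rule.

    config_entries: list of (scope, command) pairs, in the order parsed.
    legacy_hook: path of the on-disk .git/hooks/<name> file, or None.
    """
    warnings = []
    if not config_entries:
        # No config: behave as if the on-disk hook were the config.
        return ([legacy_hook] if legacy_hook else []), warnings
    if legacy_hook:
        warnings.append(f"ignoring {legacy_hook}: hooks are set in config")
    if order == "alphabetical":
        entries = sorted(config_entries, key=lambda e: posixpath.basename(e[1]))
    elif order == "system-to-repo":
        entries = sorted(config_entries, key=lambda e: SCOPE_RANK[e[0]])
    elif order == "repo-to-system":
        entries = sorted(config_entries, key=lambda e: -SCOPE_RANK[e[0]])
    else:
        raise ValueError(f"unknown order: {order!r}")
    return [command for _, command in entries], warnings
```

Because the parser sees every entry together with its scope, any of the three orders is a single sort over the same list, which is the flexibility argued for above.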