Message ID | cover.1623881977.git.jonathantanmy@google.com (mailing list archive) |
---|---|
Headers | show |
Series | MVP implementation of remote-suggested hooks | expand |
Jonathan Tan <jonathantanmy@google.com> writes: > 1. The remote repo administrator creates a new branch > "refs/heads/suggested-hooks" pointing to a commit that has all the > hooks that the administrator wants to suggest. The hooks are > directly referenced by the commit tree (i.e. they are in the "/" > directory). wants to suggest? They simply suggest ;-) > 2. When a user clones, Git notices that > "refs/remotes/origin/suggested-hooks" is present and prints out a > message about a command that can be run. Can be run to install? Or can be run to first inspect? Or both? > 3. If the user runs that command, Git will install the hooks pointed to > by that ref, and set hook.autoupdate to true. This config variable > is checked whenever "git fetch" is run: whenever it notices that > "refs/remotes/origin/suggested-hooks" changes, it will reinstall the > hooks. > > 4. To turn off autoupdate, set hook.autoupdate to false. Existing hooks > will remain. OK, so "verify even if you implicitly trust" is actively discouraged. > Design choices: > > 1. Where should the suggested hooks be in the remote repo? A branch, > a non-branch ref, a config? I think that a branch is best - it is > relatively well-understood and any hooks there can be > version-controlled (and its history is independent of the other > branches). As people mentioned in the previous discussions, "independent of the other branches" has advantages and disadvantages. The most recent set of hooks may have some that would not work well with older codebase, so care must be taken to ensure any hook works on across versions of the main codebase. Which may not be a huge downside, but something users must be aware of. > 2. When and how should the local repo update its record of the remote's > suggested hooks? If we go with storing the hooks in a branch of a > remote side, this would automatically mean (with the default > refspec) that it would be in refs/remotes/<remote>/<name>. This > incurs network and hard disk cost even if the local repo does not > want to use the suggested hooks, but I think that typically they > would want to use it if they're going to do any work on the repo > (they would either have to trust or inspect Makefiles etc. anyway, > so they can do the same for the hooks), and even if they don't want > to use the remote's hooks, they probably still want to know what the > remote suggests. A way to see what changes are made to recommendation would be useful, and a branch that mostly linearly progresses is a good way to give it to the users. Of course, that can be done with suggested hooks inline with the rest of the main codebase, too. > 4. Should the local repo try to notice if the hooks have been changed > locally before overwriting upon autoupdate? This would be nice to > have, but I don't know how practical it would be. In particular, I > don't know if we can trust that > "refs/remotes/origin/suggested-hooks" has not been clobbered. Meaning clobbered by malicious parties? Then the whole thing is a no-go from security point of view. Presumably you trust the main content enough to work on top of it, so as long as you can trust that refs/remotes/origin/hooks to the same degree that you would trust refs/remotes/origin/master, I do not think it is a problem. Whatever mechanism you use to materialize an updated version of the hooks can and should record which commit on the suggested-hooks branch the .git/hooks/* file is based on. Then when the user wants to update to a different version of suggested-hooks (either because you auto-detected, or the user issued a command to update), you have - The current version in .git/hooks/* - The old pristine version (you recorded the commit when you updated the .git/hooks/* copy the last time) - The new pristine version (you have a remote-tracking branch). and it is not a brain surgery to perform three-way merge to update the first using the difference between the second and the third. > 5. Should we have a command that manually updates the hooks with what's > in "refs/heads/suggested-hooks"? This is not in this patch set, but > it sounds like a good idea. I wonder if having it bound as a submodule to a known location in the main contents tree makes it easier to manage and more flexible. Just like you can update the working tree of a submodule to a version that is bound to the superproject tree, or to a more recent version of the "branch" in the submodule, you can support workflows that allow suggested hooks to advance independent of the main contents version and that uses a specific version of suggested hooks tied to the main contents. > There are perhaps other points that I haven't thought of, of course. > > [1] https://lore.kernel.org/git/pull.908.git.1616105016055.gitgitgadget@gmail.com/ > > Jonathan Tan (2): > hook: move list of hooks > clone,fetch: remote-suggested auto-updating hooks > > builtin/bugreport.c | 38 ++------------ > builtin/clone.c | 10 ++++ > builtin/fetch.c | 21 ++++++++ > builtin/hook.c | 13 +++-- > hook.c | 118 ++++++++++++++++++++++++++++++++++++++++++++ > hook.h | 5 ++ > t/t5601-clone.sh | 36 ++++++++++++++ > 7 files changed, 202 insertions(+), 39 deletions(-)
On Wed, Jun 16, 2021 at 04:31:47PM -0700, Jonathan Tan wrote: > > I have had to make several design choices (which I will discuss later), > but now with this implementation, the following workflow is possible: > > 1. The remote repo administrator creates a new branch > "refs/heads/suggested-hooks" pointing to a commit that has all the > hooks that the administrator wants to suggest. The hooks are > directly referenced by the commit tree (i.e. they are in the "/" > directory). I don't really like that this is in the same namespace as branches users could create themselves. Hm, I think for 'git maintenance' prefetching we put those refs in some special namespace, right? Can we do something similar in this case? Would that prevent us from treating that ref like a normal branch? > > 2. When a user clones, Git notices that > "refs/remotes/origin/suggested-hooks" is present and prints out a > message about a command that can be run. > > 3. If the user runs that command, Git will install the hooks pointed to > by that ref, and set hook.autoupdate to true. This config variable > is checked whenever "git fetch" is run: whenever it notices that > "refs/remotes/origin/suggested-hooks" changes, it will reinstall the > hooks. I think this is where most people will have a lot to say from security perspective. It will be important to make sure that: - "git fetch" only automatically updates these scripts when fetching from the same remote they were initially fetched from, over some secure protocol - hook.autoupdate setting maybe should not be default, or should at least be able to override during the install command. (One suggestion was to also set some experimental.suggestedHooks setting, or something, to turn on "automatically set up autoupdate" behavior?) - we should notify the user that their hooks were updated/reinstalled when that happens later on. (Maybe you did this, I didn't look at the diff quite yet) > 4. To turn off autoupdate, set hook.autoupdate to false. Existing hooks > will remain. > > Design choices: > > 1. Where should the suggested hooks be in the remote repo? A branch, > a non-branch ref, a config? I think that a branch is best - it is > relatively well-understood and any hooks there can be > version-controlled (and its history is independent of the other > branches). I agree - a branch without any prior parent sounds ideal to me, that is, the repo owner setting up the hooks can say 'git checkout --orphan suggested-hooks'. I guess if they want the branch to have history shared with the rest of the project there's no reason not to do that, either. Do we want to provide helpful tooling for the administrator to create this magic branch? At the very least I guess we want some documentation, but I wonder if it's possible to make a helper (maybe a 'git hook' subcommand) without being more prescriptive than Git project usually is. > 3. How should the local repo detect when the remote has updated its > suggested hooks? I'm thinking when a certain local ref is updated. > Right now it's hardcoded, but perhaps "git clone" could detect what > "refs/heads/suggested-hooks" would correspond to, and then set it in > a config accordingly. Other options include remembering what the > remote's "refs/heads/suggested-hooks" used to be and always > comparing it upon every "ls-refs" call, but I think that the local > ref method is more straightforward. Hm, I like that idea as it's analogous to remote tracking branch + local tracking branch (which is a pretty common pattern for people to use). > > 4. Should the local repo try to notice if the hooks have been changed > locally before overwriting upon autoupdate? This would be nice to > have, but I don't know how practical it would be. In particular, I > don't know if we can trust that > "refs/remotes/origin/suggested-hooks" has not been clobbered. I think it would be nice to have, yeah. It seems important to notify the users that they are executing something different from yesterday, so noticing when we're making changes to the hook sounds useful. > > 5. Should we have a command that manually updates the hooks with what's > in "refs/heads/suggested-hooks"? This is not in this patch set, but > it sounds like a good idea. Yeah, I think so - that lets a user say "no, I don't need those hooks (or this update)" at first, but later change their mind. One thing that sounds useful to me, even at this RFC stage, is documentation showing what this feature looks like to use - for both the administrator setting up the magic hook branch, and the developer cloning and using hooks. I think we want that in some Git manpage eventually anyways, and it would help to see the workflow you're trying to achieve. Thanks for sending - will review the diffs today too. - Emily > > There are perhaps other points that I haven't thought of, of course. > > [1] https://lore.kernel.org/git/pull.908.git.1616105016055.gitgitgadget@gmail.com/ > > Jonathan Tan (2): > hook: move list of hooks > clone,fetch: remote-suggested auto-updating hooks > > builtin/bugreport.c | 38 ++------------ > builtin/clone.c | 10 ++++ > builtin/fetch.c | 21 ++++++++ > builtin/hook.c | 13 +++-- > hook.c | 118 ++++++++++++++++++++++++++++++++++++++++++++ > hook.h | 5 ++ > t/t5601-clone.sh | 36 ++++++++++++++ > 7 files changed, 202 insertions(+), 39 deletions(-) > > -- > 2.32.0.272.g935e593368-goog >
> Jonathan Tan <jonathantanmy@google.com> writes: > > > 1. The remote repo administrator creates a new branch > > "refs/heads/suggested-hooks" pointing to a commit that has all the > > hooks that the administrator wants to suggest. The hooks are > > directly referenced by the commit tree (i.e. they are in the "/" > > directory). > > wants to suggest? They simply suggest ;-) Ah yes :-) > > > 2. When a user clones, Git notices that > > "refs/remotes/origin/suggested-hooks" is present and prints out a > > message about a command that can be run. > > Can be run to install? Or can be run to first inspect? Or both? Right now I only have a command that installs, but I can provide the appropriate "cat-file" invocations to inspect them as well. > > 3. If the user runs that command, Git will install the hooks pointed to > > by that ref, and set hook.autoupdate to true. This config variable > > is checked whenever "git fetch" is run: whenever it notices that > > "refs/remotes/origin/suggested-hooks" changes, it will reinstall the > > hooks. > > > > 4. To turn off autoupdate, set hook.autoupdate to false. Existing hooks > > will remain. > > OK, so "verify even if you implicitly trust" is actively discouraged. Yes I was thinking of the model in which we already trust upstream, but I agree that verification can be useful. I think we can print the "cat-file" commands needed to verify before installing, and add a mode in which we tell the user that the hooks have been updated (but not automatically install them). > > Design choices: > > > > 1. Where should the suggested hooks be in the remote repo? A branch, > > a non-branch ref, a config? I think that a branch is best - it is > > relatively well-understood and any hooks there can be > > version-controlled (and its history is independent of the other > > branches). > > As people mentioned in the previous discussions, "independent of the > other branches" has advantages and disadvantages. The most recent > set of hooks may have some that would not work well with older > codebase, so care must be taken to ensure any hook works on across > versions of the main codebase. Which may not be a huge downside, > but something users must be aware of. That's true - and on the flip side, I would presume that the hook-introducing admin would usually want those hooks to apply retroactively too (say, to someone updating a "maint" branch). I think it's more flexible if hooks are in an independent branch, though - (if independent branch) a hook can be written to tolerate an old codebase, but if embedded in the code branch, such a hook cannot apply retroactively to old code. > > 2. When and how should the local repo update its record of the remote's > > suggested hooks? If we go with storing the hooks in a branch of a > > remote side, this would automatically mean (with the default > > refspec) that it would be in refs/remotes/<remote>/<name>. This > > incurs network and hard disk cost even if the local repo does not > > want to use the suggested hooks, but I think that typically they > > would want to use it if they're going to do any work on the repo > > (they would either have to trust or inspect Makefiles etc. anyway, > > so they can do the same for the hooks), and even if they don't want > > to use the remote's hooks, they probably still want to know what the > > remote suggests. > > A way to see what changes are made to recommendation would be > useful, and a branch that mostly linearly progresses is a good way > to give it to the users. > > Of course, that can be done with suggested hooks inline with the > rest of the main codebase, too. That's true. > > 4. Should the local repo try to notice if the hooks have been changed > > locally before overwriting upon autoupdate? This would be nice to > > have, but I don't know how practical it would be. In particular, I > > don't know if we can trust that > > "refs/remotes/origin/suggested-hooks" has not been clobbered. > > Meaning clobbered by malicious parties? Then the whole thing is a > no-go from security point of view. Presumably you trust the main > content enough to work on top of it, so as long as you can trust > that refs/remotes/origin/hooks to the same degree that you would > trust refs/remotes/origin/master, I do not think it is a problem. I meant clobbered by the user - should have made that more clear. To elaborate... > Whatever mechanism you use to materialize an updated version of the > hooks can and should record which commit on the suggested-hooks > branch the .git/hooks/* file is based on. Then when the user wants > to update to a different version of suggested-hooks (either because > you auto-detected, or the user issued a command to update), you have > > - The current version in .git/hooks/* > > - The old pristine version (you recorded the commit when you > updated the .git/hooks/* copy the last time) > > - The new pristine version (you have a remote-tracking branch). > > and it is not a brain surgery to perform three-way merge to update > the first using the difference between the second and the third. ...I was having a difficult time figuring out where to store such information (ref? config? I wouldn't be surprised if a user saw a value there and thought that they could change it). Perhaps it could be a separate file like .git/shallow. I'm not fully convinced that maintaining the ability to retain local hook modifications by supporting three-way merges is important, but if it is, it might be brain surgery to figure out the UX when merge conflicts occur. Normally these conflicts can be written to the worktree and index, but we don't have those in the case of hooks. > > 5. Should we have a command that manually updates the hooks with what's > > in "refs/heads/suggested-hooks"? This is not in this patch set, but > > it sounds like a good idea. > > I wonder if having it bound as a submodule to a known location in > the main contents tree makes it easier to manage and more flexible. > Just like you can update the working tree of a submodule to a > version that is bound to the superproject tree, or to a more recent > version of the "branch" in the submodule, you can support workflows > that allow suggested hooks to advance independent of the main > contents version and that uses a specific version of suggested hooks > tied to the main contents. This would make the hooks dependent on the commit checked out (with perhaps a bit more leeway in that our implementation could be flexible and use a commit later than what's in the gitlink), with its own pros and cons (as you said earlier).
> On Wed, Jun 16, 2021 at 04:31:47PM -0700, Jonathan Tan wrote: > > > > I have had to make several design choices (which I will discuss later), > > but now with this implementation, the following workflow is possible: > > > > 1. The remote repo administrator creates a new branch > > "refs/heads/suggested-hooks" pointing to a commit that has all the > > hooks that the administrator wants to suggest. The hooks are > > directly referenced by the commit tree (i.e. they are in the "/" > > directory). > > I don't really like that this is in the same namespace as branches users > could create themselves. Hm, I think for 'git maintenance' prefetching > we put those refs in some special namespace, right? Can we do something > similar in this case? Would that prevent us from treating that ref like > a normal branch? Do you mean that the server should put it in a different place, the client should put it in a different place, or both? > > 2. When a user clones, Git notices that > > "refs/remotes/origin/suggested-hooks" is present and prints out a > > message about a command that can be run. > > > > 3. If the user runs that command, Git will install the hooks pointed to > > by that ref, and set hook.autoupdate to true. This config variable > > is checked whenever "git fetch" is run: whenever it notices that > > "refs/remotes/origin/suggested-hooks" changes, it will reinstall the > > hooks. > > I think this is where most people will have a lot to say from security > perspective. It will be important to make sure that: > - "git fetch" only automatically updates these scripts when fetching > from the same remote they were initially fetched from, over some > secure protocol Putting the hooks in a branch does mean that "git fetch" gets these scripts from the same remote (as opposed to, say, putting them in a submodule or referring to them by URL), and I think it's easier to understand (from the user point of view) that branch equals same remote (rather than say that a submodule or URL is considered to be from the same remote if $CRITERIA). > - hook.autoupdate setting maybe should not be default, or should at > least be able to override during the install command. (One suggestion > was to also set some experimental.suggestedHooks setting, or > something, to turn on "automatically set up autoupdate" behavior?) OK, this makes sense. > - we should notify the user that their hooks were updated/reinstalled > when that happens later on. (Maybe you did this, I didn't look at the > diff quite yet) I didn't do this, but this is a good idea. > > Design choices: > > > > 1. Where should the suggested hooks be in the remote repo? A branch, > > a non-branch ref, a config? I think that a branch is best - it is > > relatively well-understood and any hooks there can be > > version-controlled (and its history is independent of the other > > branches). > > I agree - a branch without any prior parent sounds ideal to me, that is, > the repo owner setting up the hooks can say 'git checkout --orphan > suggested-hooks'. I guess if they want the branch to have history > shared with the rest of the project there's no reason not to do that, > either. How would the branch share history with the rest of the project? If you mean that we should allow the users to store hooks as part of the main codebase (within a configurable path, say, /suggested_hooks) and then point both "main" (or whatever the default branch is) and "refs/remotes/origin/suggested-hooks" to the same place, then that makes sense (although I would say that it still sounds better to have separate histories). > Do we want to provide helpful tooling for the administrator to create > this magic branch? At the very least I guess we want some documentation, > but I wonder if it's possible to make a helper (maybe a 'git hook' > subcommand) without being more prescriptive than Git project usually is. This sounds like the equivalent of "git checkout --orphan", as you suggested above. We could just write it out in the documentation. I'm not completely opposed to a special command, though. > > 3. How should the local repo detect when the remote has updated its > > suggested hooks? I'm thinking when a certain local ref is updated. > > Right now it's hardcoded, but perhaps "git clone" could detect what > > "refs/heads/suggested-hooks" would correspond to, and then set it in > > a config accordingly. Other options include remembering what the > > remote's "refs/heads/suggested-hooks" used to be and always > > comparing it upon every "ls-refs" call, but I think that the local > > ref method is more straightforward. > > Hm, I like that idea as it's analogous to remote tracking branch + local > tracking branch (which is a pretty common pattern for people to use). Yeah - that is partly why I like the branch idea (something already generally understood by people). > > 4. Should the local repo try to notice if the hooks have been changed > > locally before overwriting upon autoupdate? This would be nice to > > have, but I don't know how practical it would be. In particular, I > > don't know if we can trust that > > "refs/remotes/origin/suggested-hooks" has not been clobbered. > > I think it would be nice to have, yeah. It seems important to notify the > users that they are executing something different from yesterday, so > noticing when we're making changes to the hook sounds useful. Junio suggested [1] that we store more information about what's going on, so this might be practical. [1] https://lore.kernel.org/git/xmqq35thnuqp.fsf@gitster.g/ > > 5. Should we have a command that manually updates the hooks with what's > > in "refs/heads/suggested-hooks"? This is not in this patch set, but > > it sounds like a good idea. > > Yeah, I think so - that lets a user say "no, I don't need those hooks > (or this update)" at first, but later change their mind. OK. > One thing that sounds useful to me, even at this RFC stage, is > documentation showing what this feature looks like to use - for both > the administrator setting up the magic hook branch, and the developer > cloning and using hooks. I think we want that in some Git manpage > eventually anyways, and it would help to see the workflow you're trying > to achieve. > > Thanks for sending - will review the diffs today too. > > - Emily Agreed - at the very least, we would need to write this workflow as a test, and we can just reuse the same workflow in documentation. We just need to discuss what the workflow should be :-) (as it is, one thing that is becoming obvious is that currently people prefer to have verify-before-update instead of auto-update by default).
On June 18, 2021 5:59 PM, Jonathan Tan wrote: >> On Wed, Jun 16, 2021 at 04:31:47PM -0700, Jonathan Tan wrote: >> > >> > I have had to make several design choices (which I will discuss >> > later), but now with this implementation, the following workflow is possible: >> > >> > 1. The remote repo administrator creates a new branch >> > "refs/heads/suggested-hooks" pointing to a commit that has all the >> > hooks that the administrator wants to suggest. The hooks are >> > directly referenced by the commit tree (i.e. they are in the "/" >> > directory). >> >> I don't really like that this is in the same namespace as branches >> users could create themselves. Hm, I think for 'git maintenance' >> prefetching we put those refs in some special namespace, right? Can we >> do something similar in this case? Would that prevent us from treating >> that ref like a normal branch? > >Do you mean that the server should put it in a different place, the client should put it in a different place, or both? This brings up a very awkward question: How are enterprise git servers going to deal with this? I do not see the standard Pull Request mechanism available in GitHub handing placing hooks in different places during a merge operation. Or will this entire concept be omitted from PR? It seems like changes to hooks have to be managed in a similar way to standard managed files rather than as exceptions. -Randall
On Sat, Jun 19, 2021 at 3:32 AM Randall S. Becker <rsbecker@nexbridge.com> wrote: > > On June 18, 2021 5:59 PM, Jonathan Tan wrote: > This brings up a very awkward question: How are enterprise git servers going to deal with this? I do not see the standard Pull Request mechanism available in GitHub handing placing hooks in different places during a merge operation. Or will this entire concept be omitted from PR? > Related question, if a project had a collection of suggested hooks, and a contributor wanted to update them in relation to a new feature or code change, would they then have to create two separate patches/PRs? That feels like it would decentralize discussions/review? For example, if I maintained a project on GitHub and a contributor wanted to add a clang-tidy invocation as a suggested hook, as well as a config file for it. What would the suggested workflow be? "Open 2 Pull Requests one that updates both branches and do reviews independently"? If it was a mailing list would I have to send two separate patches? I'm not familiar with any workflows or tools that allow you to review and accept changes to two branches as part of the same series of changes. I also think that the history wouldn't be very clear in this case, since there won't be any obvious connection between updates to the suggested-hooks and updates to the rest of the history in this case (other than timestamps I guess, but I think that's a relatively weak form of association) > It seems like changes to hooks have to be managed in a similar way to standard managed files rather than as exceptions. > > -Randall > Thanks, Matthew Rogers
On Wed, Jun 16 2021, Jonathan Tan wrote: > This is a continuation of the work from [1]. That work described the > reasons, possible features, and possible workflows, but not the > implementation in detail. This patch set has an MVP implementation, and > my hope is that having a concrete implementation to look at makes it > easier to discuss matters of implementation. My C on this RFC is: 1) A request that someone reply (there or here would do) to my comments on the last iteration of this at: https://lore.kernel.org/git/874kghk906.fsf@evledraar.gmail.com/ 2) I think you'd get better feedback if you CC'd the people who've been actively discussing this in previous rounds. To briefly summarize the main gist of that 1): > This but does not use any features from es/config-based-hooks but is > implemented on that branch anyway because firstly, I need an existing > command to attach the "autoupdate" subcommand (and "git hook" works), > and secondly, when we test this at $DAYJOB, we will be testing it > together with the aforementioned branch. > > I have had to make several design choices (which I will discuss later), > but now with this implementation, the following workflow is possible: > > 1. The remote repo administrator creates a new branch > "refs/heads/suggested-hooks" pointing to a commit that has all the > hooks that the administrator wants to suggest. The hooks are > directly referenced by the commit tree (i.e. they are in the "/" > directory). > > 2. When a user clones, Git notices that > "refs/remotes/origin/suggested-hooks" is present and prints out a > message about a command that can be run. > > 3. If the user runs that command, Git will install the hooks pointed to > by that ref, and set hook.autoupdate to true. This config variable > is checked whenever "git fetch" is run: whenever it notices that > "refs/remotes/origin/suggested-hooks" changes, it will reinstall the > hooks. > > 4. To turn off autoupdate, set hook.autoupdate to false. Existing hooks > will remain. > > Design choices: > > 1. Where should the suggested hooks be in the remote repo? A branch, > a non-branch ref, a config? I think that a branch is best - it is > relatively well-understood and any hooks there can be > version-controlled (and its history is independent of the other > branches). First, unlike brian I don't (I hope I'm fairly summarizing his view here) disagree mostly or entirely with the existence of such a feature at all. I mean, I get the viewpoint that git shouldn't bless what amounts to an active RCE from the remote. I just think that we could probably do a better job of it than what people are doing in practice, and I've seen people do stuff like have build systems setup permanent symlinks to git-hooks/<some-name> in the tracked dir. We could at least envision a git-native implementation asking the user "do you want this hook update? <shows diff>". I just find this design approach completely bizarre as noted (probably in less blunt words) in the linked E-Mail. We have Emily's series to convert hooks to be config driven that we hope to land in some form, at that point they won't be any more of a special snowflake than any other config. And then, instead of doing what I'd think would be the natural result of that: Simply supporting an in-repo top-level ".gitconfig" file. We're still going to seemingly forever have them be an even more special snowflake with this facility, and the reason seems to be mostly/entirely to do with working around some aspect or restriction of Google's internal infrastructure. I think it's just un-git-y to have a meta-branch that in some way drives not only all other branches, but all other revisions of all branches, ever. It breaks expectations around git in lots of different ways, you can't fetch a single branch and get its hooks, you can't locally alter, commit and update your hooks while e.g. renaming a "t/" directory to "test/"; your hooks and code can't be atomically changed). I think I get why you want to do it that way, I just don't get why, as mostly noted in those earlier rounds why it wouldn't be a better approach / more straightforward / more git-y to: 1. Work on getting hooks driven by config <this is happening with Emily's series / my split-out "base" topic> 2. Have a facility to read an in-repo '.gitconfig'; have lots of safety valves etc. around this, I suggested starting with a whitelist of the N least dangerous config options, e.g. some diff viewing options, or a suggested sendemail.to or whatever. 3. Work our way up to trusting that for more dangerous stuff, eventually hooks. Most of the legitimate concerns from others with this is having some UX where our users won't be trained to just blindly say "yes" to an alias/hook config that "rm -rf's /" or whatever. If we start experimenting with that with aliases or hooks that can run arbitrary code it's like handing a toddlder a shotgun, let's at least start with a sharp fork or something (less dangerous config) :) 4. People who want this "I want my hooks to apply to all revisions ever" could probably get 99% or 100% of what they want if their hook is just a stub that does the equivalent of: sh `curl https://git.google.com/$reponame/hooks/$hookname` You'd then simply forbid on your servers any changes to a .gitconfig that did anything with the hook.* namespace. With such an implementation you don't need a magic "refs/remotes/origin/suggested-hooks" refs, just some state machine (I suggested e.g. GPG signing chains as an eventual end-state, but "show a diff every time" would also do) that keeps track of what config (and hooks are just one such case) has been OK'd, and which has not. I'd think it would even work better in the Googleplex, you could clone a co-worker's branch and execute their hooks, since they're the same as what you've pre-approved, you could even clone some random person's fork of a "blessed" project, because the hooks would be the same `sh $(curl <url I already trust>)`. That validation could even be a system-level in-config hook on your laptop, thus bringing the whole thing full circle...
> Related question, if a project had a collection of suggested hooks, > and a contributor wanted > to update them in relation to a new feature or code change, would they > then have to create two > separate patches/PRs? That feels like it would decentralize discussions/review? > > For example, if I maintained a project on GitHub and a contributor > wanted to add a clang-tidy > invocation as a suggested hook, as well as a config file for it. What > would the suggested workflow be? > > "Open 2 Pull Requests one that updates both branches and do reviews > independently"? > > If it was a mailing list would I have to send two separate patches? With the current design, the answer is "yes" to both. Having said that, for this particular case, I think that the clang-tidy config file can be reviewed and merged first, and then the clang-tidy hook. (The hook needs to be written to also work in the absence of the config file, since the user may have checked out an older version, so as far as I can tell, there is no need for precise timing as to when the hook is merged.)
> On Wed, Jun 16 2021, Jonathan Tan wrote: > > > This is a continuation of the work from [1]. That work described the > > reasons, possible features, and possible workflows, but not the > > implementation in detail. This patch set has an MVP implementation, and > > my hope is that having a concrete implementation to look at makes it > > easier to discuss matters of implementation. > > My C on this RFC is: > > 1) A request that someone reply (there or here would do) to my comments > on the last iteration of this at: > https://lore.kernel.org/git/874kghk906.fsf@evledraar.gmail.com/ OK - I'll take a look at that. > 2) I think you'd get better feedback if you CC'd the people who've been > actively discussing this in previous rounds. Good point. > > Design choices: > > > > 1. Where should the suggested hooks be in the remote repo? A branch, > > a non-branch ref, a config? I think that a branch is best - it is > > relatively well-understood and any hooks there can be > > version-controlled (and its history is independent of the other > > branches). > > First, unlike brian I don't (I hope I'm fairly summarizing his view > here) disagree mostly or entirely with the existence of such a feature > at all. I mean, I get the viewpoint that git shouldn't bless what > amounts to an active RCE from the remote. > > I just think that we could probably do a better job of it than what > people are doing in practice, and I've seen people do stuff like have > build systems setup permanent symlinks to git-hooks/<some-name> in the > tracked dir. We could at least envision a git-native implementation > asking the user "do you want this hook update? <shows diff>". > > I just find this design approach completely bizarre as noted (probably > in less blunt words) in the linked E-Mail. That's fair. You suggest an alternative below (and maybe more in the linked e-mail) - let's look at your suggestion... > We have Emily's series to convert hooks to be config driven that we hope > to land in some form, at that point they won't be any more of a special > snowflake than any other config. > > And then, instead of doing what I'd think would be the natural result of > that: Simply supporting an in-repo top-level ".gitconfig" file. We're > still going to seemingly forever have them be an even more special > snowflake with this facility, and the reason seems to be mostly/entirely > to do with working around some aspect or restriction of Google's > internal infrastructure. I don't think that this is "natural". In particular, I still don't think that hooks should be tied to code revision. E.g. if we make commits based on an old revision and push them, we still want them to follow the latest requirements. > I think it's just un-git-y to have a meta-branch that in some way drives > not only all other branches, but all other revisions of all branches, > ever. > > It breaks expectations around git in lots of different ways, you can't > fetch a single branch and get its hooks, Are you saying that each branch should have its own hooks? That might be reasonable in certain projects, but I don't see how that is a Git expectation. > you can't locally alter, commit > and update your hooks while e.g. renaming a "t/" directory to "test/"; > your hooks and code can't be atomically changed). I still think that hooks should work independent of code versions, so I wouldn't think that atomicity here is important. > I think I get why you want to do it that way, I just don't get why, as > mostly noted in those earlier rounds why it wouldn't be a better > approach / more straightforward / more git-y to: > > 1. Work on getting hooks driven by config <this is happening with > Emily's series / my split-out "base" topic> > 2. Have a facility to read an in-repo '.gitconfig'; have lots of safety > valves etc. around this, I suggested starting with a whitelist of the > N least dangerous config options, e.g. some diff viewing options, or > a suggested sendemail.to or whatever. I've replied to this above. > 3. Work our way up to trusting that for more dangerous stuff, eventually > hooks. Most of the legitimate concerns from others with this is > having some UX where our users won't be trained to just blindly say > "yes" to an alias/hook config that "rm -rf's /" or whatever. > > If we start experimenting with that with aliases or hooks that can > run arbitrary code it's like handing a toddlder a shotgun, let's at > least start with a sharp fork or something (less dangerous config) :) > > 4. People who want this "I want my hooks to apply to all revisions ever" > could probably get 99% or 100% of what they want if their hook is > just a stub that does the equivalent of: > > sh `curl https://git.google.com/$reponame/hooks/$hookname` > > You'd then simply forbid on your servers any changes to a .gitconfig > that did anything with the hook.* namespace. This would work if set in .git/config (not version controlled), but not .gitconfig (version controlled). > With such an implementation you don't need a magic > "refs/remotes/origin/suggested-hooks" refs, just some state machine (I > suggested e.g. GPG signing chains as an eventual end-state, but "show a > diff every time" would also do) that keeps track of what config (and > hooks are just one such case) has been OK'd, and which has not. This sounds complicated. > I'd think it would even work better in the Googleplex, you could clone a > co-worker's branch and execute their hooks, since they're the same as > what you've pre-approved, In the presence of .gitconfig, how would you know? > you could even clone some random person's fork > of a "blessed" project, because the hooks would be the same `sh $(curl > <url I already trust>)`. That validation could even be a system-level > in-config hook on your laptop, thus bringing the whole thing full > circle... Same here. In summary, I think your point of using hook configs + remote-suggested configs instead of remote-suggested hooks is a reasonable one, but I disagree with your reasons (or, at least, your reasons as I understand them).
On Mon, Jun 21 2021, Jonathan Tan wrote: >> On Wed, Jun 16 2021, Jonathan Tan wrote: >> >> > This is a continuation of the work from [1]. That work described the >> > reasons, possible features, and possible workflows, but not the >> > implementation in detail. This patch set has an MVP implementation, and >> > my hope is that having a concrete implementation to look at makes it >> > easier to discuss matters of implementation. >> >> My C on this RFC is: >> >> 1) A request that someone reply (there or here would do) to my comments >> on the last iteration of this at: >> https://lore.kernel.org/git/874kghk906.fsf@evledraar.gmail.com/ > > OK - I'll take a look at that. > >> 2) I think you'd get better feedback if you CC'd the people who've been >> actively discussing this in previous rounds. > > Good point. > >> > Design choices: >> > >> > 1. Where should the suggested hooks be in the remote repo? A branch, >> > a non-branch ref, a config? I think that a branch is best - it is >> > relatively well-understood and any hooks there can be >> > version-controlled (and its history is independent of the other >> > branches). >> >> First, unlike brian I don't (I hope I'm fairly summarizing his view >> here) disagree mostly or entirely with the existence of such a feature >> at all. I mean, I get the viewpoint that git shouldn't bless what >> amounts to an active RCE from the remote. >> >> I just think that we could probably do a better job of it than what >> people are doing in practice, and I've seen people do stuff like have >> build systems setup permanent symlinks to git-hooks/<some-name> in the >> tracked dir. We could at least envision a git-native implementation >> asking the user "do you want this hook update? <shows diff>". >> >> I just find this design approach completely bizarre as noted (probably >> in less blunt words) in the linked E-Mail. > > That's fair. You suggest an alternative below (and maybe more in the > linked e-mail) - let's look at your suggestion... > >> We have Emily's series to convert hooks to be config driven that we hope >> to land in some form, at that point they won't be any more of a special >> snowflake than any other config. >> >> And then, instead of doing what I'd think would be the natural result of >> that: Simply supporting an in-repo top-level ".gitconfig" file. We're >> still going to seemingly forever have them be an even more special >> snowflake with this facility, and the reason seems to be mostly/entirely >> to do with working around some aspect or restriction of Google's >> internal infrastructure. > > I don't think that this is "natural". In particular, I still don't think > that hooks should be tied to code revision. E.g. if we make commits > based on an old revision and push them, we still want them to follow the > latest requirements. Even for real-world centralized workflow situations where I've seen people think they want that, and the end of the day they almost never actually want that. Even something like code linting is a good example, to make it Google-specific: say for a Go project: Are you going to pin your linting tool/version to whatever understood your YAML format for the linter as it was specced 10 years ago when the project started? It's simply a giant hassle to have a piece of code operate on every version of your project ever in a way that doesn't break. I think in practice the designers of this feature don't actually have that in mind, but a "close to trunk" workflow, where you'd expect a hook to only need to operate on revisions for the last few weeks or months, because that'll be the oldest think people create new topics from. But I think the burden of proof is really on the other side here, something that works entirely differently than the rest of git needs to have a good reason. Our in-repo .gitattributes don't work like this, nor .gitignore, .mailmap etc. There's also real world uses of git where the "branches" are wildly divergent, e.g. I've worked on a system automation repo where the "master" was just a stub template, and every team had their own almost entirely different "repo-like" branch. Probably a bad idea for various reasons, but Git supports it just fine. For the centralized use-case what's the problem with just having the hook do a 'for-each-ref --format=' invocation or "cat-file --batch" on the "origin", and eval what it finds there? I'd think that gives you what you want for the more centarlized workflow, while leaving git's implementation working like the rest of our in-repo files. >> I think it's just un-git-y to have a meta-branch that in some way drives >> not only all other branches, but all other revisions of all branches, >> ever. >> >> It breaks expectations around git in lots of different ways, you can't >> fetch a single branch and get its hooks, > > Are you saying that each branch should have its own hooks? That might be > reasonable in certain projects, but I don't see how that is a Git > expectation. It's a git expectation now that I can add git.git as a remote, also chromium.git, and linux.git, fetch them all, and happily switch in the same repo between entirely different codebases that don't share a history. >> you can't locally alter, commit >> and update your hooks while e.g. renaming a "t/" directory to "test/"; >> your hooks and code can't be atomically changed). > > I still think that hooks should work independent of code versions, so I > wouldn't think that atomicity here is important. Covered above. >> I think I get why you want to do it that way, I just don't get why, as >> mostly noted in those earlier rounds why it wouldn't be a better >> approach / more straightforward / more git-y to: >> >> 1. Work on getting hooks driven by config <this is happening with >> Emily's series / my split-out "base" topic> >> 2. Have a facility to read an in-repo '.gitconfig'; have lots of safety >> valves etc. around this, I suggested starting with a whitelist of the >> N least dangerous config options, e.g. some diff viewing options, or >> a suggested sendemail.to or whatever. > > I've replied to this above. Not really, even if we went for this one-HEAD-version-to-rule-them-all plan wouldn't it make more sense to generalize it as a refs/remotes/origin/magic-config, and we'd discover a ".gitconfig" file under that commit/tree. I.e. whether we generalize this to config in general is orthagonal to whether such config lives in HEAD or in a magic ref. With hooks as config I don't see how you'd make any of this hook-specific, there's other config where the "every revision ever" applies much more strongly, e.g. sendemail.to. If that changed for this project tomorrow you wouldn't want a patch based on "maint" to send things to a different ML. >> 3. Work our way up to trusting that for more dangerous stuff, eventually >> hooks. Most of the legitimate concerns from others with this is >> having some UX where our users won't be trained to just blindly say >> "yes" to an alias/hook config that "rm -rf's /" or whatever. >> >> If we start experimenting with that with aliases or hooks that can >> run arbitrary code it's like handing a toddlder a shotgun, let's at >> least start with a sharp fork or something (less dangerous config) :) >> >> 4. People who want this "I want my hooks to apply to all revisions ever" >> could probably get 99% or 100% of what they want if their hook is >> just a stub that does the equivalent of: >> >> sh `curl https://git.google.com/$reponame/hooks/$hookname` >> >> You'd then simply forbid on your servers any changes to a .gitconfig >> that did anything with the hook.* namespace. > > This would work if set in .git/config (not version controlled), but not > .gitconfig (version controlled). Sorry, what wouldn't work? I meant you'd forbid pushes to your in-repo .gitconfig in your "master" branch or whatever, just like you're presumably planning some stronger ACLs for this magic hook branch. >> With such an implementation you don't need a magic >> "refs/remotes/origin/suggested-hooks" refs, just some state machine (I >> suggested e.g. GPG signing chains as an eventual end-state, but "show a >> diff every time" would also do) that keeps track of what config (and >> hooks are just one such case) has been OK'd, and which has not. > > This sounds complicated. On the contrary I think anything that leans into git's content-addressable security model is way less complicated. You don't care who you fetched Junio's v2.32.0 tag from, what matters is that the signing chain validates. The plan of having this magic branch means a whole new trust model for git, you trust magical authorized remotes. If you trust signed content chains you can trust hooks if their last modification can be traced to a signing authority you trust. It's really just: if (hook_content_changed() && hook_content_same_as_in_ok'd_revision_from_upsteam()) trust_hooks(); But while we're on the subject, it seems like a very generous assumption to think that just because you trust hooks at a given revision (or always trust the latest), that you implicitly trust them when *combined with* all past and future revisions from the same repository. Even without a malicious actor that seems like it'll inevitably break in all sorts of data-destroying ways. E.g. people commit stuff accidentally. A hook run under a "git bisect" that naïvely does an "rm *" will eat your data if you land on a revision that an in-tree "-rf" file. But once you get to a malicious actor who can say push a topic branch but not hook updates, will your hooks deal with files with whitespace in them, arbitrary crafted content etc? So I'd think that's an even better reason to prefer the in-repo per-revision atomically committed plan, and only trust hooks for the revision they're shipped with, at least as a default git security model. >> I'd think it would even work better in the Googleplex, you could clone a >> co-worker's branch and execute their hooks, since they're the same as >> what you've pre-approved, > > In the presence of .gitconfig, how would you know? If it's the same config, or you can automatically OK it. So "same" was discussed above, or you could trust any hook that's only doing a wget of some trusted domain and piping that to "sh". >> you could even clone some random person's fork >> of a "blessed" project, because the hooks would be the same `sh $(curl >> <url I already trust>)`. That validation could even be a system-level >> in-config hook on your laptop, thus bringing the whole thing full >> circle... > > Same here. > > In summary, I think your point of using hook configs + remote-suggested > configs instead of remote-suggested hooks is a reasonable one, but I > disagree with your reasons (or, at least, your reasons as I understand > them). You trust e.g. chromium.git's hooks, but I clone it, patch it, and re-push it to somegithost.com URL. If you go with trusting content it becomes easy to install those trusted hooks for the common case, but not if your entire trust model relies on what URL you git clone'd from.
On 2021-06-20 at 19:51:04, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Jun 16 2021, Jonathan Tan wrote: > > > This but does not use any features from es/config-based-hooks but is > > implemented on that branch anyway because firstly, I need an existing > > command to attach the "autoupdate" subcommand (and "git hook" works), > > and secondly, when we test this at $DAYJOB, we will be testing it > > together with the aforementioned branch. > > > > I have had to make several design choices (which I will discuss later), > > but now with this implementation, the following workflow is possible: > > > > 1. The remote repo administrator creates a new branch > > "refs/heads/suggested-hooks" pointing to a commit that has all the > > hooks that the administrator wants to suggest. The hooks are > > directly referenced by the commit tree (i.e. they are in the "/" > > directory). > > > > 2. When a user clones, Git notices that > > "refs/remotes/origin/suggested-hooks" is present and prints out a > > message about a command that can be run. > > > > 3. If the user runs that command, Git will install the hooks pointed to > > by that ref, and set hook.autoupdate to true. This config variable > > is checked whenever "git fetch" is run: whenever it notices that > > "refs/remotes/origin/suggested-hooks" changes, it will reinstall the > > hooks. > > > > 4. To turn off autoupdate, set hook.autoupdate to false. Existing hooks > > will remain. > > > > Design choices: > > > > 1. Where should the suggested hooks be in the remote repo? A branch, > > a non-branch ref, a config? I think that a branch is best - it is > > relatively well-understood and any hooks there can be > > version-controlled (and its history is independent of the other > > branches). > > First, unlike brian I don't (I hope I'm fairly summarizing his view > here) disagree mostly or entirely with the existence of such a feature > at all. I mean, I get the viewpoint that git shouldn't bless what > amounts to an active RCE from the remote. It's accurate that I'm generally opposed to such a feature. I feel that suggesting people install hooks is likely to lead to social engineering attacks, and it's also likely to lead to bad practices such as the expectation that all developers will install hooks or the use of hooks instead of CI or other effective controls. If we do add this feature (which, as I said, I'm opposed to) and we decide to store it in a ref, that ref should not be a normal branch by default (it should be a special one-level ref, like refs/stash or such), and the ref name should be configurable. Not all developers use English as their working language and we should respect that. In addition, there should be an advice.* option that allows people to turn this off once and for all, and it should be clearly documented. Ideally it should be off by default. > I think I get why you want to do it that way, I just don't get why, as > mostly noted in those earlier rounds why it wouldn't be a better > approach / more straightforward / more git-y to: > > 1. Work on getting hooks driven by config <this is happening with > Emily's series / my split-out "base" topic> > 2. Have a facility to read an in-repo '.gitconfig'; have lots of safety > valves etc. around this, I suggested starting with a whitelist of the > N least dangerous config options, e.g. some diff viewing options, or > a suggested sendemail.to or whatever. This also makes me deeply nervous for much of the same reasons. There are situations where e.g. ignoring whitespace can lead to security problems in code review (think Python), and in general it's hard to reason about all the ways people can do malicious things. Typically adding untrusted config ends poorly (think of all the modeline vulnerabilities in Vim). I'd definitely want support for this to be off with no prompting by default.
> >> And then, instead of doing what I'd think would be the natural result of > >> that: Simply supporting an in-repo top-level ".gitconfig" file. We're > >> still going to seemingly forever have them be an even more special > >> snowflake with this facility, and the reason seems to be mostly/entirely > >> to do with working around some aspect or restriction of Google's > >> internal infrastructure. > > > > I don't think that this is "natural". In particular, I still don't think > > that hooks should be tied to code revision. E.g. if we make commits > > based on an old revision and push them, we still want them to follow the > > latest requirements. > > Even for real-world centralized workflow situations where I've seen > people think they want that, and the end of the day they almost never > actually want that. > > Even something like code linting is a good example, to make it > Google-specific: say for a Go project: Are you going to pin your linting > tool/version to whatever understood your YAML format for the linter as > it was specced 10 years ago when the project started? It's simply a > giant hassle to have a piece of code operate on every version of your > project ever in a way that doesn't break. > > I think in practice the designers of this feature don't actually have > that in mind, but a "close to trunk" workflow, where you'd expect a hook > to only need to operate on revisions for the last few weeks or months, > because that'll be the oldest think people create new topics from. This is probably true, but it still means that people would want any hook changes to work retroactively, even if it's just on code for the last few months. So they still need some independence between the hook and the main code. > But I think the burden of proof is really on the other side here, > something that works entirely differently than the rest of git needs to > have a good reason. Our in-repo .gitattributes don't work like this, nor > .gitignore, .mailmap etc. Not all of Git works this way though - e.g. hooks themselves, and config. > There's also real world uses of git where the "branches" are wildly > divergent, e.g. I've worked on a system automation repo where the > "master" was just a stub template, and every team had their own almost > entirely different "repo-like" branch. Probably a bad idea for various > reasons, but Git supports it just fine. I presume this repo either does not use hooks, or its hooks are built for such a branch setup? I envision that the admins of the remote would only suggest hooks that work with their branch setup. > For the centralized use-case what's the problem with just having the > hook do a 'for-each-ref --format=' invocation or "cat-file --batch" on > the "origin", and eval what it finds there? I'd think that gives you > what you want for the more centarlized workflow, while leaving git's > implementation working like the rest of our in-repo files. This would future-proof such a hook, but not make it work retroactively in the past. (The repo admin could just include no-op hook for all hooks to future-proof, I guess, but this wouldn't work if we ever introduced new hooks.) > >> I think it's just un-git-y to have a meta-branch that in some way drives > >> not only all other branches, but all other revisions of all branches, > >> ever. > >> > >> It breaks expectations around git in lots of different ways, you can't > >> fetch a single branch and get its hooks, > > > > Are you saying that each branch should have its own hooks? That might be > > reasonable in certain projects, but I don't see how that is a Git > > expectation. > > It's a git expectation now that I can add git.git as a remote, also > chromium.git, and linux.git, fetch them all, and happily switch in the > same repo between entirely different codebases that don't share a > history. This expectation holds only if you know that your hooks can tolerate such a scenario. If you're using remote-suggested hooks, I don't think it's a matter of "before, I could have many remotes but now I cannot" but "before, I had to install these hooks myself and now it's automatic" - whether this workflow is supported or not would be the same before and now. > >> I think I get why you want to do it that way, I just don't get why, as > >> mostly noted in those earlier rounds why it wouldn't be a better > >> approach / more straightforward / more git-y to: > >> > >> 1. Work on getting hooks driven by config <this is happening with > >> Emily's series / my split-out "base" topic> > >> 2. Have a facility to read an in-repo '.gitconfig'; have lots of safety > >> valves etc. around this, I suggested starting with a whitelist of the > >> N least dangerous config options, e.g. some diff viewing options, or > >> a suggested sendemail.to or whatever. > > > > I've replied to this above. > > Not really, even if we went for this one-HEAD-version-to-rule-them-all > plan wouldn't it make more sense to generalize it as a > refs/remotes/origin/magic-config, and we'd discover a ".gitconfig" file > under that commit/tree. > > I.e. whether we generalize this to config in general is orthagonal to > whether such config lives in HEAD or in a magic ref. > > With hooks as config I don't see how you'd make any of this > hook-specific, there's other config where the "every revision ever" > applies much more strongly, e.g. sendemail.to. If that changed for this > project tomorrow you wouldn't want a patch based on "maint" to send > things to a different ML. My opposition to .gitconfig was that it is per-commit, but here it seems that you're saying that there are reasons for it not to be per-commit (e.g. the sendemail example). > >> 4. People who want this "I want my hooks to apply to all revisions ever" > >> could probably get 99% or 100% of what they want if their hook is > >> just a stub that does the equivalent of: > >> > >> sh `curl https://git.google.com/$reponame/hooks/$hookname` > >> > >> You'd then simply forbid on your servers any changes to a .gitconfig > >> that did anything with the hook.* namespace. > > > > This would work if set in .git/config (not version controlled), but not > > .gitconfig (version controlled). > > Sorry, what wouldn't work? I meant you'd forbid pushes to your in-repo > .gitconfig in your "master" branch or whatever, just like you're > presumably planning some stronger ACLs for this magic hook branch. The "I want my hooks to apply to all revisions ever" wouldn't work if it was per-commit .gitconfig. (It would work if it was in the magic branch, but at that time, what I understood by ".gitconfig" is the per-commit version.) > >> With such an implementation you don't need a magic > >> "refs/remotes/origin/suggested-hooks" refs, just some state machine (I > >> suggested e.g. GPG signing chains as an eventual end-state, but "show a > >> diff every time" would also do) that keeps track of what config (and > >> hooks are just one such case) has been OK'd, and which has not. > > > > This sounds complicated. > > On the contrary I think anything that leans into git's > content-addressable security model is way less complicated. You don't > care who you fetched Junio's v2.32.0 tag from, what matters is that the > signing chain validates. > > The plan of having this magic branch means a whole new trust model for > git, you trust magical authorized remotes. If you trust signed content > chains you can trust hooks if their last modification can be traced to a > signing authority you trust. The security model might be preexisting, but the things needing to be built around it make it more complicated. Take this case - I presume this means that the client would need to tell Git that a certain public key is considered trusted (perhaps prompted by the server upon clone, so the client would only need to copy and paste "git trust-key $HTTPS_URL" or something like that). Whenever a user wants to update a hook, they will make a commit, and when the PR is merged, the merge commit (or the tip commit if fast-forwarded) must be signed with the same key. Compare this with trusting that commits coming from a certain HTTPS URL are fine. The workflows are the same as for a regular code commit. > It's really just: > > if (hook_content_changed() && hook_content_same_as_in_ok'd_revision_from_upsteam()) > trust_hooks(); > > But while we're on the subject, it seems like a very generous assumption > to think that just because you trust hooks at a given revision (or > always trust the latest), that you implicitly trust them when *combined > with* all past and future revisions from the same repository. > > Even without a malicious actor that seems like it'll inevitably break in > all sorts of data-destroying ways. E.g. people commit stuff > accidentally. A hook run under a "git bisect" that navely does an "rm > *" will eat your data if you land on a revision that an in-tree "-rf" > file. I think we have the same problem with current hooks and other config, and we have been dealing with them relatively well (as far as I know). > But once you get to a malicious actor who can say push a topic branch > but not hook updates, will your hooks deal with files with whitespace in > them, arbitrary crafted content etc? > > So I'd think that's an even better reason to prefer the in-repo > per-revision atomically committed plan, and only trust hooks for the > revision they're shipped with, at least as a default git security model. The malicious actor could include the same hook in their topic branch, so I don't see how the per-revision hook is better than the all-revision hook. > >> I'd think it would even work better in the Googleplex, you could clone a > >> co-worker's branch and execute their hooks, since they're the same as > >> what you've pre-approved, > > > > In the presence of .gitconfig, how would you know? > > If it's the same config, or you can automatically OK it. So "same" was > discussed above, or you could trust any hook that's only doing a wget of > some trusted domain and piping that to "sh". Extending trust to hooks that are exactly the same makes sense. > You trust e.g. chromium.git's hooks, but I clone it, patch it, and > re-push it to somegithost.com URL. If you go with trusting content it > becomes easy to install those trusted hooks for the common case, but not > if your entire trust model relies on what URL you git clone'd from. If the repo administrator provides a key and signs their hooks, then the repo itself is indeed more portable and does not need to come from a single source. This may be useful, but I'm not sure how useful this is, since hooks are typically provided by projects. And as far as I know projects generally have one source of truth, so the portability is not a big plus.
> > First, unlike brian I don't (I hope I'm fairly summarizing his view > > here) disagree mostly or entirely with the existence of such a feature > > at all. I mean, I get the viewpoint that git shouldn't bless what > > amounts to an active RCE from the remote. > > It's accurate that I'm generally opposed to such a feature. From emails like [1], I think that you have understood both the pros and cons, and decided that the cons outweigh the pros, which is fair. I'll reply so that others can know what I think about these points. [1] https://lore.kernel.org/git/YGzrfaSC4xd75j2U@camp.crustytoothpaste.net/ > I feel that > suggesting people install hooks is likely to lead to social engineering > attacks, I think that this vector of social engineering attack already exists, as there are projects that without malice recommend hook installation (so an attacker could masquerade as such). It is true that Git itself printing a message recommending hook installation perhaps could lend a false sense of security, but at least we can control what that message says (as opposed to a project recommending a third-party tool, all with messaging out of our control). > and it's also likely to lead to bad practices such as the > expectation that all developers will install hooks or the use of hooks > instead of CI or other effective controls. I agree that hooks can be overused or misused, but there are still legitimate uses of it that a project might want to use. > If we do add this feature (which, as I said, I'm opposed to) and we > decide to store it in a ref, that ref should not be a normal branch by > default (it should be a special one-level ref, like refs/stash or such), Any particular reason not to expose it as a branch (besides following from your general idea that a user should seek out such a feature and not have it presented to them up-front)? > and the ref name should be configurable. Not all developers use English > as their working language and we should respect that. That makes sense. > In addition, there should be an advice.* option that allows people to > turn this off once and for all, and it should be clearly documented. > Ideally it should be off by default. I don't think this would be considered "advice" like the other options, but having an option to turn this off once and for all makes sense. Making it off by default would probably mean that projects that use such hooks would recommend cloning with "git -c my-config=1 clone $URL", but perhaps that's OK. > > I think I get why you want to do it that way, I just don't get why, as > > mostly noted in those earlier rounds why it wouldn't be a better > > approach / more straightforward / more git-y to: > > > > 1. Work on getting hooks driven by config <this is happening with > > Emily's series / my split-out "base" topic> > > 2. Have a facility to read an in-repo '.gitconfig'; have lots of safety > > valves etc. around this, I suggested starting with a whitelist of the > > N least dangerous config options, e.g. some diff viewing options, or > > a suggested sendemail.to or whatever. > > This also makes me deeply nervous for much of the same reasons. There > are situations where e.g. ignoring whitespace can lead to security > problems in code review (think Python), and in general it's hard to > reason about all the ways people can do malicious things. Typically > adding untrusted config ends poorly (think of all the modeline > vulnerabilities in Vim). > > I'd definitely want support for this to be off with no prompting by > default. To use your example, the model we're proposing is more of only using the modelines from sources we trust - as opposed to ensuring that all possible options set by modelines are benign. Admittedly, the administrator of the source may have difficulty ensuring that bad code doesn't slip through code review, for example, but that is a problem they already deal with (at least for projects with any form of executable code in them, e.g. production code or a build script).
On 2021-06-23 at 22:58:09, Jonathan Tan wrote: > > If we do add this feature (which, as I said, I'm opposed to) and we > > decide to store it in a ref, that ref should not be a normal branch by > > default (it should be a special one-level ref, like refs/stash or such), > > Any particular reason not to expose it as a branch (besides following > from your general idea that a user should seek out such a feature and > not have it presented to them up-front)? Branches are for the main code of the project. While it's possible to have orphan branches that do other things, I think that's in general an anti-pattern, and using a special ref for things which are separate and independent from the main code of the repository would be a more elegant solution. > > In addition, there should be an advice.* option that allows people to > > turn this off once and for all, and it should be clearly documented. > > Ideally it should be off by default. > > I don't think this would be considered "advice" like the other options, > but having an option to turn this off once and for all makes sense. > Making it off by default would probably mean that projects that use such > hooks would recommend cloning with "git -c my-config=1 clone $URL", but > perhaps that's OK. Sure, I'm not picky about what it looks like in "advice" vs something else. I think forcing projects to explicitly opt-in to this behavior means that the social engineering and security problems are much reduced, and while I'm still not wild about the idea, I would feel much better about it. > > This also makes me deeply nervous for much of the same reasons. There > > are situations where e.g. ignoring whitespace can lead to security > > problems in code review (think Python), and in general it's hard to > > reason about all the ways people can do malicious things. Typically > > adding untrusted config ends poorly (think of all the modeline > > vulnerabilities in Vim). > > > > I'd definitely want support for this to be off with no prompting by > > default. > > To use your example, the model we're proposing is more of only using the > modelines from sources we trust - as opposed to ensuring that all > possible options set by modelines are benign. Admittedly, the > administrator of the source may have difficulty ensuring that bad code > doesn't slip through code review, for example, but that is a problem > they already deal with (at least for projects with any form of > executable code in them, e.g. production code or a build script). As I think I've previously mentioned, I don't want to receive configuration of my development environment from sources I trust. Even at work, I don't want the repositories I work with to modify my development environment in this way. I tend to have a highly customized configuration that breaks many people's expectations about tooling, so the cases that this isn't a security problem (in repositories I trust) can still result in a functionality problem. Also, since we don't know what repositories the user trusts, the only safe assumption is that the user trusts nothing unless they explicitly tell us.
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > If we do add this feature (which, as I said, I'm opposed to) and we > decide to store it in a ref, that ref should not be a normal branch by > default (it should be a special one-level ref, like refs/stash or such), > and the ref name should be configurable. Not all developers use English > as their working language and we should respect that. Just this part. I am not sure what you are trying to achieve by requiring it to be configurable. After all, even for names of branches that are used to store the code, which is of more significance than this "unusual" ref, we've lived with a hardcoded 'master' and with the server-end advertisability of the configured values (i.e. "git clone" was designed to figure out if the origin used a custom name for the primary line of history and use the same name from the beginning), the end-user sitting on the receiving end did not have to do anything special even when the project wanted to use a custom name. But this "unusual" ref would not have a natural equivalent of "the origin side can point the primary branch with its HEAD", so by allowing it to be configurable, you are pretty much closing the door for those who blindly honor whatever the origin tells them to use when running "git clone" from doing so. I agree that it is a good security measure (and I am not sold to the "remote suggested hooks" idea in the first place), but is that the real reason why you suggested configurability?