[v7,01/17] doc: propose hooks managed by the config

Message ID	20201222000220.1491091-2-emilyshaffer@google.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Sender: "emilyshaffer via sendgmr" <emilyshaffer@podkayne.svl.corp.google.com> Date: Mon, 21 Dec 2020 16:02:04 -0800 In-Reply-To: <20201222000220.1491091-1-emilyshaffer@google.com> Message-Id: <20201222000220.1491091-2-emilyshaffer@google.com> Mime-Version: 1.0 References: <20201222000220.1491091-1-emilyshaffer@google.com> Subject: [PATCH v7 01/17] doc: propose hooks managed by the config From: Emily Shaffer <emilyshaffer@google.com> To: git@vger.kernel.org Cc: Emily Shaffer <emilyshaffer@google.com> Content-Type: text/plain; charset="UTF-8" Precedence: bulk
Series	propose config-based hooks (part I) \| expand [v7,00/17] propose config-based hooks (part I) [v7,01/17] doc: propose hooks managed by the config [v7,02/17] hook: scaffolding for git-hook subcommand [v7,03/17] hook: add list command [v7,04/17] hook: include hookdir hook in list [v7,05/17] hook: respect hook.runHookDir [v7,06/17] hook: implement hookcmd.<name>.skip [v7,07/17] parse-options: parse into strvec [v7,08/17] hook: add 'run' subcommand [v7,09/17] hook: replace find_hook() with hook_exists() [v7,10/17] hook: support passing stdin to hooks [v7,11/17] run-command: allow stdin for run_processes_parallel [v7,12/17] hook: allow parallel hook execution [v7,13/17] hook: allow specifying working directory for hooks [v7,14/17] run-command: add stdin callback for parallelization [v7,15/17] hook: provide stdin by string_list or callback [v7,16/17] run-command: allow capturing of collated output [v7,17/17] hooks: allow callers to capture output [v3,18/17] doc: make git-hook.txt point of truth

Message ID

20201222000220.1491091-2-emilyshaffer@google.com (mailing list archive)

State

New, archived

Headers

Sender: "emilyshaffer via sendgmr"
        <emilyshaffer@podkayne.svl.corp.google.com>
Date: Mon, 21 Dec 2020 16:02:04 -0800
In-Reply-To: <20201222000220.1491091-1-emilyshaffer@google.com>
Message-Id: <20201222000220.1491091-2-emilyshaffer@google.com>
Mime-Version: 1.0
References: <20201222000220.1491091-1-emilyshaffer@google.com>
Subject: [PATCH v7 01/17] doc: propose hooks managed by the config
From: Emily Shaffer <emilyshaffer@google.com>
To: git@vger.kernel.org
Cc: Emily Shaffer <emilyshaffer@google.com>
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk

Series

propose config-based hooks (part I) | expand

Commit Message

Emily Shaffer Dec. 22, 2020, 12:02 a.m. UTC

Begin a design document for config-based hooks, managed via git-hook.
Focus on an overview of the implementation and motivation for design
decisions. Briefly discuss the alternatives considered before this
point. Also, attempt to redefine terms to fit into a multihook world.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---

Notes:
    Since v6, checked for inconsistencies with implementation and added lots of
    caveats about whether 'git hook add' and 'git hook edit' will ever materialize.
    
    Hopefully this reflects reality now; please review accordingly.
    
    Since v6, checked for inconsistencies with implementation and added lots of
    caveats about whether 'git hook add' and 'git hook edit' will ever materialize.
    
    Hopefully this reflects reality now; please review accordingly.
    
    Since v4, addressed comments from Jonathan Tan about wording. However, I have
    not addressed AEvar's comments or done a full re-review of this document.
    I wanted to get the rest of the series out for initial review first.
    
     - Emily
    
    Since v4, addressed comments from Jonathan Tan about wording.

 Documentation/Makefile                        |   1 +
 .../technical/config-based-hooks.txt          | 355 ++++++++++++++++++
 2 files changed, 356 insertions(+)
 create mode 100644 Documentation/technical/config-based-hooks.txt

Comments

Ævar Arnfjörð Bjarmason Jan. 23, 2021, 3:38 p.m. UTC | #1

On Tue, Dec 22 2020, Emily Shaffer wrote:

>     Since v4, addressed comments from Jonathan Tan about wording. However, I have
>     not addressed AEvar's comments or done a full re-review of this document.
>     I wanted to get the rest of the series out for initial review first.

As you note here and in a couple of other patch notes a lot of the
current state was based on my v4 commentary. I see we have parallel hook
execution by default now, woohoo!

I've been keeping an eye on this series, but have been kicking the can
down the road on reviewing it.

Skimming it now I think the state of it looks mostly good now when
viewing the end result, I think it's mainly got one big problem, but the
good news is that it's relatively easy to solve :)

Which is that I think it's really hard to follow along with it because
01/17 starts with a big design doc that's partially outdated, and
partially saying things that aren't in or should be in either a
user-facing doc or commit message.

And then individual patches (e.g. 12/17) either don't have tests
associated with them to test the feature they add, don't update/add
docs, or the docs are at the very beginning.

I think we should aim to mostly or entirely get rid of
Documentation/technical/config-based-hooks.txt, it was more of a "what
about this design?" document in the beginning.

In a series we'd apply most or all of it should really be in end-user
doc (and stuff like "Future work" can just be noted in commit messages
as we go along).

So long story short, I started trying to review this, but found myself
trying to reply to one patch and then grabbing docs from 01/17, or
(e.g. for the parallel stuff) not having tests and starting to come up
with them myself.

So I thought I'd send this E-Mail instead as prodding to maybe convince
you to re-roll it again to make it easier to follow along in a piecemeal
fashion.

As noted before I'm happy to help with this series if needed. I just
thought I'd send this first given that it's been a month since the last
submission, perhaps you've got some more local WIP changes by now...

Emily Shaffer Jan. 29, 2021, 11:52 p.m. UTC | #2

On Sat, Jan 23, 2021 at 04:38:31PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> 
> On Tue, Dec 22 2020, Emily Shaffer wrote:
> 
> >     Since v4, addressed comments from Jonathan Tan about wording. However, I have
> >     not addressed AEvar's comments or done a full re-review of this document.
> >     I wanted to get the rest of the series out for initial review first.
> 
> As you note here and in a couple of other patch notes a lot of the
> current state was based on my v4 commentary. I see we have parallel hook
> execution by default now, woohoo!
> 
> I've been keeping an eye on this series, but have been kicking the can
> down the road on reviewing it.
> 
> Skimming it now I think the state of it looks mostly good now when
> viewing the end result, I think it's mainly got one big problem, but the
> good news is that it's relatively easy to solve :)
> 
> Which is that I think it's really hard to follow along with it because
> 01/17 starts with a big design doc that's partially outdated, and
> partially saying things that aren't in or should be in either a
> user-facing doc or commit message.

Sure.

> 
> And then individual patches (e.g. 12/17) either don't have tests
> associated with them to test the feature they add, don't update/add
> docs, or the docs are at the very beginning.

Thanks, sure.

> 
> I think we should aim to mostly or entirely get rid of
> Documentation/technical/config-based-hooks.txt, it was more of a "what
> about this design?" document in the beginning.

I'm not 100% sure that I agree - there are a couple other design docs in
Documentation/technical which I still refer to from time to time, e.g.
for sparse checkout. But I *do* agree that there's a lot of info there
that needs to be in end-user docs.

> 
> In a series we'd apply most or all of it should really be in end-user
> doc (and stuff like "Future work" can just be noted in commit messages
> as we go along).
> 
> So long story short, I started trying to review this, but found myself
> trying to reply to one patch and then grabbing docs from 01/17, or
> (e.g. for the parallel stuff) not having tests and starting to come up
> with them myself.

Yeah. A related issue I could imagine, although not what you mentioned
here, is needing to do the same thing between part I and part II of the
series, as I often added some functionality late in part I and then used
it first late in part II. I don't think this is worth reordering for,
but probably better notes would be handy.

> 
> So I thought I'd send this E-Mail instead as prodding to maybe convince
> you to re-roll it again to make it easier to follow along in a piecemeal
> fashion.
> 
> As noted before I'm happy to help with this series if needed. I just
> thought I'd send this first given that it's been a month since the last
> submission, perhaps you've got some more local WIP changes by now...

The biggest help for me would be review focused on nits, code style,
missed free(), etc. - I still haven't gotten that kind of review yet,
and I feel the series as it is now is pretty much "feature complete".

In fact, and I'll mention this in reply to the cover letter in a moment,
we picked this series up as-is from Junio's branch in 'gitster/git' and
have been using it at Google for a couple of weeks now - primarily with
users running their legacy hooks living in hookdir, but also with a
subset of users encouraged to try out the config functionality.

Thanks for the bump.
 - Emily

Junio C Hamano Feb. 1, 2021, 10:11 p.m. UTC | #3

Emily Shaffer <emilyshaffer@google.com> writes:

> +ways, providing an avenue to deprecate these "legacy" hooks if desired. The
> +handling is based on a config `hook.runHookDir`, which is checked against a
> +number of cases:

Don't we want to also warn when the setting "no" or something
similar prevents the legacy hook from running, to help users
who wonder why their hook scripts are not running?  I.e.

> +- "no": the legacy hook will not be run

+- "warn-no": Git will print a warning to stderr before ignoring the
+  legacy hook

> +- "interactive": Git will prompt the user before running the legacy hook
> +- "warn": Git will print a warning to stderr before running the legacy hook
> +- "yes" (default): Git will silently run the legacy hook

> +In case this list is expanded in the future, if a value for `hook.runHookDir` is
> +given which Git does not recognize, Git should discard that config entry. For
> +example, if "warn" was specified at system level and "junk" was specified at
> +global level, Git would resolve the value to "warn"; if the only time the config
> +was set was to "junk", Git would use the default value of "yes".

Hmph, instead of complaining "value 'junk' is not recognized" and
erroring out?  Why?

> +[[stage-3]]
> +==== Stage 3
> +
> +`.git/hooks` is removed from the template and the hook directory is considered
> +deprecated. To avoid breaking older repos, the default of `hook.runHookDir` is
> +not changed, and `find_hook()` is not removed.

Presumably, we'll have documentation somewhere that instructs users
(who were taught by slashdot and other site to add certain scripts
under their .git/hooks/) how to do the equivalent without adding
scripts in .git/hooks/ directory and instead using the config
mechanism (e.g. "when told to add script X in .git/hooks/, read such
an instruction as if telling you to do Y instead") by the time this
happens?  It probably makes sense to do so as part of stage-2, at
which point the users are _ready_ to migrate.

> +[[security]]
> +=== Security and repo config
> +
> +Part of the motivation behind this refactor is to mitigate hooks as an attack
> +vector;footnote:[https://lore.kernel.org/git/20171002234517.GV19555@aiede.mtv.corp.google.com/]
> +however, as the design stands, users can still provide hooks in the repo-level
> +config, which is included when a repo is zipped and sent elsewhere.  The
> +security of the repo-level config is still under discussion; this design
> +generally assumes the repo-level config is secure, which is not true yet. The
> +goal is to avoid an overcomplicated design to work around a problem which has
> +ceased to exist.

I doubt we want to claim anything about security as part of this
series.  As you say in the paragraph, .git/config and .git/hooks/
are equally (un)protected and if we decide to punt .git/config
security, then not moving away from .git/hooks would not hurt
security-wise, either (in other words, security is not a viable
motivation behind this series).

And if we stop advertising 'security merit' that does not exist,
what remains?  Isn't the biggest selling point that an identical set
of hook configuration can be shared among multiple repositories, and
it allows more than one hook scripts to be triggered by a single
"hook event"?  There may be other good things we should be able to
sell the new mechanism to our users, and we do stress on them, which
is done in the motivation section.  So...

> +.Comparison of alternatives
> +|===
> +|Feature |Config-based hooks |Hook directories |Status quo

Sorry, but I did not find this table particularly convincing.

The only thing I sense is a hand-wavy desire that "we could make it
better than everybody else if we work on it in this area", which can
apply equally for other approaches---they could enhance what they
already have (e.g. "discoverability & documentation").

As a list of "these are the points we aspire to do better than other
people", I think it is an excellent idea to have a table like this
here in the documentation.  But that is not a "comparison".

> +[[execution-ordering]]
> +=== Execution ordering
> +
> +We may find that config order is insufficient for some users; for example,
> +config order makes it difficult to add a new hook to the system or global config
> +which runs at the end of the hook list. A new ordering schema should be:
> +
> +1) Specified by a `hook.order` config, so that users will not unexpectedly see
> +their order change;
> +
> +2) Either dependency or numerically based.
> +
> +Dependency-based ordering is prone to classic linked-list problems, like a
> +cycles and handling of missing dependencies. But, it paves the way for enabling
> +parallelization if some tasks truly depend on others.
> +
> +Numerical ordering makes it tricky for Git to generate suggested ordering
> +numbers for each command, but is easy to determine a definitive order.

OK.

Have we decided what we do for hooks whose interface is to feed
their input from their standard input?  The current system, I think,
just feeds the single hook by writing into a pipe to it, but if we
were to drive multiple hooks, we'd need to write the same thing to
each of these hook programs?  

Do we have a plan to deal with hooks whose outcome is not just
"yes/no", e.g. "proc-receive" hook that munges the list of refs to
be updated and the new values for them, or "applypatch-msg" that
munges the incoming proposed commit log message?  Does the second
hook work on the result of the first hook?  Do the two hooks work on
the vanilla state and their output have to agree with each other?

> +[[parallelization]]
> +=== Parallelization with dependencies
> +
> +Currently hooks use a naive parallelization scheme or are run in series.  But if
> +one hook depends on another's output, then users will want to specify those
> +dependencies.

An untold assumption here is that the questions I asked earlier on
having more than one hooks that is not just yes/no is something
readers know the same answer, and that answer is "the outcome of the
first hook is passed along (as if it were the input given by Git
directly, if the first hook did not exist), to the second hook.  It
should be spelled out somewhere before the execution ordering
section, I think.

Emily Shaffer March 10, 2021, 7:30 p.m. UTC | #4

On Mon, Feb 01, 2021 at 02:11:53PM -0800, Junio C Hamano wrote:
> 
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > +ways, providing an avenue to deprecate these "legacy" hooks if desired. The
> > +handling is based on a config `hook.runHookDir`, which is checked against a
> > +number of cases:
> 
> Don't we want to also warn when the setting "no" or something
> similar prevents the legacy hook from running, to help users
> who wonder why their hook scripts are not running?  I.e.
> 
> > +- "no": the legacy hook will not be run
> 
> +- "warn-no": Git will print a warning to stderr before ignoring the
> +  legacy hook

Yeah, I think you are right. Jonathan N suggested such an option as
"error" (as opposed to warning); I'll add it here plus to the enum and
take your description verbatim. Thanks.

> 
> > +- "interactive": Git will prompt the user before running the legacy hook
> > +- "warn": Git will print a warning to stderr before running the legacy hook
> > +- "yes" (default): Git will silently run the legacy hook
> 
> > +In case this list is expanded in the future, if a value for `hook.runHookDir` is
> > +given which Git does not recognize, Git should discard that config entry. For
> > +example, if "warn" was specified at system level and "junk" was specified at
> > +global level, Git would resolve the value to "warn"; if the only time the config
> > +was set was to "junk", Git would use the default value of "yes".
> 
> Hmph, instead of complaining "value 'junk' is not recognized" and
> erroring out?  Why?

I think for the exact case you're describing above - where I forgot some
useful combination of "run/don't run" and "warn/don't warn" (or, one
could forsee, "first" e.g. to run the legacy hook early on instead of
late, or "only" e.g. config hooks don't exist, etc etc), we add this
flag, and some enterprise (like Jonathan N's team) distributes the new
config to folks' system configs before 100% of machines are using the
newer Git executable. It's a long shot :) but I'd rather be flexible
than inflexible.

I will update the design doc anyway - I ended up implementing it as
"complain about 'junk' and then do default behavior".

> 
> > +[[stage-3]]
> > +==== Stage 3
> > +
> > +`.git/hooks` is removed from the template and the hook directory is considered
> > +deprecated. To avoid breaking older repos, the default of `hook.runHookDir` is
> > +not changed, and `find_hook()` is not removed.
> 
> Presumably, we'll have documentation somewhere that instructs users
> (who were taught by slashdot and other site to add certain scripts
> under their .git/hooks/) how to do the equivalent without adding
> scripts in .git/hooks/ directory and instead using the config
> mechanism (e.g. "when told to add script X in .git/hooks/, read such
> an instruction as if telling you to do Y instead") by the time this
> happens?  It probably makes sense to do so as part of stage-2, at
> which point the users are _ready_ to migrate.

Hm. Where should we distribute such a documentation? git-hook.txt?
githooks.txt? I think it's a good idea, sure.

> 
> > +[[security]]
> > +=== Security and repo config
> > +
> > +Part of the motivation behind this refactor is to mitigate hooks as an attack
> > +vector;footnote:[https://lore.kernel.org/git/20171002234517.GV19555@aiede.mtv.corp.google.com/]
> > +however, as the design stands, users can still provide hooks in the repo-level
> > +config, which is included when a repo is zipped and sent elsewhere.  The
> > +security of the repo-level config is still under discussion; this design
> > +generally assumes the repo-level config is secure, which is not true yet. The
> > +goal is to avoid an overcomplicated design to work around a problem which has
> > +ceased to exist.
> 
> I doubt we want to claim anything about security as part of this
> series.  As you say in the paragraph, .git/config and .git/hooks/
> are equally (un)protected and if we decide to punt .git/config
> security, then not moving away from .git/hooks would not hurt
> security-wise, either (in other words, security is not a viable
> motivation behind this series).

Yeah, I think you are right that despite my best efforts to spin it that
way, it just isn't a security win :) Ah well...

> 
> And if we stop advertising 'security merit' that does not exist,
> what remains?  Isn't the biggest selling point that an identical set
> of hook configuration can be shared among multiple repositories, and
> it allows more than one hook scripts to be triggered by a single
> "hook event"?  There may be other good things we should be able to
> sell the new mechanism to our users, and we do stress on them, which
> is done in the motivation section.  So...

Ok.

I would like to keep this header and make it more clear that we didn't
fully succeed in the wish to make zip attacks impossible, but I will
remove it from the motivations section at the top. I also added a little
more clarity (I hope) on the other pieces in the motivation section.

> 
> > +.Comparison of alternatives
> > +|===
> > +|Feature |Config-based hooks |Hook directories |Status quo
> 
> Sorry, but I did not find this table particularly convincing.
> 
> The only thing I sense is a hand-wavy desire that "we could make it
> better than everybody else if we work on it in this area", which can
> apply equally for other approaches---they could enhance what they
> already have (e.g. "discoverability & documentation").
> 
> As a list of "these are the points we aspire to do better than other
> people", I think it is an excellent idea to have a table like this
> here in the documentation.  But that is not a "comparison".

Ok. I'll see what I can do to frame it better.

> 
> > +[[execution-ordering]]
> > +=== Execution ordering
> > +
> > +We may find that config order is insufficient for some users; for example,
> > +config order makes it difficult to add a new hook to the system or global config
> > +which runs at the end of the hook list. A new ordering schema should be:
> > +
> > +1) Specified by a `hook.order` config, so that users will not unexpectedly see
> > +their order change;
> > +
> > +2) Either dependency or numerically based.
> > +
> > +Dependency-based ordering is prone to classic linked-list problems, like a
> > +cycles and handling of missing dependencies. But, it paves the way for enabling
> > +parallelization if some tasks truly depend on others.
> > +
> > +Numerical ordering makes it tricky for Git to generate suggested ordering
> > +numbers for each command, but is easy to determine a definitive order.
> 
> OK.
> 
> Have we decided what we do for hooks whose interface is to feed
> their input from their standard input?  The current system, I think,
> just feeds the single hook by writing into a pipe to it, but if we
> were to drive multiple hooks, we'd need to write the same thing to
> each of these hook programs?  

Yep. There are a few ways to do it:
1. Provide a file to be used as stdin. This way we just hook up that
file's fd to each hook process's stdin fd. (Example:
https://lore.kernel.org/git/20201222000435.1529768-11-emilyshaffer@google.com
in builtin/am.c:run_post_rewrite_hook())
2. Provide a string_list in the run_hooks_opt struct; hook.c treats each
entry in the string_list as a line to stdin, separated by a newline, and
"replays" the list to each hook in order. (Example:
https://lore.kernel.org/git/20201222000435.1529768-12-emilyshaffer@google.com)
3. Set up your own more complicated version of (2) by writing your own
callback to provide "next line of stdin" based on a context pointer. The
context pointer is per-hook-task. (Example:
https://lore.kernel.org/git/20201222000435.1529768-17-emilyshaffer@google.com)

> 
> Do we have a plan to deal with hooks whose outcome is not just
> "yes/no", e.g. "proc-receive" hook that munges the list of refs to
> be updated and the new values for them, or "applypatch-msg" that
> munges the incoming proposed commit log message?  Does the second
> hook work on the result of the first hook?  Do the two hooks work on
> the vanilla state and their output have to agree with each other?

The only hook I found this way was 'proc-receive' itself - and in the
end, because it's somewhat interactive with two-way communication with
the caller, it didn't seem like there was a good way to reason about
multiple hooks, like you say. So in the 'proc-receive' case, we disallow
multiple hooks. See
https://lore.kernel.org/git/20201222000435.1529768-15-emilyshaffer@google.com.
> 
> > +[[parallelization]]
> > +=== Parallelization with dependencies
> > +
> > +Currently hooks use a naive parallelization scheme or are run in series.  But if
> > +one hook depends on another's output, then users will want to specify those
> > +dependencies.
> 
> An untold assumption here is that the questions I asked earlier on
> having more than one hooks that is not just yes/no is something
> readers know the same answer, and that answer is "the outcome of the
> first hook is passed along (as if it were the input given by Git
> directly, if the first hook did not exist), to the second hook.  It
> should be spelled out somewhere before the execution ordering
> section, I think.

I'll make an explicit callout about that case, thanks.

 - Emily

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 69dbe4bb0b..505d318da1 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -81,6 +81,7 @@  SP_ARTICLES += $(API_DOCS)
 TECH_DOCS += MyFirstContribution
 TECH_DOCS += MyFirstObjectWalk
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += technical/config-based-hooks
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
diff --git a/Documentation/technical/config-based-hooks.txt b/Documentation/technical/config-based-hooks.txt
new file mode 100644
index 0000000000..3217faba47
--- /dev/null
+++ b/Documentation/technical/config-based-hooks.txt
@@ -0,0 +1,355 @@ 
+Configuration-based hook management
+===================================
+:sectanchors:
+
+[[motivation]]
+== Motivation
+
+Replace the .git/hook/hookname path as the only source of hooks to execute;
+allow users to define hooks using config files, in a way which is friendly to
+users with multiple repos which have similar needs.
+
+Redefine "hook" as an event rather than a single script, allowing users to
+perform multiple unrelated actions on a single event.
+
+Take a step closer to safety when copying zipped Git repositories from untrusted
+users by making it more apparent to users which scripts will be run during
+normal Git operations.
+
+Make it easier for users to discover Git's hook feature and automate their
+workflows.
+
+[[user-interfaces]]
+== User interfaces
+
+[[config-schema]]
+=== Config schema
+
+Hooks can be introduced by editing the configuration manually. There are two new
+sections added, `hook` and `hookcmd`.
+
+[[config-schema-hook]]
+==== `hook`
+
+Primarily contains subsections for each hook event. The order of variables in
+these subsections defines the hook command execution order; hook commands can be
+specified by setting the value directly to the command if no additional
+configuration is needed, or by setting the value as the name of a `hookcmd`. If
+Git does not find a `hookcmd` whose subsection matches the value of the given
+command string, Git will try to execute the string directly. Hooks are executed
+by passing the resolved command string to the shell. In the future, hook event
+subsections could also contain per-hook-event settings; see
+<<per-hook-event-settings,the section in Future Work>> for more details.
+
+Also contains top-level hook execution settings, for example, `hook.runHookDir`.
+(These settings are described more in <<library,Library>>.)
+
+----
+[hook "pre-commit"]
+  command = perl-linter
+  command = /usr/bin/git-secrets --pre-commit
+
+[hook "pre-applypatch"]
+  command = perl-linter
+  # for illustration purposes; error behavior isn't planned yet
+  error = ignore
+
+[hook]
+  runHookDir = interactive
+----
+
+[[config-schema-hookcmd]]
+==== `hookcmd`
+
+Defines a hook command and its attributes, which will be used when a hook event
+occurs. Unqualified attributes are assumed to apply to this hook during all hook
+events, but event-specific attributes can also be supplied. The example runs
+`/usr/bin/lint-it --language=perl <args passed by Git>`, but for repos which
+include this config, the hook command will be skipped for all events.
+Theoretically, the last line could be used to "un-skip" the hook command for
+`pre-commit` hooks, but this hasn't been scoped or implemented yet.
+
+----
+[hookcmd "perl-linter"]
+  command = /usr/bin/lint-it --language=perl
+  skip = true
+  # for illustration purposes; below hasn't been defined yet
+  pre-commit-skip = false
+----
+
+[[command-line-api]]
+=== Command-line API
+
+Users should be able to view, run, reorder, and create hook commands via the
+command line. External tools should be able to view a list of hooks in the
+correct order to run. Modifier commands (`edit` and `add`) have not been
+implemented yet and may not be if manually editing the config proves usable
+enough.
+
+*`git hook list <hook-event>`*
+
+*`git hook run <hook-event> [-a <arg>]... [-e <env-var>]...`*
+
+*`git hook edit <hook-event>`*
+
+*`git hook add <hook-command> <hook-event> <options...>`*
+
+[[hook-editor]]
+=== Hook editor
+
+The tool which is presented by `git hook edit <hook-command>`. Ideally, this
+tool should be easier to use than manually editing the config, and then produce
+a concise config afterwards. It may take a form similar to `git rebase
+--interactive`. This has not been designed or implemented yet and may not be if
+the config proves usable enough.
+
+[[implementation]]
+== Implementation
+
+[[library]]
+=== Library
+
+`hook.c` and `hook.h` are responsible for interacting with the config files. The
+hook library provides a basic API to call all hooks in config order with more
+complex options passed via `struct run_hooks_opt`:
+
+*`int run_hooks(const char *hookname, struct run_hooks_opt *options)`*
+
+`struct run_hooks_opt` allows callers to set:
+
+- environment variables
+- command-line arguments
+- behavior for the hook command provided by `run-command.h:find_hook()` (see
+  below)
+- a method to provide stdin to each hook, either via a file containing stdin, a
+  `struct string_list` containing a list of lines to print, or a callback
+  function to allow the caller to populate stdin manually
+- a method to process stdout from each hook, e.g. for printing to sideband
+  during a network operation
+- parallelism
+- a custom working directory for hooks to execute in
+
+And this struct can be extended with more options as necessary in the future.
+
+The "legacy" hook provided by `run-command.h:find_hook()` - that is, the hook
+present in `.git/hooks/<hookname>` or
+`$(git config --get core.hooksPath)/<hookname>` - can be handled in a number of
+ways, providing an avenue to deprecate these "legacy" hooks if desired. The
+handling is based on a config `hook.runHookDir`, which is checked against a
+number of cases:
+
+- "no": the legacy hook will not be run
+- "interactive": Git will prompt the user before running the legacy hook
+- "warn": Git will print a warning to stderr before running the legacy hook
+- "yes" (default): Git will silently run the legacy hook
+
+In case this list is expanded in the future, if a value for `hook.runHookDir` is
+given which Git does not recognize, Git should discard that config entry. For
+example, if "warn" was specified at system level and "junk" was specified at
+global level, Git would resolve the value to "warn"; if the only time the config
+was set was to "junk", Git would use the default value of "yes".
+
+`struct hookcmd` is expected to grow in size over time as more functionality is
+added to hooks; so that other parts of the code don't need to understand the
+config schema, `struct hookcmd` should contain logical values instead of string
+pairs.
+
+By default, hook parallelism is chosen based on the semantics of each hook;
+callsites initialize their `struct run_hooks_opt` via one of two macros,
+`RUN_HOOKS_OPT_INIT_SYNC` or `RUN_HOOKS_OPT_INIT_ASYNC`. The default number of
+jobs can be configured in `hook.jobs`; this config applies across all hook
+events. If unset, the value of `online_cpus()` (equivalent to `nproc`) is used.
+
+[[builtin]]
+=== Builtin
+
+`builtin/hook.c` is responsible for providing the frontend. It's responsible for
+formatting user-provided data and then calling the library API to set the
+configs as appropriate. The builtin frontend is not responsible for calling the
+config directly, so that other areas of Git can rely on the hook library to
+understand the most recent config schema for hooks.
+
+[[migration]]
+=== Migration path
+
+[[stage-0]]
+==== Stage 0
+
+Hooks are called by running `run-command.h:find_hook()` with the hookname and
+executing the result. The hook library and builtin do not exist. Hooks only
+exist as specially named scripts within `.git/hooks/`.
+
+[[stage-1]]
+==== Stage 1
+
+`git hook list --porcelain <hook-event>` is implemented. `hook.h:run_hooks()` is
+taught to include `run-command.h:find_hook()` at the end; calls to `find_hook()`
+are replaced with calls to `run_hooks()`. Users can opt-in to config-based hooks
+simply by creating some in their config; otherwise users should remain
+unaffected by the change.
+
+[[stage-2]]
+==== Stage 2
+
+The call to `find_hook()` inside of `run_hooks()` learns to check for a config,
+`hook.runHookDir`. Users can opt into managing their hooks completely via the
+config this way.
+
+[[stage-3]]
+==== Stage 3
+
+`.git/hooks` is removed from the template and the hook directory is considered
+deprecated. To avoid breaking older repos, the default of `hook.runHookDir` is
+not changed, and `find_hook()` is not removed.
+
+[[caveats]]
+== Caveats
+
+[[security]]
+=== Security and repo config
+
+Part of the motivation behind this refactor is to mitigate hooks as an attack
+vector;footnote:[https://lore.kernel.org/git/20171002234517.GV19555@aiede.mtv.corp.google.com/]
+however, as the design stands, users can still provide hooks in the repo-level
+config, which is included when a repo is zipped and sent elsewhere.  The
+security of the repo-level config is still under discussion; this design
+generally assumes the repo-level config is secure, which is not true yet. The
+goal is to avoid an overcomplicated design to work around a problem which has
+ceased to exist.
+
+[[ease-of-use]]
+=== Ease of use
+
+The config schema is nontrivial; that's why it's important for the `git hook`
+modifier commands to be usable. Contributors with UX expertise are encouraged to
+share their suggestions.
+
+[[alternatives]]
+== Alternative approaches
+
+A previous summary of alternatives exists in the
+archives.footnote:[https://lore.kernel.org/git/20191116011125.GG22855@google.com]
+
+[[status-quo]]
+=== Status quo
+
+Today users can implement multihooks themselves by using a "trampoline script"
+as their hook, and pointing that script to a directory or list of other scripts
+they wish to run.
+
+[[hook-directories]]
+=== Hook directories
+
+Other contributors have suggested Git learn about the existence of a directory
+such as `.git/hooks/<hookname>.d` and execute those hooks in alphabetical order.
+
+[[comparison]]
+=== Comparison table
+
+.Comparison of alternatives
+|===
+|Feature |Config-based hooks |Hook directories |Status quo
+
+|Supports multiple hooks
+|Natively
+|Natively
+|With user effort
+
+|Safer for zipped repos
+|A little
+|No
+|No
+
+|Previous hooks just work
+|If configured
+|Yes
+|Yes
+
+|Can install one hook to many repos
+|Yes
+|No
+|No
+
+|Discoverability
+|Better (in `git help git`)
+|Same as before
+|Same as before
+
+|Hard to run unexpected hook
+|If configured
+|No
+|No
+|===
+
+[[future-work]]
+== Future work
+
+[[execution-ordering]]
+=== Execution ordering
+
+We may find that config order is insufficient for some users; for example,
+config order makes it difficult to add a new hook to the system or global config
+which runs at the end of the hook list. A new ordering schema should be:
+
+1) Specified by a `hook.order` config, so that users will not unexpectedly see
+their order change;
+
+2) Either dependency or numerically based.
+
+Dependency-based ordering is prone to classic linked-list problems, like a
+cycles and handling of missing dependencies. But, it paves the way for enabling
+parallelization if some tasks truly depend on others.
+
+Numerical ordering makes it tricky for Git to generate suggested ordering
+numbers for each command, but is easy to determine a definitive order.
+
+[[parallelization]]
+=== Parallelization with dependencies
+
+Currently hooks use a naive parallelization scheme or are run in series.  But if
+one hook depends on another's output, then users will want to specify those
+dependencies. If we decide to solve this problem, we may want to look to modern
+build systems for inspiration on how to manage dependencies and parallel tasks.
+
+[[securing-hookdir-hooks]]
+=== Securing hookdir hooks
+
+With the design as written in this doc, it's still possible for a malicious user
+to modify `.git/config` to include `hook.pre-receive.command = rm -rf /`, then
+zip their repo and send it to another user. It may be necessary to teach Git to
+only allow inlined hooks like this if they were configured outside of the local
+scope (in other words, only run hookcmds, and only allow hookcmds to be
+configured in global or system scope); or another approach, like a list of safe
+projects, might be useful. It may also be sufficient (or at least useful) to
+teach a `hook.disableAll` config or similar flag to the Git executable.
+
+[[submodule-inheritance]]
+=== Submodule inheritance
+
+It's possible some submodules may want to run the identical set of hooks that
+their superrepo runs. While a globally-configured hook set is helpful, it's not
+a great solution for users who have multiple repos-with-submodules under the
+same user. It would be useful for submodules to learn how to run hooks from
+their superrepo's config, or inherit that hook setting.
+
+[[per-hook-event-settings]]
+=== Per-hook-event settings
+
+It might be desirable to keep settings specifically for some hook events, but
+not for others - for example, a user may wish to disable hookdir hooks for all
+events but pre-commit, which they haven't had time to convert yet; or, a user
+may wish for execution order settings to differ based on hook event. In that
+case, it would be useful to set something like `hook.pre-commit.executionOrder`
+which would not apply to the 'prepare-commit-msg' hook, for example.
+
+[[glossary]]
+== Glossary
+
+*hook event*
+
+A point during Git's execution where user scripts may be run, for example,
+_prepare-commit-msg_ or _pre-push_.
+
+*hook command*
+
+A user script or executable which will be run on one or more hook events.