Message ID | patch-v2-8.8-238155fcb9d-20220518T195858Z-avarab@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | hook API: connect hooks to the TTY again, fixes a v2.36.0 regression | expand |
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > In the preceding commit we removed the "stdout_to_stderr=1" assignment > as being redundant. This change brings it back as with ".ungroup=1" > the run_process_parallel() function doesn't provide them for us > implicitly. This part I recall commenting on the earlier round. The above message and the change in the patch makes sense. Thanks.
On Wed, May 18, 2022 at 10:05:24PM +0200, Ævar Arnfjörð Bjarmason wrote: > > Fix a regression reported[1] in f443246b9f2 (commit: convert > {pre-commit,prepare-commit-msg} hook to hook.h, 2021-12-22): Due to > using the run_process_parallel() API in the earlier 96e7225b310 (hook: > add 'run' subcommand, 2021-12-22) we'd capture the hook's stderr and > stdout, and thus lose the connection to the TTY in the case of > e.g. the "pre-commit" hook. > > As a preceding commit notes GNU parallel's similar --ungroup option > also has it emit output faster. While we're unlikely to have hooks > that emit truly massive amounts of output (or where the performance > thereof matters) it's still informative to measure the overhead. In a > similar "seq" test we're now ~30% faster: > > $ cat .git/hooks/seq-hook; git hyperfine -L rev origin/master,HEAD~0 -s 'make CFLAGS=-O3' './git hook run seq-hook' > #!/bin/sh > > seq 100000000 > Benchmark 1: ./git hook run seq-hook' in 'origin/master > Time (mean ± σ): 787.1 ms ± 13.6 ms [User: 701.6 ms, System: 534.4 ms] > Range (min … max): 773.2 ms … 806.3 ms 10 runs > > Benchmark 2: ./git hook run seq-hook' in 'HEAD~0 > Time (mean ± σ): 603.4 ms ± 1.6 ms [User: 573.1 ms, System: 30.3 ms] > Range (min … max): 601.0 ms … 606.2 ms 10 runs > > Summary > './git hook run seq-hook' in 'HEAD~0' ran > 1.30 ± 0.02 times faster than './git hook run seq-hook' in 'origin/master' > > In the preceding commit we removed the "stdout_to_stderr=1" assignment > as being redundant. This change brings it back as with ".ungroup=1" > the run_process_parallel() function doesn't provide them for us > implicitly. > > As an aside omitting the stdout_to_stderr=1 here would have all tests > pass, except those that test "git hook run" itself in > t1800-hook.sh. But our tests passing is the result of another test > blind spot, as was the case with the regression being fixed here. The > "stdout_to_stderr=1" for hooks is long-standing behavior, see > e.g. 1d9e8b56fe3 (Split back out update_hook handling in receive-pack, > 2007-03-10) and other follow-up commits (running "git log" with > "--reverse -p -Gstdout_to_stderr" is a good start). > > 1. https://lore.kernel.org/git/CA+dzEBn108QoMA28f0nC8K21XT+Afua0V2Qv8XkR8rAeqUCCZw@mail.gmail.com/ > > Reported-by: Anthony Sottile <asottile@umich.edu> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > --- > hook.c | 5 +++++ > t/t1800-hook.sh | 37 +++++++++++++++++++++++++++++++++++++ > 2 files changed, 42 insertions(+) > > diff --git a/hook.c b/hook.c > index dc498ef5c39..5f31b60384a 100644 > --- a/hook.c > +++ b/hook.c > @@ -54,6 +54,7 @@ static int pick_next_hook(struct child_process *cp, > return 0; > > strvec_pushv(&cp->env_array, hook_cb->options->env.v); > + cp->stdout_to_stderr = 1; /* because of .ungroup = 1 */ > cp->trace2_hook_name = hook_cb->hook_name; > cp->dir = hook_cb->options->dir; > > @@ -126,6 +127,7 @@ int run_hooks_opt(const char *hook_name, struct run_hooks_opt *options) > .tr2_label = hook_name, > > .jobs = jobs, > + .ungroup = jobs == 1, I mentioned it on patch 5, but I actually do not see a reason why we shouldn't do this logic in run_processes_parallel instead of just for the hooks. If someone can mention a reason we want to buffer child processes we're running in series I'm all ears, of course. > > .get_next_task = pick_next_hook, > .start_failure = notify_start_failure, > @@ -136,6 +138,9 @@ int run_hooks_opt(const char *hook_name, struct run_hooks_opt *options) > if (!options) > BUG("a struct run_hooks_opt must be provided to run_hooks"); > > + if (jobs != 1 || !run_opts.ungroup) > + BUG("TODO: think about & document order & interleaving of parallel hook output"); Doesn't this mean we're actually disallowing parallel hooks entirely? I don't think that's necessary or desired. I guess right now when the config isn't used, there's not really a way to provide parallel hooks, but I also think this will cause unnecessary conflicts for Google who is carrying config hooks downstream. I know that's not such a great reason. But it seems weird to be explicitly using the parallel processing framework, but then say, "oh, but we actually don't want to run in parallel, that's a BUG()". > + > if (options->invoked_hook) > *options->invoked_hook = 0; > > diff --git a/t/t1800-hook.sh b/t/t1800-hook.sh > index 1e4adc3d53e..f22754deccc 100755 > --- a/t/t1800-hook.sh > +++ b/t/t1800-hook.sh > @@ -4,6 +4,7 @@ test_description='git-hook command' > > TEST_PASSES_SANITIZE_LEAK=true > . ./test-lib.sh > +. "$TEST_DIRECTORY"/lib-terminal.sh > > test_expect_success 'git hook usage' ' > test_expect_code 129 git hook && > @@ -120,4 +121,40 @@ test_expect_success 'git -c core.hooksPath=<PATH> hook run' ' > test_cmp expect actual > ' > > +test_hook_tty() { > + local fd="$1" && > + > + cat >expect && > + > + test_when_finished "rm -rf repo" && > + git init repo && > + > + test_hook -C repo pre-commit <<-EOF && > + { > + test -t 1 && echo >&$fd STDOUT TTY || echo >&$fd STDOUT NO TTY && > + test -t 2 && echo >&$fd STDERR TTY || echo >&$fd STDERR NO TTY > + } $fd>actual > + EOF > + > + test_commit -C repo A && > + test_commit -C repo B && > + git -C repo reset --soft HEAD^ && > + test_terminal git -C repo commit -m"B.new" && > + test_cmp expect repo/actual > +} > + > +test_expect_success TTY 'git hook run: stdout and stderr are connected to a TTY: STDOUT redirect' ' > + test_hook_tty 1 <<-\EOF > + STDOUT NO TTY > + STDERR TTY > + EOF > +' > + > +test_expect_success TTY 'git hook run: stdout and stderr are connected to a TTY: STDERR redirect' ' > + test_hook_tty 2 <<-\EOF > + STDOUT TTY > + STDERR NO TTY > + EOF > +' > + > test_done > -- > 2.36.1.952.g0ae626f6cd7 >
On Thu, May 26 2022, Emily Shaffer wrote: > On Wed, May 18, 2022 at 10:05:24PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> Fix a regression reported[1] in f443246b9f2 (commit: convert >> {pre-commit,prepare-commit-msg} hook to hook.h, 2021-12-22): Due to >> using the run_process_parallel() API in the earlier 96e7225b310 (hook: >> add 'run' subcommand, 2021-12-22) we'd capture the hook's stderr and >> stdout, and thus lose the connection to the TTY in the case of >> e.g. the "pre-commit" hook. >> >> As a preceding commit notes GNU parallel's similar --ungroup option >> also has it emit output faster. While we're unlikely to have hooks >> that emit truly massive amounts of output (or where the performance >> thereof matters) it's still informative to measure the overhead. In a >> similar "seq" test we're now ~30% faster: >> >> $ cat .git/hooks/seq-hook; git hyperfine -L rev origin/master,HEAD~0 -s 'make CFLAGS=-O3' './git hook run seq-hook' >> #!/bin/sh >> >> seq 100000000 >> Benchmark 1: ./git hook run seq-hook' in 'origin/master >> Time (mean ± σ): 787.1 ms ± 13.6 ms [User: 701.6 ms, System: 534.4 ms] >> Range (min … max): 773.2 ms … 806.3 ms 10 runs >> >> Benchmark 2: ./git hook run seq-hook' in 'HEAD~0 >> Time (mean ± σ): 603.4 ms ± 1.6 ms [User: 573.1 ms, System: 30.3 ms] >> Range (min … max): 601.0 ms … 606.2 ms 10 runs >> >> Summary >> './git hook run seq-hook' in 'HEAD~0' ran >> 1.30 ± 0.02 times faster than './git hook run seq-hook' in 'origin/master' >> >> In the preceding commit we removed the "stdout_to_stderr=1" assignment >> as being redundant. This change brings it back as with ".ungroup=1" >> the run_process_parallel() function doesn't provide them for us >> implicitly. >> >> As an aside omitting the stdout_to_stderr=1 here would have all tests >> pass, except those that test "git hook run" itself in >> t1800-hook.sh. But our tests passing is the result of another test >> blind spot, as was the case with the regression being fixed here. The >> "stdout_to_stderr=1" for hooks is long-standing behavior, see >> e.g. 1d9e8b56fe3 (Split back out update_hook handling in receive-pack, >> 2007-03-10) and other follow-up commits (running "git log" with >> "--reverse -p -Gstdout_to_stderr" is a good start). >> >> 1. https://lore.kernel.org/git/CA+dzEBn108QoMA28f0nC8K21XT+Afua0V2Qv8XkR8rAeqUCCZw@mail.gmail.com/ >> >> Reported-by: Anthony Sottile <asottile@umich.edu> >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> >> --- >> hook.c | 5 +++++ >> t/t1800-hook.sh | 37 +++++++++++++++++++++++++++++++++++++ >> 2 files changed, 42 insertions(+) >> >> diff --git a/hook.c b/hook.c >> index dc498ef5c39..5f31b60384a 100644 >> --- a/hook.c >> +++ b/hook.c >> @@ -54,6 +54,7 @@ static int pick_next_hook(struct child_process *cp, >> return 0; >> >> strvec_pushv(&cp->env_array, hook_cb->options->env.v); >> + cp->stdout_to_stderr = 1; /* because of .ungroup = 1 */ >> cp->trace2_hook_name = hook_cb->hook_name; >> cp->dir = hook_cb->options->dir; >> >> @@ -126,6 +127,7 @@ int run_hooks_opt(const char *hook_name, struct run_hooks_opt *options) >> .tr2_label = hook_name, >> >> .jobs = jobs, >> + .ungroup = jobs == 1, > > I mentioned it on patch 5, but I actually do not see a reason why we > shouldn't do this logic in run_processes_parallel instead of just for > the hooks. If someone can mention a reason we want to buffer child > processes we're running in series I'm all ears, of course. > >> >> .get_next_task = pick_next_hook, >> .start_failure = notify_start_failure, >> @@ -136,6 +138,9 @@ int run_hooks_opt(const char *hook_name, struct run_hooks_opt *options) >> if (!options) >> BUG("a struct run_hooks_opt must be provided to run_hooks"); >> >> + if (jobs != 1 || !run_opts.ungroup) >> + BUG("TODO: think about & document order & interleaving of parallel hook output"); > > Doesn't this mean we're actually disallowing parallel hooks entirely? I > don't think that's necessary or desired. I guess right now when the > config isn't used, there's not really a way to provide parallel hooks, > but I also think this will cause unnecessary conflicts for Google who is > carrying config hooks downstream. I know that's not such a great reason. > But it seems weird to be explicitly using the parallel processing > framework, but then say, "oh, but we actually don't want to run in > parallel, that's a BUG()". I can just drop this paranoia. I figured it was prudent to leave this landmine in place so we'd definitely remember to re-visit this aspect of it, but I think there's 0% that we'll forget. So I'll make it less paranoid.
On Thu, May 26, 2022 at 08:23:23PM +0200, Ævar Arnfjörð Bjarmason wrote: > > > On Thu, May 26 2022, Emily Shaffer wrote: > > > On Wed, May 18, 2022 at 10:05:24PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > >> Fix a regression reported[1] in f443246b9f2 (commit: convert > >> {pre-commit,prepare-commit-msg} hook to hook.h, 2021-12-22): Due to > >> using the run_process_parallel() API in the earlier 96e7225b310 (hook: > >> add 'run' subcommand, 2021-12-22) we'd capture the hook's stderr and > >> stdout, and thus lose the connection to the TTY in the case of > >> e.g. the "pre-commit" hook. > >> > >> As a preceding commit notes GNU parallel's similar --ungroup option > >> also has it emit output faster. While we're unlikely to have hooks > >> that emit truly massive amounts of output (or where the performance > >> thereof matters) it's still informative to measure the overhead. In a > >> similar "seq" test we're now ~30% faster: > >> > >> $ cat .git/hooks/seq-hook; git hyperfine -L rev origin/master,HEAD~0 -s 'make CFLAGS=-O3' './git hook run seq-hook' > >> #!/bin/sh > >> > >> seq 100000000 > >> Benchmark 1: ./git hook run seq-hook' in 'origin/master > >> Time (mean ± σ): 787.1 ms ± 13.6 ms [User: 701.6 ms, System: 534.4 ms] > >> Range (min … max): 773.2 ms … 806.3 ms 10 runs > >> > >> Benchmark 2: ./git hook run seq-hook' in 'HEAD~0 > >> Time (mean ± σ): 603.4 ms ± 1.6 ms [User: 573.1 ms, System: 30.3 ms] > >> Range (min … max): 601.0 ms … 606.2 ms 10 runs > >> > >> Summary > >> './git hook run seq-hook' in 'HEAD~0' ran > >> 1.30 ± 0.02 times faster than './git hook run seq-hook' in 'origin/master' > >> > >> In the preceding commit we removed the "stdout_to_stderr=1" assignment > >> as being redundant. This change brings it back as with ".ungroup=1" > >> the run_process_parallel() function doesn't provide them for us > >> implicitly. > >> > >> As an aside omitting the stdout_to_stderr=1 here would have all tests > >> pass, except those that test "git hook run" itself in > >> t1800-hook.sh. But our tests passing is the result of another test > >> blind spot, as was the case with the regression being fixed here. The > >> "stdout_to_stderr=1" for hooks is long-standing behavior, see > >> e.g. 1d9e8b56fe3 (Split back out update_hook handling in receive-pack, > >> 2007-03-10) and other follow-up commits (running "git log" with > >> "--reverse -p -Gstdout_to_stderr" is a good start). > >> > >> 1. https://lore.kernel.org/git/CA+dzEBn108QoMA28f0nC8K21XT+Afua0V2Qv8XkR8rAeqUCCZw@mail.gmail.com/ > >> > >> Reported-by: Anthony Sottile <asottile@umich.edu> > >> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > >> --- > >> hook.c | 5 +++++ > >> t/t1800-hook.sh | 37 +++++++++++++++++++++++++++++++++++++ > >> 2 files changed, 42 insertions(+) > >> > >> diff --git a/hook.c b/hook.c > >> index dc498ef5c39..5f31b60384a 100644 > >> --- a/hook.c > >> +++ b/hook.c > >> @@ -54,6 +54,7 @@ static int pick_next_hook(struct child_process *cp, > >> return 0; > >> > >> strvec_pushv(&cp->env_array, hook_cb->options->env.v); > >> + cp->stdout_to_stderr = 1; /* because of .ungroup = 1 */ > >> cp->trace2_hook_name = hook_cb->hook_name; > >> cp->dir = hook_cb->options->dir; > >> > >> @@ -126,6 +127,7 @@ int run_hooks_opt(const char *hook_name, struct run_hooks_opt *options) > >> .tr2_label = hook_name, > >> > >> .jobs = jobs, > >> + .ungroup = jobs == 1, > > > > I mentioned it on patch 5, but I actually do not see a reason why we > > shouldn't do this logic in run_processes_parallel instead of just for > > the hooks. If someone can mention a reason we want to buffer child > > processes we're running in series I'm all ears, of course. > > > >> > >> .get_next_task = pick_next_hook, > >> .start_failure = notify_start_failure, > >> @@ -136,6 +138,9 @@ int run_hooks_opt(const char *hook_name, struct run_hooks_opt *options) > >> if (!options) > >> BUG("a struct run_hooks_opt must be provided to run_hooks"); > >> > >> + if (jobs != 1 || !run_opts.ungroup) > >> + BUG("TODO: think about & document order & interleaving of parallel hook output"); > > > > Doesn't this mean we're actually disallowing parallel hooks entirely? I > > don't think that's necessary or desired. I guess right now when the > > config isn't used, there's not really a way to provide parallel hooks, > > but I also think this will cause unnecessary conflicts for Google who is > > carrying config hooks downstream. I know that's not such a great reason. > > But it seems weird to be explicitly using the parallel processing > > framework, but then say, "oh, but we actually don't want to run in > > parallel, that's a BUG()". > > I can just drop this paranoia. I figured it was prudent to leave this > landmine in place so we'd definitely remember to re-visit this aspect of > it, but I think there's 0% that we'll forget. So I'll make it less > paranoid. Thanks. With that change the series looks good to me otherwise, although if you're rerolling to drop it, maybe consider some of the other little nits I left elsewhere. ;) - Emily
diff --git a/hook.c b/hook.c index dc498ef5c39..5f31b60384a 100644 --- a/hook.c +++ b/hook.c @@ -54,6 +54,7 @@ static int pick_next_hook(struct child_process *cp, return 0; strvec_pushv(&cp->env_array, hook_cb->options->env.v); + cp->stdout_to_stderr = 1; /* because of .ungroup = 1 */ cp->trace2_hook_name = hook_cb->hook_name; cp->dir = hook_cb->options->dir; @@ -126,6 +127,7 @@ int run_hooks_opt(const char *hook_name, struct run_hooks_opt *options) .tr2_label = hook_name, .jobs = jobs, + .ungroup = jobs == 1, .get_next_task = pick_next_hook, .start_failure = notify_start_failure, @@ -136,6 +138,9 @@ int run_hooks_opt(const char *hook_name, struct run_hooks_opt *options) if (!options) BUG("a struct run_hooks_opt must be provided to run_hooks"); + if (jobs != 1 || !run_opts.ungroup) + BUG("TODO: think about & document order & interleaving of parallel hook output"); + if (options->invoked_hook) *options->invoked_hook = 0; diff --git a/t/t1800-hook.sh b/t/t1800-hook.sh index 1e4adc3d53e..f22754deccc 100755 --- a/t/t1800-hook.sh +++ b/t/t1800-hook.sh @@ -4,6 +4,7 @@ test_description='git-hook command' TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh +. "$TEST_DIRECTORY"/lib-terminal.sh test_expect_success 'git hook usage' ' test_expect_code 129 git hook && @@ -120,4 +121,40 @@ test_expect_success 'git -c core.hooksPath=<PATH> hook run' ' test_cmp expect actual ' +test_hook_tty() { + local fd="$1" && + + cat >expect && + + test_when_finished "rm -rf repo" && + git init repo && + + test_hook -C repo pre-commit <<-EOF && + { + test -t 1 && echo >&$fd STDOUT TTY || echo >&$fd STDOUT NO TTY && + test -t 2 && echo >&$fd STDERR TTY || echo >&$fd STDERR NO TTY + } $fd>actual + EOF + + test_commit -C repo A && + test_commit -C repo B && + git -C repo reset --soft HEAD^ && + test_terminal git -C repo commit -m"B.new" && + test_cmp expect repo/actual +} + +test_expect_success TTY 'git hook run: stdout and stderr are connected to a TTY: STDOUT redirect' ' + test_hook_tty 1 <<-\EOF + STDOUT NO TTY + STDERR TTY + EOF +' + +test_expect_success TTY 'git hook run: stdout and stderr are connected to a TTY: STDERR redirect' ' + test_hook_tty 2 <<-\EOF + STDOUT TTY + STDERR NO TTY + EOF +' + test_done
Fix a regression reported[1] in f443246b9f2 (commit: convert {pre-commit,prepare-commit-msg} hook to hook.h, 2021-12-22): Due to using the run_process_parallel() API in the earlier 96e7225b310 (hook: add 'run' subcommand, 2021-12-22) we'd capture the hook's stderr and stdout, and thus lose the connection to the TTY in the case of e.g. the "pre-commit" hook. As a preceding commit notes GNU parallel's similar --ungroup option also has it emit output faster. While we're unlikely to have hooks that emit truly massive amounts of output (or where the performance thereof matters) it's still informative to measure the overhead. In a similar "seq" test we're now ~30% faster: $ cat .git/hooks/seq-hook; git hyperfine -L rev origin/master,HEAD~0 -s 'make CFLAGS=-O3' './git hook run seq-hook' #!/bin/sh seq 100000000 Benchmark 1: ./git hook run seq-hook' in 'origin/master Time (mean ± σ): 787.1 ms ± 13.6 ms [User: 701.6 ms, System: 534.4 ms] Range (min … max): 773.2 ms … 806.3 ms 10 runs Benchmark 2: ./git hook run seq-hook' in 'HEAD~0 Time (mean ± σ): 603.4 ms ± 1.6 ms [User: 573.1 ms, System: 30.3 ms] Range (min … max): 601.0 ms … 606.2 ms 10 runs Summary './git hook run seq-hook' in 'HEAD~0' ran 1.30 ± 0.02 times faster than './git hook run seq-hook' in 'origin/master' In the preceding commit we removed the "stdout_to_stderr=1" assignment as being redundant. This change brings it back as with ".ungroup=1" the run_process_parallel() function doesn't provide them for us implicitly. As an aside omitting the stdout_to_stderr=1 here would have all tests pass, except those that test "git hook run" itself in t1800-hook.sh. But our tests passing is the result of another test blind spot, as was the case with the regression being fixed here. The "stdout_to_stderr=1" for hooks is long-standing behavior, see e.g. 1d9e8b56fe3 (Split back out update_hook handling in receive-pack, 2007-03-10) and other follow-up commits (running "git log" with "--reverse -p -Gstdout_to_stderr" is a good start). 1. https://lore.kernel.org/git/CA+dzEBn108QoMA28f0nC8K21XT+Afua0V2Qv8XkR8rAeqUCCZw@mail.gmail.com/ Reported-by: Anthony Sottile <asottile@umich.edu> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- hook.c | 5 +++++ t/t1800-hook.sh | 37 +++++++++++++++++++++++++++++++++++++ 2 files changed, 42 insertions(+)