[v2,0/8] hook API: connect hooks to the TTY again, fixes a v2.36.0 regression

Message ID cover-v2-0.8-00000000000-20220518T195858Z-avarab@gmail.com (mailing list archive)

Message

Ævar Arnfjörð Bjarmason May 18, 2022, 8:05 p.m. UTC
A re-roll of v1[1]. I believe this addresses all comments on v1
(but perhaps I missed something). Changes:

 * run_processes_parallel() now takes only one argument: the new
   "opts" struct, which holds the options, callbacks etc. (see the
   sketch after this list). This will also make the subsequent
   config-based hooks topic less churny (it needs new callbacks).

   As a result the internal *_tr2() wrapper and the *_1() static
   function are both gone.

 * Replaced checks of "ungroup" with NULL checks (e.g. on pp->pfd),
   also for the free() call.

 * Typo/grammar fixes in commit messages.

 * Hopefully the 8/8 is less confusing vis-a-vis
   https://lore.kernel.org/git/xmqqfslva3mx.fsf@gitster.g/; i.e. now
   we only re-add "stdout_to_stderr".

 * The 01/08 and 04/08 are new: Splitting those out made subsequent
   diffs smaller.

 * Tweaked 5/8 a bit to make the diff smaller.

 * Used "err" and "out", not "actual" and "out" in tests, per Junio's
   suggestion.
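
To illustrate, the new calling convention looks like this (condensed
from the hook.c hunk in the range-diff below):

    struct run_process_parallel_opts opts = {
        .tr2_category = "hook",
        .tr2_label = hook_name,

        .jobs = 1,

        .get_next_task = pick_next_hook,
        .start_failure = notify_start_failure,
        .task_finished = notify_hook_finished,
        .data = &cb_data,
    };
    run_processes_parallel(&opts);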

Passing CI for this series at: https://github.com/avar/git/actions/runs/2346571047

1. https://lore.kernel.org/git/cover-0.6-00000000000-20220421T122108Z-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (8):
  run-command tests: change if/if/... to if/else if/else
  run-command API: use "opts" struct for run_processes_parallel{,_tr2}()
  run-command tests: test stdout of run_command_parallel()
  run-command.c: add an initializer for "struct parallel_processes"
  run-command: add an "ungroup" option to run_process_parallel()
  hook tests: fix redirection logic error in 96e7225b310
  hook API: don't redundantly re-set "no_stdin" and "stdout_to_stderr"
  hook API: fix v2.36.0 regression: hooks should be connected to a TTY

 builtin/fetch.c             |  18 +++--
 builtin/submodule--helper.c |  15 ++--
 hook.c                      |  28 +++++---
 run-command.c               | 132 +++++++++++++++++++++++-------------
 run-command.h               |  66 ++++++++++++++----
 submodule.c                 |  18 +++--
 t/helper/test-run-command.c |  65 ++++++++++++------
 t/t0061-run-command.sh      |  55 ++++++++++++---
 t/t1800-hook.sh             |  39 ++++++++++-
 9 files changed, 316 insertions(+), 120 deletions(-)

Range-diff against v1:
-:  ----------- > 1:  26a81eff267 run-command tests: change if/if/... to if/else if/else
1:  8bf71ce63dd ! 2:  5f0a6e9925f run-command API: replace run_processes_parallel_tr2() with opts struct
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    run-command API: replace run_processes_parallel_tr2() with opts struct
    +    run-command API: use "opts" struct for run_processes_parallel{,_tr2}()
     
    -    Add a new "struct run_process_parallel_opts" to cover the trace2
    -    use-case added in ee4512ed481 (trace2: create new combined trace
    -    facility, 2019-02-22). A subsequent commit will add more options, and
    -    having a proliferation of new functions or extra parameters would
    -    result in needless churn.
    +    Add a new "struct run_process_parallel_opts" to replace the growing
    +    run_processes_parallel() and run_processes_parallel_tr2() argument
    +    lists. This refactoring makes it easier to add new options and
    +    parameters.
     
    -    It makes for a smaller change to make run_processes_parallel() and
    -    run_processes_parallel_tr2() wrapper functions for the new "static"
    -    run_processes_parallel_1(), which contains the main logic. We pass
    -    down "opts" to the *_1() function even though it isn't used there
    -    yet (only in the *_tr2() function), a subsequent commit will make more
    -    use of it.
    +    The *_tr2() variant of the function was added in ee4512ed481 (trace2:
    +    create new combined trace facility, 2019-02-22), and has subsequently
    +    been used by every caller except t/helper/test-run-command.c.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ builtin/fetch.c: static int fetch_multiple(struct string_list *list, int max_chi
     +		struct run_process_parallel_opts run_opts = {
     +			.tr2_category = "fetch",
     +			.tr2_label = "parallel/fetch",
    ++
    ++			.jobs = max_children,
    ++
    ++			.get_next_task = &fetch_next_remote,
    ++			.start_failure = &fetch_failed_to_start,
    ++			.task_finished = &fetch_finished,
    ++			.data = &state,
     +		};
      
      		strvec_push(&argv, "--end-of-options");
    @@ builtin/fetch.c: static int fetch_multiple(struct string_list *list, int max_chi
     -						    &fetch_finished,
     -						    &state,
     -						    "fetch", "parallel/fetch");
    -+		result = run_processes_parallel(max_children,
    -+						&fetch_next_remote,
    -+						&fetch_failed_to_start,
    -+						&fetch_finished, &state,
    -+						&run_opts);
    ++		result = run_processes_parallel(&run_opts);
      
      		if (!result)
      			result = state.result;
    @@ builtin/submodule--helper.c: static int update_submodules(struct update_data *up
     +	struct run_process_parallel_opts run_opts = {
     +		.tr2_category = "submodule",
     +		.tr2_label = "parallel/update",
    ++
    ++		.get_next_task = update_clone_get_next_task,
    ++		.start_failure = update_clone_start_failure,
    ++		.task_finished = update_clone_task_finished,
    ++		.data = &suc,
     +	};
      
      	suc.update_data = update_data;
    @@ builtin/submodule--helper.c: static int update_submodules(struct update_data *up
     -				   update_clone_start_failure,
     -				   update_clone_task_finished, &suc, "submodule",
     -				   "parallel/update");
    -+	run_processes_parallel(suc.update_data->max_jobs,
    -+			       update_clone_get_next_task,
    -+			       update_clone_start_failure,
    -+			       update_clone_task_finished, &suc, &run_opts);
    ++	run_opts.jobs = suc.update_data->max_jobs;
    ++	run_processes_parallel(&run_opts);
      
      	/*
      	 * We saved the output and put it out all at once now.
     
      ## hook.c ##
     @@ hook.c: int run_hooks_opt(const char *hook_name, struct run_hooks_opt *options)
    + 		.options = options,
    + 	};
      	const char *const hook_path = find_hook(hook_name);
    - 	int jobs = 1;
    +-	int jobs = 1;
    ++	const int jobs = 1;
      	int ret = 0;
     +	struct run_process_parallel_opts run_opts = {
     +		.tr2_category = "hook",
     +		.tr2_label = hook_name,
    ++
    ++		.jobs = jobs,
    ++
    ++		.get_next_task = pick_next_hook,
    ++		.start_failure = notify_start_failure,
    ++		.task_finished = notify_hook_finished,
    ++		.data = &cb_data,
     +	};
      
      	if (!options)
    @@ hook.c: int run_hooks_opt(const char *hook_name, struct run_hooks_opt *options)
     -				   &cb_data,
     -				   "hook",
     -				   hook_name);
    -+	run_processes_parallel(jobs, pick_next_hook, notify_start_failure,
    -+			       notify_hook_finished, &cb_data, &run_opts);
    ++	run_processes_parallel(&run_opts);
      	ret = cb_data.rc;
      cleanup:
      	strbuf_release(&abs_path);
     
      ## run-command.c ##
    +@@ run-command.c: static void handle_children_on_signal(int signo)
    + }
    + 
    + static void pp_init(struct parallel_processes *pp,
    +-		    int n,
    +-		    get_next_task_fn get_next_task,
    +-		    start_failure_fn start_failure,
    +-		    task_finished_fn task_finished,
    +-		    void *data)
    ++		    struct run_process_parallel_opts *opts)
    + {
    + 	int i;
    ++	int n = opts->jobs;
    ++	void *data = opts->data;
    ++	get_next_task_fn get_next_task = opts->get_next_task;
    ++	start_failure_fn start_failure = opts->start_failure;
    ++	task_finished_fn task_finished = opts->task_finished;
    + 
    + 	if (n < 1)
    + 		n = online_cpus();
     @@ run-command.c: static int pp_collect_finished(struct parallel_processes *pp)
      	return result;
      }
    @@ run-command.c: static int pp_collect_finished(struct parallel_processes *pp)
     -			   start_failure_fn start_failure,
     -			   task_finished_fn task_finished,
     -			   void *pp_cb)
    -+static int run_processes_parallel_1(int n, get_next_task_fn get_next_task,
    -+				    start_failure_fn start_failure,
    -+				    task_finished_fn task_finished,
    -+				    void *pp_cb,
    -+				    struct run_process_parallel_opts *opts)
    ++int run_processes_parallel(struct run_process_parallel_opts *opts)
      {
      	int i, code;
      	int output_timeout = 100;
    + 	int spawn_cap = 4;
    + 	struct parallel_processes pp;
    ++	const char *tr2_category = opts->tr2_category;
    ++	const char *tr2_label = opts->tr2_label;
    ++	const int do_trace2 = tr2_category && tr2_label;
    ++	const int n = opts->jobs;
    + 
    +-	pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb);
    ++	if (do_trace2)
    ++		trace2_region_enter_printf(tr2_category, tr2_label, NULL,
    ++					   "max:%d", ((n < 1) ? online_cpus()
    ++						      : n));
    ++
    ++	pp_init(&pp, opts);
    + 	while (1) {
    + 		for (i = 0;
    + 		    i < spawn_cap && !pp.shutdown &&
     @@ run-command.c: int run_processes_parallel(int n,
    - 	return 0;
    - }
    + 	}
      
    + 	pp_cleanup(&pp);
    +-	return 0;
    +-}
    +-
     -int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
     -			       start_failure_fn start_failure,
     -			       task_finished_fn task_finished, void *pp_cb,
     -			       const char *tr2_category, const char *tr2_label)
    -+static int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
    -+				      start_failure_fn start_failure,
    -+				      task_finished_fn task_finished,
    -+				      void *pp_cb,
    -+				      struct run_process_parallel_opts *opts)
    - {
    -+	const char *tr2_category = opts->tr2_category;
    -+	const char *tr2_label = opts->tr2_label;
    - 	int result;
    +-{
    +-	int result;
      
    - 	trace2_region_enter_printf(tr2_category, tr2_label, NULL, "max:%d",
    - 				   ((n < 1) ? online_cpus() : n));
    +-	trace2_region_enter_printf(tr2_category, tr2_label, NULL, "max:%d",
    +-				   ((n < 1) ? online_cpus() : n));
    ++	if (do_trace2)
    ++		trace2_region_leave(tr2_category, tr2_label, NULL);
      
     -	result = run_processes_parallel(n, get_next_task, start_failure,
     -					task_finished, pp_cb);
    -+	result = run_processes_parallel_1(n, get_next_task, start_failure,
    -+					  task_finished, pp_cb, opts);
    - 
    - 	trace2_region_leave(tr2_category, tr2_label, NULL);
    - 
    - 	return result;
    +-
    +-	trace2_region_leave(tr2_category, tr2_label, NULL);
    +-
    +-	return result;
    ++	return 0;
      }
      
    -+int run_processes_parallel(int n, get_next_task_fn get_next_task,
    -+			   start_failure_fn start_failure,
    -+			   task_finished_fn task_finished, void *pp_cb,
    -+			   struct run_process_parallel_opts *opts)
    -+{
    -+	if (opts->tr2_category && opts->tr2_label)
    -+		return run_processes_parallel_tr2(n, get_next_task,
    -+						  start_failure, task_finished,
    -+						  pp_cb, opts);
    -+
    -+	return run_processes_parallel_1(n, get_next_task, start_failure,
    -+					task_finished, pp_cb, opts);
    -+}
    -+
    -+
      int run_auto_maintenance(int quiet)
    - {
    - 	int enabled;
     
      ## run-command.h ##
     @@ run-command.h: typedef int (*task_finished_fn)(int result,
    - 				void *pp_cb,
      				void *pp_task_cb);
      
    -+/**
    + /**
     + * Options to pass to run_processes_parallel(), { 0 }-initialized
     + * means no options. Fields:
     + *
     + * tr2_category & tr2_label: sets the trace2 category and label for
     + * logging. These must either be unset, or both of them must be set.
    ++ *
    ++ * jobs: see 'n' in run_processes_parallel() below.
    ++ *
    ++ * *_fn & data: see run_processes_parallel() below.
     + */
     +struct run_process_parallel_opts
     +{
     +	const char *tr2_category;
     +	const char *tr2_label;
    ++
    ++	int jobs;
    ++
    ++	get_next_task_fn get_next_task;
    ++	start_failure_fn start_failure;
    ++	task_finished_fn task_finished;
    ++	void *data;
     +};
     +
    - /**
    ++/**
    ++ * Options are passed via the "struct run_process_parallel_opts" above.
    ++
       * Runs up to n processes at the same time. Whenever a process can be
       * started, the callback get_next_task_fn is called to obtain the data
    +  * required to start another child process.
     @@ run-command.h: typedef int (*task_finished_fn)(int result,
    -  *
       * start_failure_fn and task_finished_fn can be NULL to omit any
       * special handling.
    -+ *
    -+ * Options are passed via a "struct run_process_parallel_opts".
       */
     -int run_processes_parallel(int n,
     -			   get_next_task_fn,
    @@ run-command.h: typedef int (*task_finished_fn)(int result,
     -int run_processes_parallel_tr2(int n, get_next_task_fn, start_failure_fn,
     -			       task_finished_fn, void *pp_cb,
     -			       const char *tr2_category, const char *tr2_label);
    -+int run_processes_parallel(int n, get_next_task_fn, start_failure_fn,
    -+			   task_finished_fn, void *pp_cb,
    -+			   struct run_process_parallel_opts *opts);
    ++int run_processes_parallel(struct run_process_parallel_opts *opts);
      
      /**
       * Convenience function which prepares env_array for a command to be run in a
    @@ submodule.c: int fetch_submodules(struct repository *r,
     +	struct run_process_parallel_opts run_opts = {
     +		.tr2_category = "submodule",
     +		.tr2_label = "parallel/fetch",
    ++
    ++		.jobs = max_parallel_jobs,
    ++
    ++		.get_next_task = get_next_submodule,
    ++		.start_failure = fetch_start_failure,
    ++		.task_finished = fetch_finish,
    ++		.data = &spf,
     +	};
      
      	spf.r = r;
    @@ submodule.c: int fetch_submodules(struct repository *r,
     -				   fetch_finish,
     -				   &spf,
     -				   "submodule", "parallel/fetch");
    -+	run_processes_parallel(max_parallel_jobs, get_next_submodule,
    -+			       fetch_start_failure, fetch_finish, &spf,
    -+			       &run_opts);
    ++	run_processes_parallel(&run_opts);
      
      	if (spf.submodules_with_errors.len > 0)
      		fprintf(stderr, _("Errors during submodule fetch:\n%s"),
     
      ## t/helper/test-run-command.c ##
     @@ t/helper/test-run-command.c: static int testsuite(int argc, const char **argv)
    + 			 "write JUnit-style XML files"),
    + 		OPT_END()
    + 	};
    ++	struct run_process_parallel_opts run_opts = {
    ++		.get_next_task = next_test,
    ++		.start_failure = test_failed,
    ++		.task_finished = test_finished,
    ++		.data = &suite,
    ++	};
    + 
    + 	argc = parse_options(argc, argv, NULL, options,
    + 			testsuite_usage, PARSE_OPT_STOP_AT_NON_OPTION);
    +@@ t/helper/test-run-command.c: static int testsuite(int argc, const char **argv)
    + 
    + 	fprintf(stderr, "Running %"PRIuMAX" tests (%d at a time)\n",
      		(uintmax_t)suite.tests.nr, max_jobs);
    ++	run_opts.jobs = max_jobs;
      
    - 	ret = run_processes_parallel(max_jobs, next_test, test_failed,
    +-	ret = run_processes_parallel(max_jobs, next_test, test_failed,
     -				     test_finished, &suite);
    -+				     test_finished, &suite, NULL);
    ++	ret = run_processes_parallel(&run_opts);
      
      	if (suite.failed.nr > 0) {
      		ret = 1;
     @@ t/helper/test-run-command.c: int cmd__run_command(int argc, const char **argv)
    - {
    - 	struct child_process proc = CHILD_PROCESS_INIT;
      	int jobs;
    + 	get_next_task_fn next_fn = NULL;
    + 	task_finished_fn finished_fn = NULL;
     +	struct run_process_parallel_opts opts = { 0 };
      
      	if (argc > 1 && !strcmp(argv[1], "testsuite"))
      		exit(testsuite(argc - 1, argv + 1));
     @@ t/helper/test-run-command.c: int cmd__run_command(int argc, const char **argv)
    + 	jobs = atoi(argv[2]);
    + 	strvec_clear(&proc.args);
    + 	strvec_pushv(&proc.args, (const char **)argv + 3);
    ++	opts.jobs = jobs;
    ++	opts.data = &proc;
      
    - 	if (!strcmp(argv[1], "run-command-parallel"))
    - 		exit(run_processes_parallel(jobs, parallel_next,
    --					    NULL, NULL, &proc));
    -+					    NULL, NULL, &proc, &opts));
    - 
    - 	if (!strcmp(argv[1], "run-command-abort"))
    --		exit(run_processes_parallel(jobs, parallel_next,
    --					    NULL, task_finished, &proc));
    -+		exit(run_processes_parallel(jobs, parallel_next, NULL,
    -+					    task_finished, &proc, &opts));
    - 
    - 	if (!strcmp(argv[1], "run-command-no-jobs"))
    --		exit(run_processes_parallel(jobs, no_job,
    --					    NULL, task_finished, &proc));
    -+		exit(run_processes_parallel(jobs, no_job, NULL, task_finished,
    -+					    &proc, &opts));
    + 	if (!strcmp(argv[1], "run-command-parallel")) {
    + 		next_fn = parallel_next;
    +@@ t/helper/test-run-command.c: int cmd__run_command(int argc, const char **argv)
    + 		return 1;
    + 	}
      
    - 	fprintf(stderr, "check usage\n");
    - 	return 1;
    +-	exit(run_processes_parallel(jobs, next_fn, NULL, finished_fn, &proc));
    ++	opts.get_next_task = next_fn;
    ++	opts.task_finished = finished_fn;
    ++	exit(run_processes_parallel(&opts));
    + }
2:  d9c9b158130 ! 3:  a8e1fc07b65 run-command tests: test stdout of run_command_parallel()
    @@ t/t0061-run-command.sh: World
      
      test_expect_success 'run_command runs in parallel with more jobs available than tasks' '
     -	test-tool run-command run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
    -+	test-tool run-command run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>actual &&
    +-	test_cmp expect actual
    ++	test-tool run-command run-command-parallel 5 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
     +	test_must_be_empty out &&
    - 	test_cmp expect actual
    ++	test_cmp expect err
      '
      
      test_expect_success 'run_command runs in parallel with as many jobs as tasks' '
     -	test-tool run-command run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
    -+	test-tool run-command run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>actual &&
    +-	test_cmp expect actual
    ++	test-tool run-command run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
     +	test_must_be_empty out &&
    - 	test_cmp expect actual
    ++	test_cmp expect err
      '
      
      test_expect_success 'run_command runs in parallel with more tasks than jobs available' '
     -	test-tool run-command run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
    -+	test-tool run-command run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>actual &&
    +-	test_cmp expect actual
    ++	test-tool run-command run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
     +	test_must_be_empty out &&
    - 	test_cmp expect actual
    ++	test_cmp expect err
      '
      
    + cat >expect <<-EOF
     @@ t/t0061-run-command.sh: asking for a quick stop
      EOF
      
      test_expect_success 'run_command is asked to abort gracefully' '
     -	test-tool run-command run-command-abort 3 false 2>actual &&
    -+	test-tool run-command run-command-abort 3 false >out 2>actual &&
    +-	test_cmp expect actual
    ++	test-tool run-command run-command-abort 3 false >out 2>err &&
     +	test_must_be_empty out &&
    - 	test_cmp expect actual
    ++	test_cmp expect err
      '
      
    + cat >expect <<-EOF
     @@ t/t0061-run-command.sh: no further jobs available
      EOF
      
      test_expect_success 'run_command outputs ' '
     -	test-tool run-command run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" 2>actual &&
    -+	test-tool run-command run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>actual &&
    +-	test_cmp expect actual
    ++	test-tool run-command run-command-no-jobs 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
     +	test_must_be_empty out &&
    - 	test_cmp expect actual
    ++	test_cmp expect err
      '
      
    + test_trace () {
-:  ----------- > 4:  663936fb4ad run-command.c: add an initializer for "struct parallel_processes"
3:  d76f63c2948 ! 5:  c2e015ed840 run-command: add an "ungroup" option to run_process_parallel()
    @@ run-command.c: struct parallel_processes {
      	struct pollfd *pfd;
      
      	unsigned shutdown : 1;
    -+	unsigned ungroup:1;
    ++	unsigned ungroup : 1;
      
      	int output_owner;
      	struct strbuf buffered_output; /* of finished children */
     @@ run-command.c: static void pp_init(struct parallel_processes *pp,
    - 		    get_next_task_fn get_next_task,
    - 		    start_failure_fn start_failure,
    - 		    task_finished_fn task_finished,
    --		    void *data)
    -+		    void *data, struct run_process_parallel_opts *opts)
    - {
    -+	const int ungroup = opts->ungroup;
    - 	int i;
    - 
    - 	if (n < 1)
    -@@ run-command.c: static void pp_init(struct parallel_processes *pp,
    - 	pp->start_failure = start_failure ? start_failure : default_start_failure;
    - 	pp->task_finished = task_finished ? task_finished : default_task_finished;
    - 
    -+	pp->ungroup = ungroup;
    -+
      	pp->nr_processes = 0;
      	pp->output_owner = 0;
      	pp->shutdown = 0;
    ++	pp->ungroup = opts->ungroup;
      	CALLOC_ARRAY(pp->children, n);
     -	CALLOC_ARRAY(pp->pfd, n);
    -+	if (!ungroup)
    ++	if (!pp->ungroup)
     +		CALLOC_ARRAY(pp->pfd, n);
    -+
    - 	strbuf_init(&pp->buffered_output, 0);
      
      	for (i = 0; i < n; i++) {
      		strbuf_init(&pp->children[i].err, 0);
      		child_process_init(&pp->children[i].process);
    -+		if (ungroup)
    ++		if (!pp->pfd)
     +			continue;
      		pp->pfd[i].events = POLLIN | POLLHUP;
      		pp->pfd[i].fd = -1;
      	}
    -@@ run-command.c: static void pp_init(struct parallel_processes *pp,
    - 
    - static void pp_cleanup(struct parallel_processes *pp)
    - {
    -+	const int ungroup = pp->ungroup;
    - 	int i;
    - 
    - 	trace_printf("run_processes_parallel: done");
     @@ run-command.c: static void pp_cleanup(struct parallel_processes *pp)
    - 	}
    - 
    - 	free(pp->children);
    --	free(pp->pfd);
    -+	if (!ungroup)
    -+		free(pp->pfd);
    - 
    - 	/*
      	 * When get_next_task added messages to the buffer in its last
      	 * iteration, the buffered output is non empty.
      	 */
     -	strbuf_write(&pp->buffered_output, stderr);
    --	strbuf_release(&pp->buffered_output);
    -+	if (!ungroup) {
    ++	if (!pp->ungroup)
     +		strbuf_write(&pp->buffered_output, stderr);
    -+		strbuf_release(&pp->buffered_output);
    -+	}
    + 	strbuf_release(&pp->buffered_output);
      
      	sigchain_pop_common();
    - }
     @@ run-command.c: static void pp_cleanup(struct parallel_processes *pp)
       */
      static int pp_start_one(struct parallel_processes *pp)
    @@ run-command.c: static int pp_start_one(struct parallel_processes *pp)
      	}
     -	pp->children[i].process.err = -1;
     -	pp->children[i].process.stdout_to_stderr = 1;
    --	pp->children[i].process.no_stdin = 1;
    -+
     +	if (!ungroup) {
     +		pp->children[i].process.err = -1;
     +		pp->children[i].process.stdout_to_stderr = 1;
    -+		pp->children[i].process.no_stdin = 1;
     +	}
    + 	pp->children[i].process.no_stdin = 1;
      
      	if (start_command(&pp->children[i].process)) {
     -		code = pp->start_failure(&pp->children[i].err,
    @@ run-command.c: static int pp_start_one(struct parallel_processes *pp)
      	pp->nr_processes++;
      	pp->children[i].state = GIT_CP_WORKING;
     -	pp->pfd[i].fd = pp->children[i].process.err;
    -+	if (!ungroup)
    ++	if (pp->pfd)
     +		pp->pfd[i].fd = pp->children[i].process.err;
      	return 0;
      }
    @@ run-command.c: static int pp_collect_finished(struct parallel_processes *pp)
      		pp->nr_processes--;
      		pp->children[i].state = GIT_CP_FREE;
     -		pp->pfd[i].fd = -1;
    -+		if (!ungroup)
    ++		if (pp->pfd)
     +			pp->pfd[i].fd = -1;
      		child_process_init(&pp->children[i].process);
      
     -		if (i != pp->output_owner) {
     +		if (ungroup) {
    -+			/* no strbuf_*() work to do here */
    ++			; /* no strbuf_*() work to do here */
     +		} else if (i != pp->output_owner) {
      			strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
      			strbuf_reset(&pp->children[i].err);
      		} else {
    -@@ run-command.c: static int run_processes_parallel_1(int n, get_next_task_fn get_next_task,
    - 				    void *pp_cb,
    - 				    struct run_process_parallel_opts *opts)
    - {
    -+	const int ungroup = opts->ungroup;
    - 	int i, code;
    - 	int output_timeout = 100;
    - 	int spawn_cap = 4;
    - 	struct parallel_processes pp;
    - 
    --	pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb);
    -+	pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb,
    -+		opts);
    - 	while (1) {
    - 		for (i = 0;
    - 		    i < spawn_cap && !pp.shutdown &&
    -@@ run-command.c: static int run_processes_parallel_1(int n, get_next_task_fn get_next_task,
    +@@ run-command.c: int run_processes_parallel(struct run_process_parallel_opts *opts)
      		}
      		if (!pp.nr_processes)
      			break;
     -		pp_buffer_stderr(&pp, output_timeout);
     -		pp_output(&pp);
    -+		if (ungroup) {
    ++		if (opts->ungroup) {
     +			pp_mark_working_for_cleanup(&pp);
     +		} else {
     +			pp_buffer_stderr(&pp, output_timeout);
    @@ run-command.h: typedef int (*start_failure_fn)(struct strbuf *out,
       * pp_task_cb is the callback cookie as passed into get_next_task_fn.
     @@ run-command.h: typedef int (*task_finished_fn)(int result,
       *
    -  * tr2_category & tr2_label: sets the trace2 category and label for
    -  * logging. These must either be unset, or both of them must be set.
    -+ *
    +  * jobs: see 'n' in run_processes_parallel() below.
    +  *
     + * ungroup: Ungroup output. Output is printed as soon as possible and
     + * bypasses run-command's internal processing. This may cause output
     + * from different commands to be mixed.
    ++ *
    +  * *_fn & data: see run_processes_parallel() below.
       */
      struct run_process_parallel_opts
    - {
    - 	const char *tr2_category;
    +@@ run-command.h: struct run_process_parallel_opts
      	const char *tr2_label;
    + 
    + 	int jobs;
     +	unsigned int ungroup:1;
    - };
      
    - /**
    + 	get_next_task_fn get_next_task;
    + 	start_failure_fn start_failure;
     @@ run-command.h: struct run_process_parallel_opts
       *
       * The children started via this function run in parallel. Their output
    @@ run-command.h: struct run_process_parallel_opts
       *
       * start_failure_fn and task_finished_fn can be NULL to omit any
       * special handling.
    -  *
    -- * Options are passed via a "struct run_process_parallel_opts".
    -+ * Options are passed via a "struct run_process_parallel_opts". If the
    -+ * "ungroup" option isn't specified the callbacks will get a pointer
    -+ * to a "struct strbuf *out", and must not write to stdout or stderr
    -+ * as such output will mess up the output of the other parallel
    ++ *
    ++ * If the "ungroup" option isn't specified the callbacks will get a
    ++ * pointer to a "struct strbuf *out", and must not write to stdout or
    ++ * stderr as such output will mess up the output of the other parallel
     + * processes. If "ungroup" option is specified callbacks will get a
     + * NULL "struct strbuf *out" parameter, and are responsible for
     + * emitting their own output, including dealing with any race
     + * conditions due to writing in parallel to stdout and stderr.
       */
    - int run_processes_parallel(int n, get_next_task_fn, start_failure_fn,
    - 			   task_finished_fn, void *pp_cb,
    + int run_processes_parallel(struct run_process_parallel_opts *opts);
    + 
     
      ## t/helper/test-run-command.c ##
     @@ t/helper/test-run-command.c: static int parallel_next(struct child_process *cp,
    @@ t/helper/test-run-command.c: static int task_finished(int result,
      }
      
     @@ t/helper/test-run-command.c: int cmd__run_command(int argc, const char **argv)
    - 	strvec_clear(&proc.args);
    - 	strvec_pushv(&proc.args, (const char **)argv + 3);
    + 	opts.jobs = jobs;
    + 	opts.data = &proc;
      
    --	if (!strcmp(argv[1], "run-command-parallel"))
    +-	if (!strcmp(argv[1], "run-command-parallel")) {
     +	if (!strcmp(argv[1], "run-command-parallel") ||
     +	    !strcmp(argv[1], "run-command-parallel-ungroup")) {
    -+		opts.ungroup = !strcmp(argv[1], "run-command-parallel-ungroup");
    - 		exit(run_processes_parallel(jobs, parallel_next,
    - 					    NULL, NULL, &proc, &opts));
    -+	}
    - 
    --	if (!strcmp(argv[1], "run-command-abort"))
    -+	if (!strcmp(argv[1], "run-command-abort") ||
    -+	    !strcmp(argv[1], "run-command-abort-ungroup")) {
    -+		opts.ungroup = !strcmp(argv[1], "run-command-abort-ungroup");
    - 		exit(run_processes_parallel(jobs, parallel_next, NULL,
    - 					    task_finished, &proc, &opts));
    -+	}
    - 
    --	if (!strcmp(argv[1], "run-command-no-jobs"))
    -+	if (!strcmp(argv[1], "run-command-no-jobs") ||
    -+	    !strcmp(argv[1], "run-command-no-jobs-ungroup")) {
    -+		opts.ungroup = !strcmp(argv[1], "run-command-no-jobs-ungroup");
    - 		exit(run_processes_parallel(jobs, no_job, NULL, task_finished,
    - 					    &proc, &opts));
    -+	}
    + 		next_fn = parallel_next;
    +-	} else if (!strcmp(argv[1], "run-command-abort")) {
    ++	} else if (!strcmp(argv[1], "run-command-abort") ||
    ++		   !strcmp(argv[1], "run-command-abort-ungroup")) {
    + 		next_fn = parallel_next;
    + 		finished_fn = task_finished;
    +-	} else if (!strcmp(argv[1], "run-command-no-jobs")) {
    ++	} else if (!strcmp(argv[1], "run-command-no-jobs") ||
    ++		   !strcmp(argv[1], "run-command-no-jobs-ungroup")) {
    + 		next_fn = no_job;
    + 		finished_fn = task_finished;
    + 	} else {
    +@@ t/helper/test-run-command.c: int cmd__run_command(int argc, const char **argv)
    + 		return 1;
    + 	}
      
    - 	fprintf(stderr, "check usage\n");
    - 	return 1;
    ++	opts.ungroup = ends_with(argv[1], "-ungroup");
    + 	opts.get_next_task = next_fn;
    + 	opts.task_finished = finished_fn;
    + 	exit(run_processes_parallel(&opts));
     
      ## t/t0061-run-command.sh ##
     @@ t/t0061-run-command.sh: test_expect_success 'run_command runs in parallel with more jobs available than
    - 	test_cmp expect actual
    + 	test_cmp expect err
      '
      
     +test_expect_success 'run_command runs ungrouped in parallel with more jobs available than tasks' '
    @@ t/t0061-run-command.sh: test_expect_success 'run_command runs in parallel with m
     +'
     +
      test_expect_success 'run_command runs in parallel with as many jobs as tasks' '
    - 	test-tool run-command run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>actual &&
    + 	test-tool run-command run-command-parallel 4 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
      	test_must_be_empty out &&
    - 	test_cmp expect actual
    + 	test_cmp expect err
      '
      
     +test_expect_success 'run_command runs ungrouped in parallel with as many jobs as tasks' '
    @@ t/t0061-run-command.sh: test_expect_success 'run_command runs in parallel with m
     +'
     +
      test_expect_success 'run_command runs in parallel with more tasks than jobs available' '
    - 	test-tool run-command run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>actual &&
    + 	test-tool run-command run-command-parallel 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
      	test_must_be_empty out &&
    - 	test_cmp expect actual
    + 	test_cmp expect err
      '
      
     +test_expect_success 'run_command runs ungrouped in parallel with more tasks than jobs available' '
    @@ t/t0061-run-command.sh: test_expect_success 'run_command runs in parallel with m
      preloaded output of a child
      asking for a quick stop
     @@ t/t0061-run-command.sh: test_expect_success 'run_command is asked to abort gracefully' '
    - 	test_cmp expect actual
    + 	test_cmp expect err
      '
      
     +test_expect_success 'run_command is asked to abort gracefully (ungroup)' '
    @@ t/t0061-run-command.sh: test_expect_success 'run_command is asked to abort grace
      no further jobs available
      EOF
     @@ t/t0061-run-command.sh: test_expect_success 'run_command outputs ' '
    - 	test_cmp expect actual
    + 	test_cmp expect err
      '
      
     +test_expect_success 'run_command outputs (ungroup) ' '
    -+	test-tool run-command run-command-no-jobs-ungroup 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>actual &&
    ++	test-tool run-command run-command-no-jobs-ungroup 3 sh -c "printf \"%s\n%s\n\" Hello World" >out 2>err &&
     +	test_must_be_empty out &&
    -+	test_cmp expect actual
    ++	test_cmp expect err
     +'
     +
      test_trace () {
4:  cf62569b2e0 = 6:  84e92c6f7c7 hook tests: fix redirection logic error in 96e7225b310
5:  98c26c9917b ! 7:  bf7d871565f hook API: don't redundantly re-set "no_stdin" and "stdout_to_stderr"
    @@ Commit message
         hook API: don't redundantly re-set "no_stdin" and "stdout_to_stderr"
     
         Amend code added in 96e7225b310 (hook: add 'run' subcommand,
    -    2021-12-22) top stop setting these two flags. We use the
    +    2021-12-22) to stop setting these two flags. We use the
         run_process_parallel() API added in c553c72eed6 (run-command: add an
         asynchronous parallel child processor, 2015-12-15), which always sets
         these in pp_start_one() (in addition to setting .err = -1).
6:  de3664f6d2b ! 8:  238155fcb9d hook API: fix v2.36.0 regression: hooks should be connected to a TTY
    @@ Commit message
                   './git hook run seq-hook' in 'HEAD~0' ran
                     1.30 ± 0.02 times faster than './git hook run seq-hook' in 'origin/master'
     
    -    In the preceding commit we removed the "no_stdin=1" and
    -    "stdout_to_stderr=1" assignments. This change brings them back as with
    -    ".ungroup=1" the run_process_parallel() function doesn't provide them
    -    for us implicitly.
    +    In the preceding commit we removed the "stdout_to_stderr=1" assignment
    +    as being redundant. This change brings it back as with ".ungroup=1"
    +    the run_process_parallel() function doesn't provide them for us
    +    implicitly.
     
         As an aside omitting the stdout_to_stderr=1 here would have all tests
         pass, except those that test "git hook run" itself in
    @@ Commit message
     
      ## hook.c ##
     @@ hook.c: static int pick_next_hook(struct child_process *cp,
    - 	if (!hook_path)
      		return 0;
      
    -+	cp->no_stdin = 1;
      	strvec_pushv(&cp->env_array, hook_cb->options->env.v);
    -+	cp->stdout_to_stderr = 1;
    ++	cp->stdout_to_stderr = 1; /* because of .ungroup = 1 */
      	cp->trace2_hook_name = hook_cb->hook_name;
      	cp->dir = hook_cb->options->dir;
      
     @@ hook.c: int run_hooks_opt(const char *hook_name, struct run_hooks_opt *options)
    - 		.options = options,
    - 	};
    - 	const char *const hook_path = find_hook(hook_name);
    --	int jobs = 1;
    -+	const int jobs = 1;
    - 	int ret = 0;
    - 	struct run_process_parallel_opts run_opts = {
    - 		.tr2_category = "hook",
      		.tr2_label = hook_name,
    + 
    + 		.jobs = jobs,
     +		.ungroup = jobs == 1,
    - 	};
      
    + 		.get_next_task = pick_next_hook,
    + 		.start_failure = notify_start_failure,
    +@@ hook.c: int run_hooks_opt(const char *hook_name, struct run_hooks_opt *options)
      	if (!options)
      		BUG("a struct run_hooks_opt must be provided to run_hooks");

Comments

Johannes Schindelin May 25, 2022, 11:30 a.m. UTC | #1
Hi Ævar,

as promised in the Git IRC Standup [*1*], a review.

On Wed, 18 May 2022, Ævar Arnfjörð Bjarmason wrote:

> Ævar Arnfjörð Bjarmason (8):
>   run-command tests: change if/if/... to if/else if/else
>   run-command API: use "opts" struct for run_processes_parallel{,_tr2}()
>   run-command tests: test stdout of run_command_parallel()
>   run-command.c: add an initializer for "struct parallel_processes"
>   run-command: add an "ungroup" option to run_process_parallel()
>   hook tests: fix redirection logic error in 96e7225b310
>   hook API: don't redundantly re-set "no_stdin" and "stdout_to_stderr"
>   hook API: fix v2.36.0 regression: hooks should be connected to a TTY

I started reviewing the patches individually, but have some higher-level
concerns that put my per-patch review on hold.

Keeping in mind that the intention is to fix a regression that was
introduced by way of refactoring (most of our recent regressions seem to
share that trait [*2*]), I strongly advise against another round of
refactoring [*3*], especially against tying it to fix a regression.

In this instance, it would be very easy to fix the bug without any
refactoring. In a nutshell, the manifestation of the bug amplifies this
part of the commit message of 96e7225b310 (hook: add 'run' subcommand,
2021-12-22):

    Some of the implementation here, such as a function being named
    run_hooks_opt() when it's tasked with running one hook, to using the
    run_processes_parallel_tr2() API to run with jobs=1 is somewhere
    between a bit odd and and an overkill for the current features of this
    "hook run" command and the hook.[ch] API.

It is this switch to `run_processes_parallel()` that is the root cause of
the regression.

The current iteration of the patch series does not fix that.

In the commit message from which I quoted, the plan is laid out to
eventually run more than one hook. If that is still the plan, we will be
presented with the unfortunate choice of either never running them in
parallel, or alternatively reintroducing the regression where the hooks
run detached from stdin/stdout/stderr.

It is pretty clear that there is no actual choice, and the hooks will
never be able to run in parallel. Therefore, the fix should move
`run_hooks_opt()` away from calling `run_processes_parallel()`.

In any case, regression fixes should not be mixed with refactorings unless
the latter make the former easier, which is not the case here.

Ciao,
Johannes

Footnote *1*:
https://colabti.org/irclogger/irclogger_log/git-devel?date=2022-05-23#l44

Footnote *2*: I say "seem" because it would take a proper retro to analyze
what was the reason for the uptick in regressions, and even more
importantly to analyze what we can learn from the experience.

Footnote *3*: The refactoring, as Junio suspected, might very well go a
bit overboard. Even if a new variation of the `run_processes_parallel()`
function that takes a struct should be necessary, it would be easy -- and
desirable -- to keep the current function signatures unchanged and simply
turn them into shims that then call the new variant. That would make the
refactoring much easier to review, and in turn it would make it less
likely to introduce another regression.
Ævar Arnfjörð Bjarmason May 25, 2022, 1 p.m. UTC | #2
On Wed, May 25 2022, Johannes Schindelin wrote:

> Hi Ævar,
>
> as promised in the Git IRC Standup [*1*], a review.
>
> On Wed, 18 May 2022, Ævar Arnfjörð Bjarmason wrote:
>
>> Ævar Arnfjörð Bjarmason (8):
>>   run-command tests: change if/if/... to if/else if/else
>>   run-command API: use "opts" struct for run_processes_parallel{,_tr2}()
>>   run-command tests: test stdout of run_command_parallel()
>>   run-command.c: add an initializer for "struct parallel_processes"
>>   run-command: add an "ungroup" option to run_process_parallel()
>>   hook tests: fix redirection logic error in 96e7225b310
>>   hook API: don't redundantly re-set "no_stdin" and "stdout_to_stderr"
>>   hook API: fix v2.36.0 regression: hooks should be connected to a TTY
>
> I started reviewing the patches individually, but have some higher-level
> concerns that put my per-patch review on hold.
>
> Keeping in mind that the intention is to fix a regression that was
> introduced by way of refactoring (most of our recent regressions seem to
> share that trait [*2*]), I strongly advise against another round of
> refactoring [*3*], especially against tying it to fix a regression.
>
> In this instance, it would be very easy to fix the bug without any
> refactoring. In a nutshell, the manifestation of the bug amplifies this
> part of the commit message of 96e7225b310 (hook: add 'run' subcommand,
> 2021-12-22):
>
>     Some of the implementation here, such as a function being named
>     run_hooks_opt() when it's tasked with running one hook, to using the
>     run_processes_parallel_tr2() API to run with jobs=1 is somewhere
>     between a bit odd and and an overkill for the current features of this
>     "hook run" command and the hook.[ch] API.
>
> It is this switch to `run_processes_parallel()` that is the root cause of
> the regression.

Yes, or more generally to the new hook API which makes use of it.

> The current iteration of the patch series does not fix that.

Because the plan is still to continue in this direction and go for
Emily's config-based hooks, which will run in parallel.

And to fix that would at this point be a larger functional change,
because we'd be running with more code we haven't tested before,
i.e. hook.[ch] on some new backend. So just passing down the appropriate
flags to have run-command.[ch] do the right thing for us seemed to be
the least bad option.

> In the commit message from which I quoted, the plan is laid out to
> eventually run more than one hook. If that is still the plan, we will be
> presented with the unfortunate choice of either never running them in
> parallel, or alternatively reintroducing the regression where the hooks
> run detached from stdin/stdout/stderr.

No, because you can have N processes all connected to a terminal with
"ungroup", what you can't do is guarantee that they won't interleave.

But as discussed in some previous threads that would be OK, since that
would come as an opt-in to parallel hook execution. I.e. you could pick
one of:

 1. Current behavior
 2. Our parallel hook execution (whatever "ungroup" etc. settings that entails)
 3. Your own parallel hook execution

It only matters that we don't regress in #1, for #2 we could have
different behavior, but just document the caveats as such.

IOW it's OK if you run parallel hooks and we decide that they won't be
connected to a terminal, because that's a new feature we don't have yet,
one you'd need to opt into.
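
For reference, with ".ungroup" the relevant difference in
pp_start_one() boils down to this (paraphrasing the 5/8 hunks above,
not a verbatim quote):

    if (!pp->ungroup) {
        /* grouped: each child's stderr goes to a pipe we buffer */
        pp->children[i].process.err = -1;
        pp->children[i].process.stdout_to_stderr = 1;
    }
    /*
     * ungrouped: the child inherits our stdout/stderr directly, so
     * any number of children can be connected to the TTY, but their
     * output may interleave.
     */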

> It is pretty clear that there is no actual choice, and the hooks will
> never be able to run in parallel. Therefore, the fix should move
> `run_hooks_opt()` away from calling `run_processes_parallel()`.

They will run in parallel, see above.

> In any case, regression fixes should not be mixed with refactorings unless
> the latter make the former easier, which is not the case here.

I noted upthread/side-thread (in any case, in discussions around this)
that I wished I'd come up with something smaller, but couldn't.

If you want to try your hand at that I'd love to see it.

But basically migrating the hook API to a new "backend" would also be a
large change, so would making the bare minimum change in
run-command.[ch].

But hey, I might be wrong. So if you think it's obvious that this could
be much smaller I'd love to see patches for it...

> Footnote *1*:
> https://colabti.org/irclogger/irclogger_log/git-devel?date=2022-05-23#l44
>
> Footnote *2*: I say "seem" because it would take a proper retro to analyze
> what was the reason for the uptick in regressions, and even more
> importantly to analyze what we can learn from the experience.

Yes, that might be interesting.

I'll only note that I think you're focusing on the wrong thing here with
"refactorings".

If you look at the history of this hooks API topic it started early on
with a version where the config-based hooks + parallelism (currently not
here yet) were combined with making the existing hook users use the new
API (partially here now).

Now, I suggested that be split up so that we'd first re-implement all
existing hooks on the new API, and *then* perform any feature changes.

Except of course that doing so alters the nature of those changes in
your definition, I assume, i.e. it goes from a feature series to
"refactorings".

Whereas I think the important thing to optimize for is to make smaller
incremental changes. Here we had a bug, and it's relatively easy to fix
it; it would be much harder if we had a bigger delta in v2.36 with not
only this bug, but some other regressions.

Which isn't hypothetical, BTW: until 3-4 months ago nobody had seen
that the config-based hooks topic we had kicking around had a severe
performance regression. I found it and Emily & I have been kicking
around a fix for it (mostly off-list).

If we'd done that we'd have had a more broken release, but we also
wouldn't have "refactorings". I.e. the run_parallel API would actually
be used, but we'd have this breakage plus some others.

Anyway, I think there's lots of things we could probably do better in
delivering more reliable software. I'm just pointing out here that I
think focusing on one part of a larger progression from A..B and
calling it bad because it's a "refactoring" is a category mistake,
because re-doing that progression so that no individual step is a
refactoring would result in larger change deltas.

> Footnote *3*: The refactoring, as Junio suspected, might very well go a
> bit over board. Even if a new variation of the `run_processes_parallel()`
> function that takes a struct should be necessary, it would be easy -- and
> desirable -- to keep the current function signatures unchanged and simply
> turn them into shims that then call the new variant. That would make the
> refactoring much easier to review, and in turn it would make it less
> likely to introduce another regression.

Sure, we could instead add a third variant to it in addition to the two
on "master", instead of unifying them as is done here.

But per the v1 feedback the consensus seemed to be that this was a good
direction, and to the extent that there were objections it was that I
should add the rest of the arguments to the "opts" struct.

But again, I'm fully open to that, I tried that and didn't think the end
result was any simpler to review, but perhaps you'd like to try...
Junio C Hamano May 25, 2022, 4:57 p.m. UTC | #3
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Keeping in mind that the intention is to fix a regression that was
> introduced by way of refactoring (most of our recent regressions seem to
> share that trait [*2*]), I strongly advise against another round of
> refactoring [*3*], especially against tying it to fix a regression.

I share this sentiment.

> In this instance, it would be very easy to fix the bug without any
> refactoring. In a nutshell, the manifestation of the bug amplifies this
> part of the commit message of 96e7225b310 (hook: add 'run' subcommand,
> 2021-12-22):
>
>     Some of the implementation here, such as a function being named
>     run_hooks_opt() when it's tasked with running one hook, to using the
>     run_processes_parallel_tr2() API to run with jobs=1 is somewhere
>     between a bit odd and and an overkill for the current features of this
>     "hook run" command and the hook.[ch] API.
>
> It is this switch to `run_processes_parallel()` that is the root cause of
> the regression.
>
> The current iteration of the patch series does not fix that.

True.

> In the commit message from which I quoted, the plan is laid out to
> eventually run more than one hook. If that is still the plan, we will be
> presented with the unfortunate choice of either never running them in
> parallel, or alternatively reintroducing the regression where the hooks
> run detached from stdin/stdout/stderr.

I had a similar impression before I reviewed the code after the
regression report, but if I read the code before the breakage
correctly, I think we didn't change the handling of the standard
input stream with the series from Emily/Ævar that broke the hooks.

The regression is that the output streams are no longer _directly_
connected to the outside world, and instead to our internal relay
that buffers.  The run_hook_ve() helper did set .no_stdin to 1
before doing run_command() in Git 2.35.  The series with regression
does the same in pick_next_hook() callback in hook.c.  Both also set
.stdout_to_stderr to 1, so the apparent output should not change.
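
For comparison, the 2.35 run_hook_ve() code path was roughly the
following (paraphrased from memory, not a verbatim quote):

    struct child_process hook = CHILD_PROCESS_INIT;

    /* args, env, dir etc. set up as before ... */
    hook.no_stdin = 1;
    hook.stdout_to_stderr = 1;
    hook.trace2_hook_name = name;
    /* stderr is left alone, so the hook stays attached to the TTY */
    return run_command(&hook);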

> It is pretty clear that there is no actual choice, and the hooks will
> never be able to run in parallel. Therefore, the fix should move
> `run_hooks_opt()` away from calling `run_processes_parallel()`.

My take on it is slightly different.

I personally do not think we should run hooks in parallel ourselves,
but if hook-like things, which Emily and Ævar want, want run in
parallel, we can safely allow them to do so.  No current users have
ever seen such hook-like things specified in their configuration
files---as long as it is clearly documented that these hook-like
things are not connected to the original standard output or error,
and they may run in parallel and whatever inter-process coordination
is their responsibility, there is no regression.  It is a brand new
feature.

The mechanism that supports that hook-like things should have a
compatibility mode, if it ever wants to take responsibility of
running the traditional hooks as part of its offering.  I think the
right way to do so is follows:

 - Unless each hook-like thing explicitly asks, it does not run in
   parallel with other hook-like things, and its output stream is
   connected directly to the original output stream.  They can run
   without involving the run_processes_parallel() at all.

 - When the traditional on-disk hooks are treated as if they were one of
   these hook-like things, the compatibility mode should be set to
   on for them without any user interaction.

 - Only the new stuff written specifically to be used as these shiny
   new hook-like things would explicitly ask to run in parallel and
   emit to the output multiplexer.

Doing things that way would pave the way forward to allow new stuff
to work differently, without breaking existing stuff people have,
wouldn't it?

> In any case, regression fixes should not be mixed with refactorings unless
> the latter make the former easier, which is not the case here.

Absolutely.  I wonder how involved it would be to revert the merge
of the whole thing from 'master'.  It may give us a clean slate to
rethink the whole mess and redo it without breaking the existing
users' hooks.
Junio C Hamano May 26, 2022, 1:10 a.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Absolutely.  I wonder how involved it would be to revert the merge
> of the whole thing from 'master'.  It may give us a clean slate to
> rethink the whole mess and redo it without breaking the existing
> users' hooks.

I tried the revert, and the result compiled and tested OK, but I am
tempted to say that it looks as if the topic was deliberately
designed to make it hard to revert by taking as much stuff hostage
as possible.

At least one fix that depends on the run_hooks_opt structure
introduced by c70bc338 (Merge branch 'ab/config-based-hooks-2',
2022-02-09) needs to be discarded.  7431379a (Merge branch
'ab/racy-hooks', 2022-03-16) did address an issue worth addressing,
so even if we revert the whole c70bc338, we would want to redo the
fix, possibly in some other way.  But it also needed an "oops that
was wrong, here is an attempt to fix it again" by cb3b3974 (Merge
branch 'ab/racy-hooks', 2022-03-30).  The situation is quite ugly.

As you hinted in the message I responded to in the message I am
responding to, if we can make a surgical fix to make the new and
improved run_hooks_opt() API build on top of run_command(), instead
of on top of run_processes_parallel(), that would give us a cleaner way
out than discarding everything and redoing them "the right way".  At
least, the external interface into the API (read: the impression you
would get by "less hook.h") does not look too bad.
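
To be concrete, I am imagining something roughly like the following
inside run_hooks_opt(), reusing only the fields hook.c already sets
in the series above (a rough, untested sketch, not a patch):

    struct child_process hook_proc = CHILD_PROCESS_INIT;

    strvec_push(&hook_proc.args, hook_path);
    strvec_pushv(&hook_proc.env_array, options->env.v);
    hook_proc.dir = options->dir;
    hook_proc.trace2_hook_name = hook_name;
    hook_proc.no_stdin = 1;
    hook_proc.stdout_to_stderr = 1;

    ret = run_command(&hook_proc);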

Thanks.
Ævar Arnfjörð Bjarmason May 26, 2022, 10:16 a.m. UTC | #5
On Wed, May 25 2022, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Absolutely.  I wonder how involved it would be to revert the merge
>> of the whole thing from 'master'.  It may give us a clean slate to
>> rethink the whole mess and redo it without breaking the existing
>> users' hooks.
>
> I tried the revert, and the result compiled and tested OK, but I am
> tempted to say that it looks as if the topic was deliberately
> designed to make it hard to revert by taking as much stuff hostage
> as possible.

No, it's just that...

> At least one fix that depends on the run_hooks_opt structure
> introduced by c70bc338 (Merge branch 'ab/config-based-hooks-2',
> 2022-02-09) needs to be discarded.  7431379a (Merge branch
> 'ab/racy-hooks', 2022-03-16) did address an issue worth addressing,

...we've made some use of the API since then, including for that bug
fix...

> so even if we revert the whole c70bc338, we would want to redo the
> fix, possibly in some other way.  But it also needed an "oops that
> was wrong, here is an attempt to fix it again" by cb3b3974 (Merge
> branch 'ab/racy-hooks', 2022-03-30).  The situation is quite ugly.

...although for that last one, if you're considering reverting that
fix too to back out of the topic(s), it should be relatively easy to
deal with.

> As you hinted in the message I responded to in the message I am
> responding to, if we can make a surgical fix to make the new and
> improved run_hooks_opt() API build on top of run_command(), instead
> of on top of run_processes_parallel(), that would give us a cleaner way
> out than discarding everything and redoing them "the right way".  At
> least, the external interface into the API (read: the impression you
> would get by "less hook.h") does not look too bad.

I have a pending re-roll of this topic structured the way it is now (but
with fixes for outstanding issues).

I understand your suggestion here to use the non-parallel API, and the
reluctance to have a relatively large regression fix.

I haven't come up with a patch in this direction, and I'll try before a
re-roll, but I can't see how we wouldn't end up with code that's an even
larger logical change as a result.

I.e. this would require rewriting a large part of hook.[ch] which is
currently structured around the callback API, and carefully coming up
with the equivalent non-parallel API pattern for it.

Whereas the current direction is more boilerplate for sure, but keeps
all of that existing behavior, and just narrowly adjusts what options we
pass down to the "struct child_process" in that case.

I can try to come up with it (and delay the current re-roll I have
that's almost ready), but I really think that reviewing such a change
will be much harder.

The current proposal is large by line count, but it's relatively easy to
skim it and assure oneself that a new parameter is being passed in, and
that all the proposed behavior change applies only to the one caller
that passes in that new parameter.

Whereas switching to a new non-callback based API will require carefully
going over the parallel API line-by-line, assuring oneself that the
non-callback version is really doing the same thing etc.
Junio C Hamano May 26, 2022, 4:36 p.m. UTC | #6
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> The current proposal is large by line count, but it's relatively easy to
> skim it and assure oneself that a new parameter is being passed in, and
> that all the proposed behavior change applies only to the one caller
> that passes in that new parameter.
>
> Whereas switching to a new non-callback based API will require carefully
> going over the parallel API line-by-line, assuring oneself that the
> non-callback version is really doing the same thing etc.

I was worried about something like that when I wrote (admittedly
unfairly, in a somewhat frustrated state) that the series was
designed to be hard to revert.  The reverting itself was reasonably
easy if the "did we invoke the hook, really?" topic is discarded at
the same time, but if it was done with too much rearchitecting, it is
understandable that it would become cumbersome to review X-<.

I wonder if rebuilding from scratch is easier to review, then?  The
first three patches of such a series would be

 - Revert cb3b3974 (Merge branch 'ab/racy-hooks', 2022-03-30)
 - Revert 7431379a (Merge branch 'ab/racy-hooks', 2022-03-16)
 - Revert c70bc338 (Merge branch 'ab/config-based-hooks-2', 2022-02-09)

and then the rest would rebuild what used to be in the original
series on top.  There will be a lot of duplicate patches between
that "the rest" and the patches in the original series (e.g. I would
imagine that the resulting hook.h would look more or less
identical), but "git range-diff" may be able to trim it down by
comparing between "the rest" and "c70bc338^..c70bc338^2" (aka
ab/config-based-hooks-2).  I dunno.

Thanks.
Ævar Arnfjörð Bjarmason May 26, 2022, 7:13 p.m. UTC | #7
On Thu, May 26 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> The current proposal is large by line count, but it's relatively easy to
>> skim it and assure oneself that a new parameter is being passed in, and
>> that all the proposed behavior change applies only to the one caller
>> that passes in that new parameter.
>>
>> Whereas switching to a new non-callback based API will require carefully
>> going over the parallel API line-by-line, assuring oneself that the
>> non-callback version is really doing the same thing etc.
>
> I was worried about something like that when I wrote (admittedly
> unfairly, in a somewhat frustrated state) that the series was
> designed to be hard to revert.  The reverting itself was reasonably
> easy if the "did we invoke the hook, really?" topic is discarded at
> the same time, but if was done with too much rearchitecting, it is
> understandable to become cumbersome to review X-<.
>
> I wonder if rebuilding from scratch is easier to review, then?  The
> first three patches of such a series would be
>
>  - Revert cb3b3974 (Merge branch 'ab/racy-hooks', 2022-03-30)
>  - Revert 7431379a (Merge branch 'ab/racy-hooks', 2022-03-16)
>  - Revert c70bc338 (Merge branch 'ab/config-based-hooks-2', 2022-02-09)
>
> and then the rest would rebuild what used to be in the original
> series on top.  There will be a lot of duplicate patches between
> that "the rest" and the patches in the original series (e.g. I would
> imagine that the resulting hook.h would look more or less
> identical), but "git range-diff" may be able to trim it down by
> comparing between "the rest" and "c70bc338^..c70bc338^2" (aka
> ab/config-based-hooks-2).  I dunno.

I'm still happy to and planning to send a re-roll of this to try to
address outstanding comments/concerns, but am holding off for now
because it's not clear to me if you're already planning to discard any
such re-roll in favor of a revert.

Or do you mean to create a point release with such revert(s) and have
master free to move forward with a fix for the outstanding issue, but
not to use that for a point release?
Junio C Hamano May 26, 2022, 7:56 p.m. UTC | #8
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> I wonder if rebuilding from scratch is easier to review, then?  The
>> first three patches of such a series would be
>>
>>  - Revert cb3b3974 (Merge branch 'ab/racy-hooks', 2022-03-30)
>>  - Revert 7431379a (Merge branch 'ab/racy-hooks', 2022-03-16)
>>  - Revert c70bc338 (Merge branch 'ab/config-based-hooks-2', 2022-02-09)
>>
>> and then the rest would rebuild what used to be in the original
>> series on top.  There will be a lot of duplicate patches between
>> that "the rest" and the patches in the original series (e.g. I would
>> imagine that the resulting hook.h would look more or less
>> identical), but "git range-diff" may be able to trim it down by
>> comparing between "the rest" and "c70bc338^..c70bc338^2" (aka
>> ab/config-based-hooks-2).  I dunno.
>
> I'm still happy to and planning to send a re-roll of this to try to
> address outstanding comments/concerns, but am holding off for now
> because it's not clear to me if you're already planning to discard any
> such re-roll in favor of a revert.
>
> Or do you mean to create a point release with such revert(s) and have
> master free to move forward with a fix for the outstanding issue, but
> not to use that for a point release?

If a maintenance release will have reverts with adjustment, then the
solution that will be only merged to master should still be built on
top.  So if we were to go the route above, the early part (the first
three that are reverts above, and possibly a couple more directly on
top just to address "did we really run hook?") would be merged to the
maintenance track, while the whole thing that rebuilds on top of the
reverted one would be merged to 'master', I would imagine.

It all depends on how involved it is to get to where we want to be,
between

 (1) starting from 'master' and working backwards, removing the use of
     the run_parallel stuff and replacing it with the run_command API, or

 (2) bringing us back to pre-c70bc338 state first and then building
     up what we would have built if we didn't use run_parallel stuff
     in the original series.

As you were saying that what you would produce with the former
approach would be, compared to the initial "regression fix" that still
used the run_parallel stuff, a large and unreviewable mess, I was
throwing out a different approach as a potential alternative, with
the hope that the resulting series may make it reviewable, as long
as the early "straight revert" part is straight-forward.

If we take the "start from 'master' and fix minimally" approach, the
whole thing would be both in the maintenance track and in the track
for the next release, I would imagine.

So, in short, either way, we would not run hooks in parallel, and we
would not run hooks with run_parallel API castrated to a single
process usage only, in the version we will merge to the maintenance
track and also to the master track.  The latter may get an update to
re-attempt reusing run_parallel API in a way that is less hostile to
existing users, but I do not think we should make users wait by
spending more time on it than necessary right now, before we get the
regression fix ready.

Thanks.