diff mbox series

[v5,24/36] run-command: add stdin callback for parallelization

Message ID patch-v5-24.36-bb119fa7cc0-20210902T125110Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series Run hooks via "git run hook" & hook library | expand

Commit Message

Ævar Arnfjörð Bjarmason Sept. 2, 2021, 1:11 p.m. UTC
From: Emily Shaffer <emilyshaffer@google.com>

If a user of the run_processes_parallel() API wants to pipe a large
amount of information to stdin of each parallel command, that
information could exceed the buffer of the pipe allocated for that
process's stdin.  Generally this is solved by repeatedly writing to
child_process.in between calls to start_command() and finish_command();
run_processes_parallel() did not provide users an opportunity to access
child_process at that time.

Because the data might be extremely large (for example, a list of all
refs received during a push from a client) simply taking a string_list
or strbuf is not as scalable as using a callback; the rest of the
run_processes_parallel() API also uses callbacks, so making this feature
match the rest of the API reduces mental load on the user.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fetch.c             |  1 +
 builtin/submodule--helper.c |  2 +-
 hook.c                      |  1 +
 run-command.c               | 54 +++++++++++++++++++++++++++++++++++--
 run-command.h               | 17 +++++++++++-
 submodule.c                 |  1 +
 t/helper/test-run-command.c | 31 ++++++++++++++++++---
 t/t0061-run-command.sh      | 30 +++++++++++++++++++++
 8 files changed, 129 insertions(+), 8 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Oct. 6, 2021, 11:03 a.m. UTC | #1
On Thu, Sep 02 2021, Ævar Arnfjörð Bjarmason wrote:

Emily, there's a...:

> diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
> index 7ae03dc7123..9348184d303 100644
> --- a/t/helper/test-run-command.c
> +++ b/t/helper/test-run-command.c
> @@ -32,8 +32,13 @@ static int parallel_next(struct child_process *cp,
>  		return 0;
>  
>  	strvec_pushv(&cp->args, d->argv);
> +	cp->in = d->in;
> +	cp->no_stdin = d->no_stdin;
>  	strbuf_addstr(err, "preloaded output of a child\n");
>  	number_callbacks++;
> +
> +	*task_cb = xmalloc(sizeof(int));
> +	*(int*)(*task_cb) = 2;
>  	return 1;
>  }

Probably trivial to solve failure here in t0061-run-command.sh if you
compile with SANITIZE=leak. This failed in combination with my[1] (but
for anyone reading along, this patch has been ejected from "seen" a
while ago).

More generally: The equivalent of 01-07/36 of this series is being
merged into "next". As described in a plan to submit this topic
incrementally I was hoping to do 08-20/36 next, i.e. up to "run-command:
remove old run_hook_{le,ve}() hook API". See [2] for that plan.

You've been inactive on-list recently, it would be nice to time this so
that by the time it gets to 21-36/36 (which I was planning to split in
two per [2]) that you'd have time to review/help with outstanding issues
etc, for eventually re-submitting your "config based hooks" on top once
this all lands.

1. https://lore.kernel.org/git/patch-02.10-9a8804e1d9a-20211006T094705Z-avarab@gmail.com/
2. https://lore.kernel.org/git/875yut8nns.fsf@evledraar.gmail.com/
Ævar Arnfjörð Bjarmason Oct. 12, 2021, 12:59 p.m. UTC | #2
On Wed, Oct 06 2021, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Sep 02 2021, Ævar Arnfjörð Bjarmason wrote:
>
> Emily, there's a...:
>
>> diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
>> index 7ae03dc7123..9348184d303 100644
>> --- a/t/helper/test-run-command.c
>> +++ b/t/helper/test-run-command.c
>> @@ -32,8 +32,13 @@ static int parallel_next(struct child_process *cp,
>>  		return 0;
>>  
>>  	strvec_pushv(&cp->args, d->argv);
>> +	cp->in = d->in;
>> +	cp->no_stdin = d->no_stdin;
>>  	strbuf_addstr(err, "preloaded output of a child\n");
>>  	number_callbacks++;
>> +
>> +	*task_cb = xmalloc(sizeof(int));
>> +	*(int*)(*task_cb) = 2;
>>  	return 1;
>>  }
>
> Probably trivial to solve failure here in t0061-run-command.sh if you
> compile with SANITIZE=leak. This failed in combination with my[1] (but
> for anyone reading along, this patch has been ejected from "seen" a
> while ago).
>
> More generally: The equivalent of 01-07/36 of this series is being
> merged into "next". As described in a plan to submit this topic
> incrementally I was hoping to do 08-20/36 next, i.e. up to "run-command:
> remove old run_hook_{le,ve}() hook API". See [2] for that plan.
>
> You've been inactive on-list recently, it would be nice to time this so
> that by the time it gets to 21-36/36 (which I was planning to split in
> two per [2]) that you'd have time to review/help with outstanding issues
> etc, for eventually re-submitting your "config based hooks" on top once
> this all lands.
>
> 1. https://lore.kernel.org/git/patch-02.10-9a8804e1d9a-20211006T094705Z-avarab@gmail.com/
> 2. https://lore.kernel.org/git/875yut8nns.fsf@evledraar.gmail.com/

Since I had a reason to look at this again, this fixes it. I've squashed
it into my "base" branch, but it won't be in the next batch I submit (I
have the cut-off point before this commit):

diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
index fa25bcbbc0d..e9b4214d163 100644
--- a/t/helper/test-run-command.c
+++ b/t/helper/test-run-command.c
@@ -61,11 +61,21 @@ static void test_consume_sideband(struct strbuf *output, void *cb)
 	fclose(sideband);
 }
 
+static int task_free(int result,
+		     struct strbuf *err,
+		     void *pp_cb,
+		     void *pp_task_cb)
+{
+	free(pp_task_cb);
+	return 0;
+}
+
 static int task_finished(int result,
 			 struct strbuf *err,
 			 void *pp_cb,
 			 void *pp_task_cb)
 {
+	task_free(0, NULL, NULL, pp_task_cb);
 	strbuf_addstr(err, "asking for a quick stop\n");
 	return 1;
 }
@@ -438,7 +448,7 @@ int cmd__run_command(int argc, const char **argv)
 
 	if (!strcmp(argv[1], "run-command-parallel"))
 		exit(run_processes_parallel(jobs, parallel_next,
-					    NULL, NULL, NULL, NULL, &proc));
+					    NULL, NULL, NULL, task_free, &proc));
 
 	if (!strcmp(argv[1], "run-command-abort"))
 		exit(run_processes_parallel(jobs, parallel_next,
@@ -452,12 +462,12 @@ int cmd__run_command(int argc, const char **argv)
 		proc.in = -1;
 		proc.no_stdin = 0;
 		exit (run_processes_parallel(jobs, parallel_next, NULL,
-					     test_stdin, NULL, NULL, &proc));
+					     test_stdin, NULL, task_free, &proc));
 	}
 
 	if (!strcmp(argv[1], "run-command-sideband"))
 		exit(run_processes_parallel(jobs, parallel_next, NULL, NULL,
-					    test_consume_sideband, NULL,
+					    test_consume_sideband, task_free,
 					    &proc));
 
 	fprintf(stderr, "check usage\n");
diff mbox series

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e064687dbdc..b18ada08842 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1819,6 +1819,7 @@  static int fetch_multiple(struct string_list *list, int max_children)
 		result = run_processes_parallel_tr2(max_children,
 						    &fetch_next_remote,
 						    &fetch_failed_to_start,
+						    NULL,
 						    &fetch_finished,
 						    &state,
 						    "fetch", "parallel/fetch");
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index ef2776a9e45..0b73739c160 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2311,7 +2311,7 @@  static int update_submodules(struct submodule_update_clone *suc)
 	int i;
 
 	run_processes_parallel_tr2(suc->max_jobs, update_clone_get_next_task,
-				   update_clone_start_failure,
+				   update_clone_start_failure, NULL,
 				   update_clone_task_finished, suc, "submodule",
 				   "parallel/update");
 
diff --git a/hook.c b/hook.c
index d156b0dc800..45983d08aed 100644
--- a/hook.c
+++ b/hook.c
@@ -146,6 +146,7 @@  int run_hooks(const char *hook_name, const char *hook_path,
 	run_processes_parallel_tr2(jobs,
 				   pick_next_hook,
 				   notify_start_failure,
+				   NULL,
 				   notify_hook_finished,
 				   &cb_data,
 				   "hook",
diff --git a/run-command.c b/run-command.c
index 482ee2d76c6..f1616858d18 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1492,6 +1492,7 @@  struct parallel_processes {
 
 	get_next_task_fn get_next_task;
 	start_failure_fn start_failure;
+	feed_pipe_fn feed_pipe;
 	task_finished_fn task_finished;
 
 	struct {
@@ -1519,6 +1520,13 @@  static int default_start_failure(struct strbuf *out,
 	return 0;
 }
 
+static int default_feed_pipe(struct strbuf *pipe,
+			     void *pp_cb,
+			     void *pp_task_cb)
+{
+	return 1;
+}
+
 static int default_task_finished(int result,
 				 struct strbuf *out,
 				 void *pp_cb,
@@ -1549,6 +1557,7 @@  static void pp_init(struct parallel_processes *pp,
 		    int n,
 		    get_next_task_fn get_next_task,
 		    start_failure_fn start_failure,
+		    feed_pipe_fn feed_pipe,
 		    task_finished_fn task_finished,
 		    void *data)
 {
@@ -1567,6 +1576,7 @@  static void pp_init(struct parallel_processes *pp,
 	pp->get_next_task = get_next_task;
 
 	pp->start_failure = start_failure ? start_failure : default_start_failure;
+	pp->feed_pipe = feed_pipe ? feed_pipe : default_feed_pipe;
 	pp->task_finished = task_finished ? task_finished : default_task_finished;
 
 	pp->nr_processes = 0;
@@ -1664,6 +1674,37 @@  static int pp_start_one(struct parallel_processes *pp)
 	return 0;
 }
 
+static void pp_buffer_stdin(struct parallel_processes *pp)
+{
+	int i;
+	struct strbuf sb = STRBUF_INIT;
+
+	/* Buffer stdin for each pipe. */
+	for (i = 0; i < pp->max_processes; i++) {
+		if (pp->children[i].state == GIT_CP_WORKING &&
+		    pp->children[i].process.in > 0) {
+			int done;
+			strbuf_reset(&sb);
+			done = pp->feed_pipe(&sb, pp->data,
+					      pp->children[i].data);
+			if (sb.len) {
+				if (write_in_full(pp->children[i].process.in,
+					      sb.buf, sb.len) < 0) {
+					if (errno != EPIPE)
+						die_errno("write");
+					done = 1;
+				}
+			}
+			if (done) {
+				close(pp->children[i].process.in);
+				pp->children[i].process.in = 0;
+			}
+		}
+	}
+
+	strbuf_release(&sb);
+}
+
 static void pp_buffer_stderr(struct parallel_processes *pp, int output_timeout)
 {
 	int i;
@@ -1728,6 +1769,7 @@  static int pp_collect_finished(struct parallel_processes *pp)
 		pp->nr_processes--;
 		pp->children[i].state = GIT_CP_FREE;
 		pp->pfd[i].fd = -1;
+		pp->children[i].process.in = 0;
 		child_process_init(&pp->children[i].process);
 
 		if (i != pp->output_owner) {
@@ -1761,6 +1803,7 @@  static int pp_collect_finished(struct parallel_processes *pp)
 int run_processes_parallel(int n,
 			   get_next_task_fn get_next_task,
 			   start_failure_fn start_failure,
+			   feed_pipe_fn feed_pipe,
 			   task_finished_fn task_finished,
 			   void *pp_cb)
 {
@@ -1769,7 +1812,9 @@  int run_processes_parallel(int n,
 	int spawn_cap = 4;
 	struct parallel_processes pp;
 
-	pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb);
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	pp_init(&pp, n, get_next_task, start_failure, feed_pipe, task_finished, pp_cb);
 	while (1) {
 		for (i = 0;
 		    i < spawn_cap && !pp.shutdown &&
@@ -1786,6 +1831,7 @@  int run_processes_parallel(int n,
 		}
 		if (!pp.nr_processes)
 			break;
+		pp_buffer_stdin(&pp);
 		pp_buffer_stderr(&pp, output_timeout);
 		pp_output(&pp);
 		code = pp_collect_finished(&pp);
@@ -1797,11 +1843,15 @@  int run_processes_parallel(int n,
 	}
 
 	pp_cleanup(&pp);
+
+	sigchain_pop(SIGPIPE);
+
 	return 0;
 }
 
 int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
 			       start_failure_fn start_failure,
+			       feed_pipe_fn feed_pipe,
 			       task_finished_fn task_finished, void *pp_cb,
 			       const char *tr2_category, const char *tr2_label)
 {
@@ -1811,7 +1861,7 @@  int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task,
 				   ((n < 1) ? online_cpus() : n));
 
 	result = run_processes_parallel(n, get_next_task, start_failure,
-					task_finished, pp_cb);
+					feed_pipe, task_finished, pp_cb);
 
 	trace2_region_leave(tr2_category, tr2_label, NULL);
 
diff --git a/run-command.h b/run-command.h
index cfb6887e4ae..80d394664ae 100644
--- a/run-command.h
+++ b/run-command.h
@@ -422,6 +422,20 @@  typedef int (*start_failure_fn)(struct strbuf *out,
 				void *pp_cb,
 				void *pp_task_cb);
 
+/**
+ * This callback is called repeatedly on every child process who requests
+ * start_command() to create a pipe by setting child_process.in < 0.
+ *
+ * pp_cb is the callback cookie as passed into run_processes_parallel, and
+ * pp_task_cb is the callback cookie as passed into get_next_task_fn.
+ * The contents of 'send' will be read into the pipe and passed to the pipe.
+ *
+ * Return nonzero to close the pipe.
+ */
+typedef int (*feed_pipe_fn)(struct strbuf *pipe,
+			    void *pp_cb,
+			    void *pp_task_cb);
+
 /**
  * This callback is called on every child process that finished processing.
  *
@@ -456,10 +470,11 @@  typedef int (*task_finished_fn)(int result,
 int run_processes_parallel(int n,
 			   get_next_task_fn,
 			   start_failure_fn,
+			   feed_pipe_fn,
 			   task_finished_fn,
 			   void *pp_cb);
 int run_processes_parallel_tr2(int n, get_next_task_fn, start_failure_fn,
-			       task_finished_fn, void *pp_cb,
+			       feed_pipe_fn, task_finished_fn, void *pp_cb,
 			       const char *tr2_category, const char *tr2_label);
 
 /**
diff --git a/submodule.c b/submodule.c
index 8e611fe1dbf..db1700a502d 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1632,6 +1632,7 @@  int fetch_populated_submodules(struct repository *r,
 	run_processes_parallel_tr2(max_parallel_jobs,
 				   get_next_submodule,
 				   fetch_start_failure,
+				   NULL,
 				   fetch_finish,
 				   &spf,
 				   "submodule", "parallel/fetch");
diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
index 7ae03dc7123..9348184d303 100644
--- a/t/helper/test-run-command.c
+++ b/t/helper/test-run-command.c
@@ -32,8 +32,13 @@  static int parallel_next(struct child_process *cp,
 		return 0;
 
 	strvec_pushv(&cp->args, d->argv);
+	cp->in = d->in;
+	cp->no_stdin = d->no_stdin;
 	strbuf_addstr(err, "preloaded output of a child\n");
 	number_callbacks++;
+
+	*task_cb = xmalloc(sizeof(int));
+	*(int*)(*task_cb) = 2;
 	return 1;
 }
 
@@ -55,6 +60,17 @@  static int task_finished(int result,
 	return 1;
 }
 
+static int test_stdin(struct strbuf *pipe, void *cb, void *task_cb)
+{
+	int *lines_remaining = task_cb;
+
+	if (*lines_remaining)
+		strbuf_addf(pipe, "sample stdin %d\n", --(*lines_remaining));
+
+	return !(*lines_remaining);
+}
+
+
 struct testsuite {
 	struct string_list tests, failed;
 	int next;
@@ -185,7 +201,7 @@  static int testsuite(int argc, const char **argv)
 		suite.tests.nr, max_jobs);
 
 	ret = run_processes_parallel(max_jobs, next_test, test_failed,
-				     test_finished, &suite);
+				     test_stdin, test_finished, &suite);
 
 	if (suite.failed.nr > 0) {
 		ret = 1;
@@ -413,15 +429,22 @@  int cmd__run_command(int argc, const char **argv)
 
 	if (!strcmp(argv[1], "run-command-parallel"))
 		exit(run_processes_parallel(jobs, parallel_next,
-					    NULL, NULL, &proc));
+					    NULL, NULL, NULL, &proc));
 
 	if (!strcmp(argv[1], "run-command-abort"))
 		exit(run_processes_parallel(jobs, parallel_next,
-					    NULL, task_finished, &proc));
+					    NULL, NULL, task_finished, &proc));
 
 	if (!strcmp(argv[1], "run-command-no-jobs"))
 		exit(run_processes_parallel(jobs, no_job,
-					    NULL, task_finished, &proc));
+					    NULL, NULL, task_finished, &proc));
+
+	if (!strcmp(argv[1], "run-command-stdin")) {
+		proc.in = -1;
+		proc.no_stdin = 0;
+		exit (run_processes_parallel(jobs, parallel_next, NULL,
+					     test_stdin, NULL, &proc));
+	}
 
 	fprintf(stderr, "check usage\n");
 	return 1;
diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
index 7d599675e35..87759482ad1 100755
--- a/t/t0061-run-command.sh
+++ b/t/t0061-run-command.sh
@@ -143,6 +143,36 @@  test_expect_success 'run_command runs in parallel with more tasks than jobs avai
 	test_cmp expect actual
 '
 
+cat >expect <<-EOF
+preloaded output of a child
+listening for stdin:
+sample stdin 1
+sample stdin 0
+preloaded output of a child
+listening for stdin:
+sample stdin 1
+sample stdin 0
+preloaded output of a child
+listening for stdin:
+sample stdin 1
+sample stdin 0
+preloaded output of a child
+listening for stdin:
+sample stdin 1
+sample stdin 0
+EOF
+
+test_expect_success 'run_command listens to stdin' '
+	write_script stdin-script <<-\EOF &&
+	echo "listening for stdin:"
+	while read line; do
+		echo "$line"
+	done
+	EOF
+	test-tool run-command run-command-stdin 2 ./stdin-script 2>actual &&
+	test_cmp expect actual
+'
+
 cat >expect <<-EOF
 preloaded output of a child
 asking for a quick stop