From patchwork Sat Dec 5 01:46:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Emily Shaffer X-Patchwork-Id: 11952783 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-26.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_GIT,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7B723C433FE for ; Sat, 5 Dec 2020 01:47:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 512AC22DFB for ; Sat, 5 Dec 2020 01:47:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731142AbgLEBru (ORCPT ); Fri, 4 Dec 2020 20:47:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57512 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731130AbgLEBrt (ORCPT ); Fri, 4 Dec 2020 20:47:49 -0500 Received: from mail-qk1-x74a.google.com (mail-qk1-x74a.google.com [IPv6:2607:f8b0:4864:20::74a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9ADF0C08E863 for ; Fri, 4 Dec 2020 17:46:41 -0800 (PST) Received: by mail-qk1-x74a.google.com with SMTP id x196so7058233qkb.12 for ; Fri, 04 Dec 2020 17:46:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=oD2rhCpADMStyPd20XMg0pSOsFarv5F7zUMU8LdXwn0=; b=koRT99b9+3RPqtyIZizgGBPVGl1G91zFtGCpg2ZR/uo7CvMWve25JuLREIEUTZEnZD 8DR0lbF9tAGOgrt7txgR16JfRjKkGnxBIB01r9uGlwSVaXxktQDuQYfUlPPxXk7XDLUs Uf/sHZNu11ZuObBAVQmLIvAkn9aI5n6mi0ARWzG42plr7A4CsNoOXjuii7kgYX+qyZkR GsGw6pnf8UoaxZ+8710fFXicHhfkAzGVjMkB+/JN9n/sgggdayw/5FQ1gjV12kf4D1WM t0gP7DI8E5THIHwtBkD175IuA1xtslPvdAaFoyeHF0Wqr5R0hC8k+mbUBuf3iZmBGeXY gXQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=oD2rhCpADMStyPd20XMg0pSOsFarv5F7zUMU8LdXwn0=; b=NyRPRfAQN4nELdle2ClBJ6s05q1oXbMAtPha5a9VFS3V/zyd/pVeWQ2rzdCxgj6pfk FMS74gE+SlcFPRceIxnzefUBZa3dtXCO8hmlXS1C7WluaX3LPNrCFwrRP3l40HNsIJ7h eDe3TxfPaVeH4O4c9pfIDlyNGzAelEhndrUX+Czl9MAsWLNWy+HBAeCHccvL0GrOJsyB 957iVv58aIoADydJIea46NgjZ4CZeyThry/jqJK0LW6rX/1u7fCJ4x5llwcAcYdWGmoB MlgQ5jwOq6ExfLbeOmep4MjPtJ5W2jS6l1qoXkl2uMfJrJ+uyuBJfUh43B3mBl1tE2m3 5YEQ== X-Gm-Message-State: AOAM530SweOj6WyG+6LjPBjaynrdMO1eE7RuJiEE2AJp8qSgmF9g+rFx joAHrwadb0KmORAJGw+dV8Fk/TAlkFMrhH8i4i5fAZE24+/rQSftFo+RToHf92ibQXbzGMRMjyn we9izTAS0jT82KNQvmTgUlQH6vuhOxCxDqblfNMLd7zyzIMOBycRyira62u+sjEIfaaWebftKxA == X-Google-Smtp-Source: ABdhPJwNkkG83i8/n5hFwlYui840TCvrUEKXh6jsLaUu2se/Ek0DM4rIW4A9bGwocf8r0x09pNR08hLYsZUZgY9cSHU= Sender: "emilyshaffer via sendgmr" X-Received: from podkayne.svl.corp.google.com ([2620:15c:2ce:0:1ea0:b8ff:fe77:f690]) (user=emilyshaffer job=sendgmr) by 2002:a0c:b799:: with SMTP id l25mr8952246qve.25.1607132800629; Fri, 04 Dec 2020 17:46:40 -0800 (PST) Date: Fri, 4 Dec 2020 17:46:04 -0800 In-Reply-To: <20201205014607.1464119-1-emilyshaffer@google.com> Message-Id: <20201205014607.1464119-15-emilyshaffer@google.com> Mime-Version: 1.0 References: <20201205014607.1464119-1-emilyshaffer@google.com> X-Mailer: git-send-email 2.28.0.226.g0268cb6820 Subject: [PATCH 14/17] run-command: add stdin callback for parallelization From: Emily Shaffer To: git@vger.kernel.org Cc: Emily Shaffer Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org If a user of the run_processes_parallel() API wants to pipe a large amount of information to stdin of each parallel command, that information could exceed the buffer of the pipe allocated for that process's stdin. Generally this is solved by repeatedly writing to child_process.in between calls to start_command() and finish_command(); run_processes_parallel() did not provide users an opportunity to access child_process at that time. Because the data might be extremely large (for example, a list of all refs received during a push from a client) simply taking a string_list or strbuf is not as scalable as using a callback; the rest of the run_processes_parallel() API also uses callbacks, so making this feature match the rest of the API reduces mental load on the user. Signed-off-by: Emily Shaffer --- Notes: Since run_processes_parallel() is used elsewhere, I'd appreciate a close look on this patch which modifies it. Thanks :) builtin/fetch.c | 1 + builtin/submodule--helper.c | 2 +- run-command.c | 54 +++++++++++++++++++++++++++++++++++-- run-command.h | 17 +++++++++++- submodule.c | 1 + t/helper/test-run-command.c | 31 ++++++++++++++++++--- t/t0061-run-command.sh | 30 +++++++++++++++++++++ 7 files changed, 128 insertions(+), 8 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index ecf8537605..5e153b5193 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -1647,6 +1647,7 @@ static int fetch_multiple(struct string_list *list, int max_children) result = run_processes_parallel_tr2(max_children, &fetch_next_remote, &fetch_failed_to_start, + NULL, &fetch_finished, &state, "fetch", "parallel/fetch"); diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index c30896c897..bb623c1852 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -2294,7 +2294,7 @@ static int update_submodules(struct submodule_update_clone *suc) int i; run_processes_parallel_tr2(suc->max_jobs, update_clone_get_next_task, - update_clone_start_failure, + update_clone_start_failure, NULL, update_clone_task_finished, suc, "submodule", "parallel/update"); diff --git a/run-command.c b/run-command.c index 80c8c97bc1..7b65c087f8 100644 --- a/run-command.c +++ b/run-command.c @@ -1548,6 +1548,7 @@ struct parallel_processes { get_next_task_fn get_next_task; start_failure_fn start_failure; + feed_pipe_fn feed_pipe; task_finished_fn task_finished; struct { @@ -1575,6 +1576,13 @@ static int default_start_failure(struct strbuf *out, return 0; } +static int default_feed_pipe(struct strbuf *pipe, + void *pp_cb, + void *pp_task_cb) +{ + return 1; +} + static int default_task_finished(int result, struct strbuf *out, void *pp_cb, @@ -1605,6 +1613,7 @@ static void pp_init(struct parallel_processes *pp, int n, get_next_task_fn get_next_task, start_failure_fn start_failure, + feed_pipe_fn feed_pipe, task_finished_fn task_finished, void *data) { @@ -1623,6 +1632,7 @@ static void pp_init(struct parallel_processes *pp, pp->get_next_task = get_next_task; pp->start_failure = start_failure ? start_failure : default_start_failure; + pp->feed_pipe = feed_pipe ? feed_pipe : default_feed_pipe; pp->task_finished = task_finished ? task_finished : default_task_finished; pp->nr_processes = 0; @@ -1715,6 +1725,37 @@ static int pp_start_one(struct parallel_processes *pp) return 0; } +static void pp_buffer_stdin(struct parallel_processes *pp) +{ + int i; + struct strbuf sb = STRBUF_INIT; + + /* Buffer stdin for each pipe. */ + for (i = 0; i < pp->max_processes; i++) { + if (pp->children[i].state == GIT_CP_WORKING && + pp->children[i].process.in > 0) { + int done; + strbuf_reset(&sb); + done = pp->feed_pipe(&sb, pp->data, + pp->children[i].data); + if (sb.len) { + if (write_in_full(pp->children[i].process.in, + sb.buf, sb.len) < 0) { + if (errno != EPIPE) + die_errno("write"); + done = 1; + } + } + if (done) { + close(pp->children[i].process.in); + pp->children[i].process.in = 0; + } + } + } + + strbuf_release(&sb); +} + static void pp_buffer_stderr(struct parallel_processes *pp, int output_timeout) { int i; @@ -1779,6 +1820,7 @@ static int pp_collect_finished(struct parallel_processes *pp) pp->nr_processes--; pp->children[i].state = GIT_CP_FREE; pp->pfd[i].fd = -1; + pp->children[i].process.in = 0; child_process_init(&pp->children[i].process); if (i != pp->output_owner) { @@ -1812,6 +1854,7 @@ static int pp_collect_finished(struct parallel_processes *pp) int run_processes_parallel(int n, get_next_task_fn get_next_task, start_failure_fn start_failure, + feed_pipe_fn feed_pipe, task_finished_fn task_finished, void *pp_cb) { @@ -1820,7 +1863,9 @@ int run_processes_parallel(int n, int spawn_cap = 4; struct parallel_processes pp; - pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb); + sigchain_push(SIGPIPE, SIG_IGN); + + pp_init(&pp, n, get_next_task, start_failure, feed_pipe, task_finished, pp_cb); while (1) { for (i = 0; i < spawn_cap && !pp.shutdown && @@ -1837,6 +1882,7 @@ int run_processes_parallel(int n, } if (!pp.nr_processes) break; + pp_buffer_stdin(&pp); pp_buffer_stderr(&pp, output_timeout); pp_output(&pp); code = pp_collect_finished(&pp); @@ -1848,11 +1894,15 @@ int run_processes_parallel(int n, } pp_cleanup(&pp); + + sigchain_pop(SIGPIPE); + return 0; } int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task, start_failure_fn start_failure, + feed_pipe_fn feed_pipe, task_finished_fn task_finished, void *pp_cb, const char *tr2_category, const char *tr2_label) { @@ -1862,7 +1912,7 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task, ((n < 1) ? online_cpus() : n)); result = run_processes_parallel(n, get_next_task, start_failure, - task_finished, pp_cb); + feed_pipe, task_finished, pp_cb); trace2_region_leave(tr2_category, tr2_label, NULL); diff --git a/run-command.h b/run-command.h index 6472b38bde..e058c0e2c8 100644 --- a/run-command.h +++ b/run-command.h @@ -436,6 +436,20 @@ typedef int (*start_failure_fn)(struct strbuf *out, void *pp_cb, void *pp_task_cb); +/** + * This callback is called repeatedly on every child process who requests + * start_command() to create a pipe by setting child_process.in < 0. + * + * pp_cb is the callback cookie as passed into run_processes_parallel, and + * pp_task_cb is the callback cookie as passed into get_next_task_fn. + * The contents of 'send' will be read into the pipe and passed to the pipe. + * + * Return nonzero to close the pipe. + */ +typedef int (*feed_pipe_fn)(struct strbuf *pipe, + void *pp_cb, + void *pp_task_cb); + /** * This callback is called on every child process that finished processing. * @@ -470,10 +484,11 @@ typedef int (*task_finished_fn)(int result, int run_processes_parallel(int n, get_next_task_fn, start_failure_fn, + feed_pipe_fn, task_finished_fn, void *pp_cb); int run_processes_parallel_tr2(int n, get_next_task_fn, start_failure_fn, - task_finished_fn, void *pp_cb, + feed_pipe_fn, task_finished_fn, void *pp_cb, const char *tr2_category, const char *tr2_label); #endif diff --git a/submodule.c b/submodule.c index b3bb59f066..953f41818c 100644 --- a/submodule.c +++ b/submodule.c @@ -1638,6 +1638,7 @@ int fetch_populated_submodules(struct repository *r, run_processes_parallel_tr2(max_parallel_jobs, get_next_submodule, fetch_start_failure, + NULL, fetch_finish, &spf, "submodule", "parallel/fetch"); diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c index 7ae03dc712..9348184d30 100644 --- a/t/helper/test-run-command.c +++ b/t/helper/test-run-command.c @@ -32,8 +32,13 @@ static int parallel_next(struct child_process *cp, return 0; strvec_pushv(&cp->args, d->argv); + cp->in = d->in; + cp->no_stdin = d->no_stdin; strbuf_addstr(err, "preloaded output of a child\n"); number_callbacks++; + + *task_cb = xmalloc(sizeof(int)); + *(int*)(*task_cb) = 2; return 1; } @@ -55,6 +60,17 @@ static int task_finished(int result, return 1; } +static int test_stdin(struct strbuf *pipe, void *cb, void *task_cb) +{ + int *lines_remaining = task_cb; + + if (*lines_remaining) + strbuf_addf(pipe, "sample stdin %d\n", --(*lines_remaining)); + + return !(*lines_remaining); +} + + struct testsuite { struct string_list tests, failed; int next; @@ -185,7 +201,7 @@ static int testsuite(int argc, const char **argv) suite.tests.nr, max_jobs); ret = run_processes_parallel(max_jobs, next_test, test_failed, - test_finished, &suite); + test_stdin, test_finished, &suite); if (suite.failed.nr > 0) { ret = 1; @@ -413,15 +429,22 @@ int cmd__run_command(int argc, const char **argv) if (!strcmp(argv[1], "run-command-parallel")) exit(run_processes_parallel(jobs, parallel_next, - NULL, NULL, &proc)); + NULL, NULL, NULL, &proc)); if (!strcmp(argv[1], "run-command-abort")) exit(run_processes_parallel(jobs, parallel_next, - NULL, task_finished, &proc)); + NULL, NULL, task_finished, &proc)); if (!strcmp(argv[1], "run-command-no-jobs")) exit(run_processes_parallel(jobs, no_job, - NULL, task_finished, &proc)); + NULL, NULL, task_finished, &proc)); + + if (!strcmp(argv[1], "run-command-stdin")) { + proc.in = -1; + proc.no_stdin = 0; + exit (run_processes_parallel(jobs, parallel_next, NULL, + test_stdin, NULL, &proc)); + } fprintf(stderr, "check usage\n"); return 1; diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh index 7d599675e3..3eb572e6cd 100755 --- a/t/t0061-run-command.sh +++ b/t/t0061-run-command.sh @@ -143,6 +143,36 @@ test_expect_success 'run_command runs in parallel with more tasks than jobs avai test_cmp expect actual ' +cat >expect <<-EOF +preloaded output of a child +listening for stdin: +sample stdin 1 +sample stdin 0 +preloaded output of a child +listening for stdin: +sample stdin 1 +sample stdin 0 +preloaded output of a child +listening for stdin: +sample stdin 1 +sample stdin 0 +preloaded output of a child +listening for stdin: +sample stdin 1 +sample stdin 0 +EOF + +test_expect_success 'run_command listens to stdin' ' + write_script stdin-script <<-\EOF && + echo "listening for stdin:" + while read line; do + echo "$line" + done actual && + test_cmp expect actual +' + cat >expect <<-EOF preloaded output of a child asking for a quick stop