diff mbox series

[v4,3/8] for-each-repo: run subcommands on configured repos

Message ID dd9237927395ef69663ab376a2da74da4654c495.1602782524.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Maintenance III: Background maintenance | expand

Commit Message

Derrick Stolee Oct. 15, 2020, 5:21 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

It can be helpful to store a list of repositories in global or system
config and then iterate Git commands on that list. Create a new builtin
that makes this process simple for experts. We will use this builtin to
run scheduled maintenance on all configured repositories in a future
change.

The test is very simple, but does highlight that the "--" argument is
optional.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                          |  1 +
 Documentation/git-for-each-repo.txt | 59 +++++++++++++++++++++++++++++
 Makefile                            |  1 +
 builtin.h                           |  1 +
 builtin/for-each-repo.c             | 58 ++++++++++++++++++++++++++++
 command-list.txt                    |  1 +
 git.c                               |  1 +
 t/t0068-for-each-repo.sh            | 30 +++++++++++++++
 8 files changed, 152 insertions(+)
 create mode 100644 Documentation/git-for-each-repo.txt
 create mode 100644 builtin/for-each-repo.c
 create mode 100755 t/t0068-for-each-repo.sh

Comments

Andrzej Hunt May 3, 2021, 4:10 p.m. UTC | #1
On 15/10/2020 19:21, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
[... snip ...]
> diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c
> new file mode 100644
> index 0000000000..5bba623ff1
> --- /dev/null
> +++ b/builtin/for-each-repo.c
> @@ -0,0 +1,58 @@
> +#include "cache.h"
> +#include "config.h"
> +#include "builtin.h"
> +#include "parse-options.h"
> +#include "run-command.h"
> +#include "string-list.h"
> +
> +static const char * const for_each_repo_usage[] = {
> +	N_("git for-each-repo --config=<config> <command-args>"),
> +	NULL
> +};
> +
> +static int run_command_on_repo(const char *path,
> +			       void *cbdata)
> +{
> +	int i;
> +	struct child_process child = CHILD_PROCESS_INIT;
> +	struct strvec *args = (struct strvec *)cbdata;

I was curious there's a strong reason for declaring args as void * 
followed by this cast? The most obvious answer seems to be that this 
probably evolved from a callback - and we could simplify it now?
> +
> +	child.git_cmd = 1;
> +	strvec_pushl(&child.args, "-C", path, NULL);
> +
> +	for (i = 0; i < args->nr; i++)
> +		strvec_push(&child.args, args->v[i]);
So here we're copying all of args - and I don't see any way of avoiding 
it since we're adding it to child's arg list.

> +
> +	return run_command(&child);
> +}
> +
> +int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
> +{
> +	static const char *config_key = NULL;
> +	int i, result = 0;
> +	const struct string_list *values;
> +	struct strvec args = STRVEC_INIT;
> +
> +	const struct option options[] = {
> +		OPT_STRING(0, "config", &config_key, N_("config"),
> +			   N_("config key storing a list of repository paths")),
> +		OPT_END()
> +	};
> +
> +	argc = parse_options(argc, argv, prefix, options, for_each_repo_usage,
> +			     PARSE_OPT_STOP_AT_NON_OPTION);
> +
> +	if (!config_key)
> +		die(_("missing --config=<config>"));
> +
> +	for (i = 0; i < argc; i++)
> +		strvec_push(&args, argv[i]);

But why do we have to copy all of argv 1:1 into args here, only to later 
pass it to run_command_on_repo() which, as described above, copies the 
entire input again? I suspect this was done to comply with 
run_command_on_repo()'s API (which takes strvec) - does that seem 
plausible, or did I miss something?

Which brings me to the real reason for my questions: I noticed we "leak" 
args (this leak is of no significance since it happens in cmd_*, but 
LSAN still complains, and I'm trying to get tests running leak-free). My 
initial inclination was to strvec_clear() or UNLEAK() args - but if we 
can avoid creating args in the first place we also wouldn't need to 
clear it later.

My current proposal is therefore to completely remove args and pass 
argc+argv into run_command_on_repo() - but I wanted to be sure that I 
didn't miss some important reason to stick with the current approach.

> +
> +	values = repo_config_get_value_multi(the_repository,
> +					     config_key);
> +
> +	for (i = 0; !result && i < values->nr; i++)
> +		result = run_command_on_repo(values->items[i].string, &args);
> +
> +	return result;
> +}
[... snip ...]

(I hope this doesn't come across as useless necroposting - I figured it 
would be easier to clarify these questions on the original thread as 
opposed to potentially discussing it as part of my next leak-fixing 
series :) .)

ATB,

   Andrzej
Eric Sunshine May 3, 2021, 5:01 p.m. UTC | #2
I'm not Stolee and hadn't even looked at this code before, but I'll
jump in with a couple comments...

On Mon, May 3, 2021 at 12:11 PM Andrzej Hunt <andrzej@ahunt.org> wrote:
> On 15/10/2020 19:21, Derrick Stolee via GitGitGadget wrote:
> > +static int run_command_on_repo(const char *path,
> > +                            void *cbdata)
> > +{
> > +     int i;
> > +     struct child_process child = CHILD_PROCESS_INIT;
> > +     struct strvec *args = (struct strvec *)cbdata;
>
> I was curious there's a strong reason for declaring args as void *
> followed by this cast? The most obvious answer seems to be that this
> probably evolved from a callback - and we could simplify it now?

Agreed, the `void*` cbdata doesn't make sense here.

> > +     strvec_pushl(&child.args, "-C", path, NULL);
> > +
> > +     for (i = 0; i < args->nr; i++)
> > +             strvec_push(&child.args, args->v[i]);
>
> So here we're copying all of args - and I don't see any way of avoiding
> it since we're adding it to child's arg list.

... (dot dot dot)

> > +int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
> > +{
> > +     struct strvec args = STRVEC_INIT;
> > +     for (i = 0; i < argc; i++)
> > +             strvec_push(&args, argv[i]);
>
> But why do we have to copy all of argv 1:1 into args here, only to later
> pass it to run_command_on_repo() which, as described above, copies the
> entire input again? I suspect this was done to comply with
> run_command_on_repo()'s API (which takes strvec) - does that seem
> plausible, or did I miss something?
>
> Which brings me to the real reason for my questions: I noticed we "leak"
> args (this leak is of no significance since it happens in cmd_*, but
> LSAN still complains, and I'm trying to get tests running leak-free). My
> initial inclination was to strvec_clear() or UNLEAK() args - but if we
> can avoid creating args in the first place we also wouldn't need to
> clear it later.
>
> My current proposal is therefore to completely remove args and pass
> argc+argv into run_command_on_repo() - but I wanted to be sure that I
> didn't miss some important reason to stick with the current approach.

An alternative to all this copying would be to take advantage of
child_process.argv which is owned by the caller, thus does not get
freed automatically by run_command(). This would allow you to re-use
the same argument vector for all calls. And you don't need
run_command_on_repo() at all. Something like this in
cmd_for_each_repo(), untested and typed directly in email:

    struct child_process child = CHILD_PROCESS_INIT;

    for (i = 0; !result && i < values->nr; i++) {
        const char *d = chdir(values->items[i].string);
        if (chdir(d))
            die_errno(_("cannot chdir to '%s'"), d);
        child.git_cmd = 1;
        child.argv = argv;
        result = run_command(&child);
    }

This assumes that argv[] is correctly NULL-terminated after
parse_options() -- I didn't check, but expect it to be so. If not,
it's easy enough to copy argv[] into `args` once and then
strvec_clear(&args) at the end of the function.

The one downside is that trace output wouldn't be as helpful (I think)
because you wouldn't see an explicit "-C <dir>", but I suppose the
tracing machinery can be invoked manually to address this.
Eric Sunshine May 3, 2021, 7:26 p.m. UTC | #3
On Mon, May 3, 2021 at 1:01 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>     for (i = 0; !result && i < values->nr; i++) {
>         const char *d = chdir(values->items[i].string);
>         if (chdir(d))
>             die_errno(_("cannot chdir to '%s'"), d);
>         child.git_cmd = 1;
>         child.argv = argv;
>         result = run_command(&child);
>     }

Without the copy/paste error, the declaration of `d` would, of course, be:

    const char *d = values->items[i].string;
Derrick Stolee May 3, 2021, 7:43 p.m. UTC | #4
On 5/3/2021 12:10 PM, Andrzej Hunt wrote:
> 
> On 15/10/2020 19:21, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>> +static int run_command_on_repo(const char *path,
>> +                   void *cbdata)
>> +{
>> +    int i;
>> +    struct child_process child = CHILD_PROCESS_INIT;
>> +    struct strvec *args = (struct strvec *)cbdata;
> 
> I was curious there's a strong reason for declaring args as void * followed by this cast? The most obvious answer seems to be that this probably evolved from a callback - and we could simplify it now?

You are absolutely right that this evolved from a
callback. I look forward to reviewing your patch that
updates this. ;)

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/.gitignore b/.gitignore
index a5808fa30d..5eb2a2be71 100644
--- a/.gitignore
+++ b/.gitignore
@@ -67,6 +67,7 @@ 
 /git-filter-branch
 /git-fmt-merge-msg
 /git-for-each-ref
+/git-for-each-repo
 /git-format-patch
 /git-fsck
 /git-fsck-objects
diff --git a/Documentation/git-for-each-repo.txt b/Documentation/git-for-each-repo.txt
new file mode 100644
index 0000000000..94bd19da26
--- /dev/null
+++ b/Documentation/git-for-each-repo.txt
@@ -0,0 +1,59 @@ 
+git-for-each-repo(1)
+====================
+
+NAME
+----
+git-for-each-repo - Run a Git command on a list of repositories
+
+
+SYNOPSIS
+--------
+[verse]
+'git for-each-repo' --config=<config> [--] <arguments>
+
+
+DESCRIPTION
+-----------
+Run a Git command on a list of repositories. The arguments after the
+known options or `--` indicator are used as the arguments for the Git
+subprocess.
+
+THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.
+
+For example, we could run maintenance on each of a list of repositories
+stored in a `maintenance.repo` config variable using
+
+-------------
+git for-each-repo --config=maintenance.repo maintenance run
+-------------
+
+This will run `git -C <repo> maintenance run` for each value `<repo>`
+in the multi-valued config variable `maintenance.repo`.
+
+
+OPTIONS
+-------
+--config=<config>::
+	Use the given config variable as a multi-valued list storing
+	absolute path names. Iterate on that list of paths to run
+	the given arguments.
++
+These config values are loaded from system, global, and local Git config,
+as available. If `git for-each-repo` is run in a directory that is not a
+Git repository, then only the system and global config is used.
+
+
+SUBPROCESS BEHAVIOR
+-------------------
+
+If any `git -C <repo> <arguments>` subprocess returns a non-zero exit code,
+then the `git for-each-repo` process returns that exit code without running
+more subprocesses.
+
+Each `git -C <repo> <arguments>` subprocess inherits the standard file
+descriptors `stdin`, `stdout`, and `stderr`.
+
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 65f8cfb236..7c588ff036 100644
--- a/Makefile
+++ b/Makefile
@@ -1071,6 +1071,7 @@  BUILTIN_OBJS += builtin/fetch-pack.o
 BUILTIN_OBJS += builtin/fetch.o
 BUILTIN_OBJS += builtin/fmt-merge-msg.o
 BUILTIN_OBJS += builtin/for-each-ref.o
+BUILTIN_OBJS += builtin/for-each-repo.o
 BUILTIN_OBJS += builtin/fsck.o
 BUILTIN_OBJS += builtin/gc.o
 BUILTIN_OBJS += builtin/get-tar-commit-id.o
diff --git a/builtin.h b/builtin.h
index 17c1c0ce49..ff7c6e5aa9 100644
--- a/builtin.h
+++ b/builtin.h
@@ -150,6 +150,7 @@  int cmd_fetch(int argc, const char **argv, const char *prefix);
 int cmd_fetch_pack(int argc, const char **argv, const char *prefix);
 int cmd_fmt_merge_msg(int argc, const char **argv, const char *prefix);
 int cmd_for_each_ref(int argc, const char **argv, const char *prefix);
+int cmd_for_each_repo(int argc, const char **argv, const char *prefix);
 int cmd_format_patch(int argc, const char **argv, const char *prefix);
 int cmd_fsck(int argc, const char **argv, const char *prefix);
 int cmd_gc(int argc, const char **argv, const char *prefix);
diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c
new file mode 100644
index 0000000000..5bba623ff1
--- /dev/null
+++ b/builtin/for-each-repo.c
@@ -0,0 +1,58 @@ 
+#include "cache.h"
+#include "config.h"
+#include "builtin.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "string-list.h"
+
+static const char * const for_each_repo_usage[] = {
+	N_("git for-each-repo --config=<config> <command-args>"),
+	NULL
+};
+
+static int run_command_on_repo(const char *path,
+			       void *cbdata)
+{
+	int i;
+	struct child_process child = CHILD_PROCESS_INIT;
+	struct strvec *args = (struct strvec *)cbdata;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "-C", path, NULL);
+
+	for (i = 0; i < args->nr; i++)
+		strvec_push(&child.args, args->v[i]);
+
+	return run_command(&child);
+}
+
+int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
+{
+	static const char *config_key = NULL;
+	int i, result = 0;
+	const struct string_list *values;
+	struct strvec args = STRVEC_INIT;
+
+	const struct option options[] = {
+		OPT_STRING(0, "config", &config_key, N_("config"),
+			   N_("config key storing a list of repository paths")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, for_each_repo_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (!config_key)
+		die(_("missing --config=<config>"));
+
+	for (i = 0; i < argc; i++)
+		strvec_push(&args, argv[i]);
+
+	values = repo_config_get_value_multi(the_repository,
+					     config_key);
+
+	for (i = 0; !result && i < values->nr; i++)
+		result = run_command_on_repo(values->items[i].string, &args);
+
+	return result;
+}
diff --git a/command-list.txt b/command-list.txt
index 0e3204e7d1..581499be82 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -94,6 +94,7 @@  git-fetch-pack                          synchingrepositories
 git-filter-branch                       ancillarymanipulators
 git-fmt-merge-msg                       purehelpers
 git-for-each-ref                        plumbinginterrogators
+git-for-each-repo                       plumbinginterrogators
 git-format-patch                        mainporcelain
 git-fsck                                ancillaryinterrogators          complete
 git-gc                                  mainporcelain
diff --git a/git.c b/git.c
index 24f250d29a..1cab64b5d1 100644
--- a/git.c
+++ b/git.c
@@ -511,6 +511,7 @@  static struct cmd_struct commands[] = {
 	{ "fetch-pack", cmd_fetch_pack, RUN_SETUP | NO_PARSEOPT },
 	{ "fmt-merge-msg", cmd_fmt_merge_msg, RUN_SETUP },
 	{ "for-each-ref", cmd_for_each_ref, RUN_SETUP },
+	{ "for-each-repo", cmd_for_each_repo, RUN_SETUP_GENTLY },
 	{ "format-patch", cmd_format_patch, RUN_SETUP },
 	{ "fsck", cmd_fsck, RUN_SETUP },
 	{ "fsck-objects", cmd_fsck, RUN_SETUP },
diff --git a/t/t0068-for-each-repo.sh b/t/t0068-for-each-repo.sh
new file mode 100755
index 0000000000..136b4ec839
--- /dev/null
+++ b/t/t0068-for-each-repo.sh
@@ -0,0 +1,30 @@ 
+#!/bin/sh
+
+test_description='git for-each-repo builtin'
+
+. ./test-lib.sh
+
+test_expect_success 'run based on configured value' '
+	git init one &&
+	git init two &&
+	git init three &&
+	git -C two commit --allow-empty -m "DID NOT RUN" &&
+	git config run.key "$TRASH_DIRECTORY/one" &&
+	git config --add run.key "$TRASH_DIRECTORY/three" &&
+	git for-each-repo --config=run.key commit --allow-empty -m "ran" &&
+	git -C one log -1 --pretty=format:%s >message &&
+	grep ran message &&
+	git -C two log -1 --pretty=format:%s >message &&
+	! grep ran message &&
+	git -C three log -1 --pretty=format:%s >message &&
+	grep ran message &&
+	git for-each-repo --config=run.key -- commit --allow-empty -m "ran again" &&
+	git -C one log -1 --pretty=format:%s >message &&
+	grep again message &&
+	git -C two log -1 --pretty=format:%s >message &&
+	! grep again message &&
+	git -C three log -1 --pretty=format:%s >message &&
+	grep again message
+'
+
+test_done