Message ID | dd9237927395ef69663ab376a2da74da4654c495.1602782524.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Maintenance III: Background maintenance | expand |
On 15/10/2020 19:21, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@microsoft.com> [... snip ...] > diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c > new file mode 100644 > index 0000000000..5bba623ff1 > --- /dev/null > +++ b/builtin/for-each-repo.c > @@ -0,0 +1,58 @@ > +#include "cache.h" > +#include "config.h" > +#include "builtin.h" > +#include "parse-options.h" > +#include "run-command.h" > +#include "string-list.h" > + > +static const char * const for_each_repo_usage[] = { > + N_("git for-each-repo --config=<config> <command-args>"), > + NULL > +}; > + > +static int run_command_on_repo(const char *path, > + void *cbdata) > +{ > + int i; > + struct child_process child = CHILD_PROCESS_INIT; > + struct strvec *args = (struct strvec *)cbdata; I was curious there's a strong reason for declaring args as void * followed by this cast? The most obvious answer seems to be that this probably evolved from a callback - and we could simplify it now? > + > + child.git_cmd = 1; > + strvec_pushl(&child.args, "-C", path, NULL); > + > + for (i = 0; i < args->nr; i++) > + strvec_push(&child.args, args->v[i]); So here we're copying all of args - and I don't see any way of avoiding it since we're adding it to child's arg list. > + > + return run_command(&child); > +} > + > +int cmd_for_each_repo(int argc, const char **argv, const char *prefix) > +{ > + static const char *config_key = NULL; > + int i, result = 0; > + const struct string_list *values; > + struct strvec args = STRVEC_INIT; > + > + const struct option options[] = { > + OPT_STRING(0, "config", &config_key, N_("config"), > + N_("config key storing a list of repository paths")), > + OPT_END() > + }; > + > + argc = parse_options(argc, argv, prefix, options, for_each_repo_usage, > + PARSE_OPT_STOP_AT_NON_OPTION); > + > + if (!config_key) > + die(_("missing --config=<config>")); > + > + for (i = 0; i < argc; i++) > + strvec_push(&args, argv[i]); But why do we have to copy all of argv 1:1 into args here, only to later pass it to run_command_on_repo() which, as described above, copies the entire input again? I suspect this was done to comply with run_command_on_repo()'s API (which takes strvec) - does that seem plausible, or did I miss something? Which brings me to the real reason for my questions: I noticed we "leak" args (this leak is of no significance since it happens in cmd_*, but LSAN still complains, and I'm trying to get tests running leak-free). My initial inclination was to strvec_clear() or UNLEAK() args - but if we can avoid creating args in the first place we also wouldn't need to clear it later. My current proposal is therefore to completely remove args and pass argc+argv into run_command_on_repo() - but I wanted to be sure that I didn't miss some important reason to stick with the current approach. > + > + values = repo_config_get_value_multi(the_repository, > + config_key); > + > + for (i = 0; !result && i < values->nr; i++) > + result = run_command_on_repo(values->items[i].string, &args); > + > + return result; > +} [... snip ...] (I hope this doesn't come across as useless necroposting - I figured it would be easier to clarify these questions on the original thread as opposed to potentially discussing it as part of my next leak-fixing series :) .) ATB, Andrzej
I'm not Stolee and hadn't even looked at this code before, but I'll jump in with a couple comments... On Mon, May 3, 2021 at 12:11 PM Andrzej Hunt <andrzej@ahunt.org> wrote: > On 15/10/2020 19:21, Derrick Stolee via GitGitGadget wrote: > > +static int run_command_on_repo(const char *path, > > + void *cbdata) > > +{ > > + int i; > > + struct child_process child = CHILD_PROCESS_INIT; > > + struct strvec *args = (struct strvec *)cbdata; > > I was curious there's a strong reason for declaring args as void * > followed by this cast? The most obvious answer seems to be that this > probably evolved from a callback - and we could simplify it now? Agreed, the `void*` cbdata doesn't make sense here. > > + strvec_pushl(&child.args, "-C", path, NULL); > > + > > + for (i = 0; i < args->nr; i++) > > + strvec_push(&child.args, args->v[i]); > > So here we're copying all of args - and I don't see any way of avoiding > it since we're adding it to child's arg list. ... (dot dot dot) > > +int cmd_for_each_repo(int argc, const char **argv, const char *prefix) > > +{ > > + struct strvec args = STRVEC_INIT; > > + for (i = 0; i < argc; i++) > > + strvec_push(&args, argv[i]); > > But why do we have to copy all of argv 1:1 into args here, only to later > pass it to run_command_on_repo() which, as described above, copies the > entire input again? I suspect this was done to comply with > run_command_on_repo()'s API (which takes strvec) - does that seem > plausible, or did I miss something? > > Which brings me to the real reason for my questions: I noticed we "leak" > args (this leak is of no significance since it happens in cmd_*, but > LSAN still complains, and I'm trying to get tests running leak-free). My > initial inclination was to strvec_clear() or UNLEAK() args - but if we > can avoid creating args in the first place we also wouldn't need to > clear it later. > > My current proposal is therefore to completely remove args and pass > argc+argv into run_command_on_repo() - but I wanted to be sure that I > didn't miss some important reason to stick with the current approach. An alternative to all this copying would be to take advantage of child_process.argv which is owned by the caller, thus does not get freed automatically by run_command(). This would allow you to re-use the same argument vector for all calls. And you don't need run_command_on_repo() at all. Something like this in cmd_for_each_repo(), untested and typed directly in email: struct child_process child = CHILD_PROCESS_INIT; for (i = 0; !result && i < values->nr; i++) { const char *d = chdir(values->items[i].string); if (chdir(d)) die_errno(_("cannot chdir to '%s'"), d); child.git_cmd = 1; child.argv = argv; result = run_command(&child); } This assumes that argv[] is correctly NULL-terminated after parse_options() -- I didn't check, but expect it to be so. If not, it's easy enough to copy argv[] into `args` once and then strvec_clear(&args) at the end of the function. The one downside is that trace output wouldn't be as helpful (I think) because you wouldn't see an explicit "-C <dir>", but I suppose the tracing machinery can be invoked manually to address this.
On Mon, May 3, 2021 at 1:01 PM Eric Sunshine <sunshine@sunshineco.com> wrote: > for (i = 0; !result && i < values->nr; i++) { > const char *d = chdir(values->items[i].string); > if (chdir(d)) > die_errno(_("cannot chdir to '%s'"), d); > child.git_cmd = 1; > child.argv = argv; > result = run_command(&child); > } Without the copy/paste error, the declaration of `d` would, of course, be: const char *d = values->items[i].string;
On 5/3/2021 12:10 PM, Andrzej Hunt wrote: > > On 15/10/2020 19:21, Derrick Stolee via GitGitGadget wrote: >> From: Derrick Stolee <dstolee@microsoft.com> >> +static int run_command_on_repo(const char *path, >> + void *cbdata) >> +{ >> + int i; >> + struct child_process child = CHILD_PROCESS_INIT; >> + struct strvec *args = (struct strvec *)cbdata; > > I was curious there's a strong reason for declaring args as void * followed by this cast? The most obvious answer seems to be that this probably evolved from a callback - and we could simplify it now? You are absolutely right that this evolved from a callback. I look forward to reviewing your patch that updates this. ;) Thanks, -Stolee
diff --git a/.gitignore b/.gitignore index a5808fa30d..5eb2a2be71 100644 --- a/.gitignore +++ b/.gitignore @@ -67,6 +67,7 @@ /git-filter-branch /git-fmt-merge-msg /git-for-each-ref +/git-for-each-repo /git-format-patch /git-fsck /git-fsck-objects diff --git a/Documentation/git-for-each-repo.txt b/Documentation/git-for-each-repo.txt new file mode 100644 index 0000000000..94bd19da26 --- /dev/null +++ b/Documentation/git-for-each-repo.txt @@ -0,0 +1,59 @@ +git-for-each-repo(1) +==================== + +NAME +---- +git-for-each-repo - Run a Git command on a list of repositories + + +SYNOPSIS +-------- +[verse] +'git for-each-repo' --config=<config> [--] <arguments> + + +DESCRIPTION +----------- +Run a Git command on a list of repositories. The arguments after the +known options or `--` indicator are used as the arguments for the Git +subprocess. + +THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE. + +For example, we could run maintenance on each of a list of repositories +stored in a `maintenance.repo` config variable using + +------------- +git for-each-repo --config=maintenance.repo maintenance run +------------- + +This will run `git -C <repo> maintenance run` for each value `<repo>` +in the multi-valued config variable `maintenance.repo`. + + +OPTIONS +------- +--config=<config>:: + Use the given config variable as a multi-valued list storing + absolute path names. Iterate on that list of paths to run + the given arguments. ++ +These config values are loaded from system, global, and local Git config, +as available. If `git for-each-repo` is run in a directory that is not a +Git repository, then only the system and global config is used. + + +SUBPROCESS BEHAVIOR +------------------- + +If any `git -C <repo> <arguments>` subprocess returns a non-zero exit code, +then the `git for-each-repo` process returns that exit code without running +more subprocesses. + +Each `git -C <repo> <arguments>` subprocess inherits the standard file +descriptors `stdin`, `stdout`, and `stderr`. + + +GIT +--- +Part of the linkgit:git[1] suite diff --git a/Makefile b/Makefile index 65f8cfb236..7c588ff036 100644 --- a/Makefile +++ b/Makefile @@ -1071,6 +1071,7 @@ BUILTIN_OBJS += builtin/fetch-pack.o BUILTIN_OBJS += builtin/fetch.o BUILTIN_OBJS += builtin/fmt-merge-msg.o BUILTIN_OBJS += builtin/for-each-ref.o +BUILTIN_OBJS += builtin/for-each-repo.o BUILTIN_OBJS += builtin/fsck.o BUILTIN_OBJS += builtin/gc.o BUILTIN_OBJS += builtin/get-tar-commit-id.o diff --git a/builtin.h b/builtin.h index 17c1c0ce49..ff7c6e5aa9 100644 --- a/builtin.h +++ b/builtin.h @@ -150,6 +150,7 @@ int cmd_fetch(int argc, const char **argv, const char *prefix); int cmd_fetch_pack(int argc, const char **argv, const char *prefix); int cmd_fmt_merge_msg(int argc, const char **argv, const char *prefix); int cmd_for_each_ref(int argc, const char **argv, const char *prefix); +int cmd_for_each_repo(int argc, const char **argv, const char *prefix); int cmd_format_patch(int argc, const char **argv, const char *prefix); int cmd_fsck(int argc, const char **argv, const char *prefix); int cmd_gc(int argc, const char **argv, const char *prefix); diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c new file mode 100644 index 0000000000..5bba623ff1 --- /dev/null +++ b/builtin/for-each-repo.c @@ -0,0 +1,58 @@ +#include "cache.h" +#include "config.h" +#include "builtin.h" +#include "parse-options.h" +#include "run-command.h" +#include "string-list.h" + +static const char * const for_each_repo_usage[] = { + N_("git for-each-repo --config=<config> <command-args>"), + NULL +}; + +static int run_command_on_repo(const char *path, + void *cbdata) +{ + int i; + struct child_process child = CHILD_PROCESS_INIT; + struct strvec *args = (struct strvec *)cbdata; + + child.git_cmd = 1; + strvec_pushl(&child.args, "-C", path, NULL); + + for (i = 0; i < args->nr; i++) + strvec_push(&child.args, args->v[i]); + + return run_command(&child); +} + +int cmd_for_each_repo(int argc, const char **argv, const char *prefix) +{ + static const char *config_key = NULL; + int i, result = 0; + const struct string_list *values; + struct strvec args = STRVEC_INIT; + + const struct option options[] = { + OPT_STRING(0, "config", &config_key, N_("config"), + N_("config key storing a list of repository paths")), + OPT_END() + }; + + argc = parse_options(argc, argv, prefix, options, for_each_repo_usage, + PARSE_OPT_STOP_AT_NON_OPTION); + + if (!config_key) + die(_("missing --config=<config>")); + + for (i = 0; i < argc; i++) + strvec_push(&args, argv[i]); + + values = repo_config_get_value_multi(the_repository, + config_key); + + for (i = 0; !result && i < values->nr; i++) + result = run_command_on_repo(values->items[i].string, &args); + + return result; +} diff --git a/command-list.txt b/command-list.txt index 0e3204e7d1..581499be82 100644 --- a/command-list.txt +++ b/command-list.txt @@ -94,6 +94,7 @@ git-fetch-pack synchingrepositories git-filter-branch ancillarymanipulators git-fmt-merge-msg purehelpers git-for-each-ref plumbinginterrogators +git-for-each-repo plumbinginterrogators git-format-patch mainporcelain git-fsck ancillaryinterrogators complete git-gc mainporcelain diff --git a/git.c b/git.c index 24f250d29a..1cab64b5d1 100644 --- a/git.c +++ b/git.c @@ -511,6 +511,7 @@ static struct cmd_struct commands[] = { { "fetch-pack", cmd_fetch_pack, RUN_SETUP | NO_PARSEOPT }, { "fmt-merge-msg", cmd_fmt_merge_msg, RUN_SETUP }, { "for-each-ref", cmd_for_each_ref, RUN_SETUP }, + { "for-each-repo", cmd_for_each_repo, RUN_SETUP_GENTLY }, { "format-patch", cmd_format_patch, RUN_SETUP }, { "fsck", cmd_fsck, RUN_SETUP }, { "fsck-objects", cmd_fsck, RUN_SETUP }, diff --git a/t/t0068-for-each-repo.sh b/t/t0068-for-each-repo.sh new file mode 100755 index 0000000000..136b4ec839 --- /dev/null +++ b/t/t0068-for-each-repo.sh @@ -0,0 +1,30 @@ +#!/bin/sh + +test_description='git for-each-repo builtin' + +. ./test-lib.sh + +test_expect_success 'run based on configured value' ' + git init one && + git init two && + git init three && + git -C two commit --allow-empty -m "DID NOT RUN" && + git config run.key "$TRASH_DIRECTORY/one" && + git config --add run.key "$TRASH_DIRECTORY/three" && + git for-each-repo --config=run.key commit --allow-empty -m "ran" && + git -C one log -1 --pretty=format:%s >message && + grep ran message && + git -C two log -1 --pretty=format:%s >message && + ! grep ran message && + git -C three log -1 --pretty=format:%s >message && + grep ran message && + git for-each-repo --config=run.key -- commit --allow-empty -m "ran again" && + git -C one log -1 --pretty=format:%s >message && + grep again message && + git -C two log -1 --pretty=format:%s >message && + ! grep again message && + git -C three log -1 --pretty=format:%s >message && + grep again message +' + +test_done