Message ID | patch-10.10-81f138e460c-20221017T115544Z-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | submodule: make it a built-in, remove git-submodule.sh | expand |
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > In a preceding commit we created "builtin/submodule.c" and faithfully > tried to reproduce every aspect of "git-submodule.sh", including its > invocation of "git submodule--helper" as a sub-process. > > Let's do away with the sub-process and invoke > "cmd_submodule__helper()" directly. Eventually we'll want to do away > with "builtin/submodule--helper.c" altogether, but let's not do that > for now to avoid conflicts with other in-flight topics. Even without > those conflicts the resulting diff would be large. We can leave that > for a later cleanup. Thanks! e.g. this managed to avoid conflicts with my newly sent out "submodule clone with branches" v2 [1] [1] https://lore.kernel.org/git/pull.1321.v2.git.git.1666297238.gitgitgadget@gmail.com > It's also worth noting that some users were using e.g. "git > submodule--helper list" directly for performance reasons[2]. With > 31955475d1c (submodule--helper: remove unused "list" helper, > 2022-09-01) released with v2.38.0 the "list" command was no longer > provided. [...] > I think it would make sense to implement a "--format" option for "git > submodule foreach" to help anyone who cares about that remaining > performance (and to improve the API, e.g. by supporting "-z"), but as > far as performance goes this makes the runtime acceptable again. FWIW, I don't think that dropping "git submodule--helper list" was a mistake, since all of "git submodule--helper" was meant to be internal anyway. But if users find it useful, we could add actually supported functionality like what you propose here. > The pattern in "cmd_submodule_builtin()" of saving "struct strvec" > arguments to a "struct string_list" and free()-ing them after the > "argv" has been modified by "cmd_submodule__helper()" is new, without > it we'd get various already-passing tests failing under SANITIZE=leak. [...] > +static int cmd_submodule_builtin(struct strvec *args, const char *prefix) > +{ > + size_t i; > + struct string_list to_free = STRING_LIST_INIT_DUP; > + int ret; > + > + /* > + * The cmd_submodule__helper() will treat the argv as > + * its own and modify it, so e.g. for "git submodule > + * add" the "add" argument will be removed, and we'll > + * thus leak from the strvec_push()'s in > + * setup_helper_args(). > + * > + * So in lieu of some generic "snapshot for a free" > + * API for "struct strvec" squirrel away the pointers > + * to free with string_list_clear() later. > + */ > + for (i = 0; i < args->nr; i++) > + string_list_append_nodup(&to_free, (char *)args->v[i]); > + > + ret = cmd_submodule__helper(args->nr, args->v, prefix); > + > + string_list_clear(&to_free, 0); Hm, so this trick works because we init STRING_LIST_INIT_DUP to make the string_list think that it owns its strings, but when we append, we use string_list_append_nodup(), which doesn't dup the strings. This 'moves' the strings into the string_list and causes them to be freed when we call string_list_clear(). Sounds reasonable enough to me, but I'm not an expert on leaks :) > + free(strvec_detach(args)); > + > + return ret; > +} > + > int cmd_submodule(int argc, const char **argv, const char *prefix) > { > int opt_quiet = 0; > int opt_cached = 0; > int opt_recursive = 0; > - struct child_process cp = CHILD_PROCESS_INIT; > + struct strvec args = STRVEC_INIT; > struct option options[] = { > OPT__QUIET(&opt_quiet, N_("be quiet")), > OPT_BOOL(0, "cached", &opt_cached, > @@ -141,13 +169,10 @@ int cmd_submodule(int argc, const char **argv, const char *prefix) > * Tell the rest of git that any URLs we get don't come > * directly from the user, so it can apply policy as appropriate. > */ > - strvec_push(&cp.env, "GIT_PROTOCOL_FROM_USER=0"); > - setup_helper_args(argc, argv, prefix, opt_quiet, opt_cached, > - opt_recursive, &cp.args, options); > + xsetenv("GIT_PROTOCOL_FROM_USER", "0", 1); > > - cp.git_cmd = 1; > - cp.no_stdin = 0; /* for git submodule foreach */ > - cp.dir = startup_info->original_cwd; > + setup_helper_args(argc, argv, prefix, opt_quiet, opt_cached, > + opt_recursive, &args, options); > > - return run_command(&cp); > + return cmd_submodule_builtin(&args, prefix); > } > -- > 2.38.0.1091.gf9d18265e59
diff --git a/builtin/submodule.c b/builtin/submodule.c index ca8e273b6e9..13e7064b03f 100644 --- a/builtin/submodule.c +++ b/builtin/submodule.c @@ -119,12 +119,40 @@ static void setup_helper_args(int argc, const char **argv, const char *prefix, strvec_pushv(args, argv); } +static int cmd_submodule_builtin(struct strvec *args, const char *prefix) +{ + size_t i; + struct string_list to_free = STRING_LIST_INIT_DUP; + int ret; + + /* + * The cmd_submodule__helper() will treat the argv as + * its own and modify it, so e.g. for "git submodule + * add" the "add" argument will be removed, and we'll + * thus leak from the strvec_push()'s in + * setup_helper_args(). + * + * So in lieu of some generic "snapshot for a free" + * API for "struct strvec" squirrel away the pointers + * to free with string_list_clear() later. + */ + for (i = 0; i < args->nr; i++) + string_list_append_nodup(&to_free, (char *)args->v[i]); + + ret = cmd_submodule__helper(args->nr, args->v, prefix); + + string_list_clear(&to_free, 0); + free(strvec_detach(args)); + + return ret; +} + int cmd_submodule(int argc, const char **argv, const char *prefix) { int opt_quiet = 0; int opt_cached = 0; int opt_recursive = 0; - struct child_process cp = CHILD_PROCESS_INIT; + struct strvec args = STRVEC_INIT; struct option options[] = { OPT__QUIET(&opt_quiet, N_("be quiet")), OPT_BOOL(0, "cached", &opt_cached, @@ -141,13 +169,10 @@ int cmd_submodule(int argc, const char **argv, const char *prefix) * Tell the rest of git that any URLs we get don't come * directly from the user, so it can apply policy as appropriate. */ - strvec_push(&cp.env, "GIT_PROTOCOL_FROM_USER=0"); - setup_helper_args(argc, argv, prefix, opt_quiet, opt_cached, - opt_recursive, &cp.args, options); + xsetenv("GIT_PROTOCOL_FROM_USER", "0", 1); - cp.git_cmd = 1; - cp.no_stdin = 0; /* for git submodule foreach */ - cp.dir = startup_info->original_cwd; + setup_helper_args(argc, argv, prefix, opt_quiet, opt_cached, + opt_recursive, &args, options); - return run_command(&cp); + return cmd_submodule_builtin(&args, prefix); }
In a preceding commit we created "builtin/submodule.c" and faithfully tried to reproduce every aspect of "git-submodule.sh", including its invocation of "git submodule--helper" as a sub-process. Let's do away with the sub-process and invoke "cmd_submodule__helper()" directly. Eventually we'll want to do away with "builtin/submodule--helper.c" altogether, but let's not do that for now to avoid conflicts with other in-flight topics. Even without those conflicts the resulting diff would be large. We can leave that for a later cleanup. This speeds up invocations of all "git submodule" commands, E.g. a trivial "foreach" command on git.git is around 1.50 times faster[1]. For more expensive commands this'll make less of a difference, as the fixed cost of invoking the sub-process will be amortized away. $ git hyperfine -L rev HEAD~1,HEAD -s 'make CFLAGS=-O3' './git submodule foreach "echo \$name"' Benchmark 1: ./git submodule foreach "echo \$name"' in 'HEAD~1 Time (mean ± σ): 9.7 ms ± 0.1 ms [User: 7.6 ms, System: 2.1 ms] Range (min … max): 9.4 ms … 10.2 ms 285 runs Benchmark 2: ./git submodule foreach "echo \$name"' in 'HEAD Time (mean ± σ): 6.6 ms ± 0.1 ms [User: 5.1 ms, System: 1.5 ms] Range (min … max): 6.2 ms … 7.2 ms 414 runs Summary './git submodule foreach "echo \$name"' in 'HEAD' ran 1.48 ± 0.04 times faster than './git submodule foreach "echo \$name"' in 'HEAD~1' It's also worth noting that some users were using e.g. "git submodule--helper list" directly for performance reasons[2]. With 31955475d1c (submodule--helper: remove unused "list" helper, 2022-09-01) released with v2.38.0 the "list" command was no longer provided. Users who had to switch to "git submodule--helper foreach" were given a command that (on my system) is around 6.5x slower. Now the "foreach" is around 0.10x slower (due to the slight shell overhead), with 31955475d1c reverted on top of this: $ hyperfine './git submodule--helper list' './git submodule foreach --quiet "echo \$name"' --warmup 10 Benchmark 1: ./git submodule--helper list Time (mean ± σ): 6.4 ms ± 0.1 ms [User: 5.0 ms, System: 1.5 ms] Range (min … max): 6.2 ms … 7.2 ms 427 runs Benchmark 2: ./git submodule foreach --quiet "echo \$name" Time (mean ± σ): 7.0 ms ± 0.1 ms [User: 4.8 ms, System: 2.3 ms] Range (min … max): 6.8 ms … 7.4 ms 390 runs Summary './git submodule--helper list' ran 1.10 ± 0.03 times faster than './git submodule foreach --quiet "echo \$name"' I think it would make sense to implement a "--format" option for "git submodule foreach" to help anyone who cares about that remaining performance (and to improve the API, e.g. by supporting "-z"), but as far as performance goes this makes the runtime acceptable again. The pattern in "cmd_submodule_builtin()" of saving "struct strvec" arguments to a "struct string_list" and free()-ing them after the "argv" has been modified by "cmd_submodule__helper()" is new, without it we'd get various already-passing tests failing under SANITIZE=leak. 1. Using the "git hyperfine" wrapper for "hyperfine": https://lore.kernel.org/git/211201.86r1aw9gbd.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/87czatrpyb.fsf@bernoul.li/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- builtin/submodule.c | 41 +++++++++++++++++++++++++++++++++-------- 1 file changed, 33 insertions(+), 8 deletions(-)