Message ID | patch-03.11-e5ec6945409-20220713T131601Z-avarab@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | submodule--helper: fix memory leaks | expand |
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > The "path" member can come from "argv" (i.e. not malloc'd), or it can > be something we determine at runtime. In the latter case let's save > away a pointer to free() to avoid leaking memory. [...] > diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c > index 73717be957c..23ab9c7e349 100644 > --- a/builtin/submodule--helper.c > +++ b/builtin/submodule--helper.c > @@ -1511,6 +1511,7 @@ static int module_deinit(int argc, const char **argv, const char *prefix) > struct module_clone_data { > const char *prefix; > const char *path; > + char *path_free; > const char *name; > const char *url; > const char *depth; > @@ -1527,6 +1528,11 @@ struct module_clone_data { > .single_branch = -1, \ > } > > +static void module_clone_data_release(struct module_clone_data *cd) > +{ > + free(cd->path_free); > +} > + > struct submodule_alternate_setup { > const char *submodule_name; > enum SUBMODULE_ALTERNATE_ERROR_MODE { > @@ -1651,9 +1657,9 @@ static int clone_submodule(struct module_clone_data *clone_data) > > if (!is_absolute_path(clone_data->path)) { > strbuf_addf(&sb, "%s/%s", get_git_work_tree(), clone_data->path); > - clone_data->path = strbuf_detach(&sb, NULL); > + clone_data->path = clone_data->path_free = strbuf_detach(&sb, NULL); > } else { > - clone_data->path = xstrdup(clone_data->path); > + clone_data->path = clone_data->path_free = xstrdup(clone_data->path); > } Hm, having .path_free doesn't seem necessary. If I'm reading the code correctly, - in module_clone() we set clone_data.path from argv - then we unconditionally call clone_submodule(), which does all of the hard work - in clone_submodule(), we always hit this conditional, which means that past this point, clone_data.path is always free()-able. which suggests that we don't need clone_data.path_free at all. I suspect that just this static void module_clone_data_release(struct module_clone_data *cd) { free(cd->path); } is enough to plug the leak (though admittedly, I haven't properly tested the leak because it's been a PITA to set up locally). That's a pretty hacky suggestion though, since we're still using the same member to hold free()-able and non-free()-able memory. Instead, maybe we could move this "clone_data.path = freeable" logic into module_clone(), like: diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index 73717be957..d67d4b9647 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -1649,13 +1649,6 @@ static int clone_submodule(struct module_clone_data *clone_data) sm_gitdir = absolute_pathdup(sb.buf); strbuf_reset(&sb); - if (!is_absolute_path(clone_data->path)) { - strbuf_addf(&sb, "%s/%s", get_git_work_tree(), clone_data->path); - clone_data->path = strbuf_detach(&sb, NULL); - } else { - clone_data->path = xstrdup(clone_data->path); - } - if (validate_submodule_git_dir(sm_gitdir, clone_data->name) < 0) die(_("refusing to create/use '%s' in another submodule's " "git dir"), sm_gitdir); @@ -1745,12 +1738,13 @@ static int module_clone(int argc, const char **argv, const char *prefix) int dissociate = 0, quiet = 0, progress = 0, require_init = 0; struct module_clone_data clone_data = MODULE_CLONE_DATA_INIT; struct list_objects_filter_options filter_options = { 0 }; + const char *clone_path; struct option module_clone_options[] = { OPT_STRING(0, "prefix", &clone_data.prefix, N_("path"), N_("alternative anchor for relative paths")), - OPT_STRING(0, "path", &clone_data.path, + OPT_STRING(0, "path", &clone_path, N_("path"), N_("where the new submodule will be cloned to")), OPT_STRING(0, "name", &clone_data.name, @@ -1795,6 +1789,15 @@ static int module_clone(int argc, const char **argv, const char *prefix) clone_data.require_init = !!require_init; clone_data.filter_options = &filter_options; + if (!is_absolute_path(clone_path)) { + struct strbuf sb = STRBUF_INIT; + strbuf_addf(&sb, "%s/%s", get_git_work_tree(), clone_path); + clone_data.path = strbuf_detach(&sb, NULL); + strbuf_release(&sb); + } else { + clone_data.path = xstrdup(clone_path); + } + if (argc || !clone_data.url || !clone_data.path || !*(clone_data.path)) usage_with_options(git_submodule_helper_usage, module_clone_options);
Glen Choo <chooglen@google.com> writes: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c >> index 73717be957c..23ab9c7e349 100644 >> --- a/builtin/submodule--helper.c >> +++ b/builtin/submodule--helper.c >> @@ -1511,6 +1511,7 @@ static int module_deinit(int argc, const char **argv, const char *prefix) >> struct module_clone_data { >> const char *prefix; >> const char *path; >> + char *path_free; >> const char *name; >> const char *url; >> const char *depth; >> @@ -1527,6 +1528,11 @@ struct module_clone_data { >> .single_branch = -1, \ >> } >> >> +static void module_clone_data_release(struct module_clone_data *cd) >> +{ >> + free(cd->path_free); >> +} >> + >> struct submodule_alternate_setup { >> const char *submodule_name; >> enum SUBMODULE_ALTERNATE_ERROR_MODE { >> @@ -1651,9 +1657,9 @@ static int clone_submodule(struct module_clone_data *clone_data) >> >> if (!is_absolute_path(clone_data->path)) { >> strbuf_addf(&sb, "%s/%s", get_git_work_tree(), clone_data->path); >> - clone_data->path = strbuf_detach(&sb, NULL); >> + clone_data->path = clone_data->path_free = strbuf_detach(&sb, NULL); >> } else { >> - clone_data->path = xstrdup(clone_data->path); >> + clone_data->path = clone_data->path_free = xstrdup(clone_data->path); >> } > > Hm, having .path_free doesn't seem necessary. If I'm reading the code > correctly, > > - in module_clone() we set clone_data.path from argv > - then we unconditionally call clone_submodule(), which does all of the > hard work > - in clone_submodule(), we always hit this conditional, which means that > past this point, clone_data.path is always free()-able. > > which suggests that we don't need clone_data.path_free at all. I suspect > that just this > > static void module_clone_data_release(struct module_clone_data *cd) > { > free(cd->path); > } > > is enough to plug the leak (though admittedly, I haven't properly tested > the leak because it's been a PITA to set up locally). > > That's a pretty hacky suggestion though, since we're still using the > same member to hold free()-able and non-free()-able memory.... Ah, here's a wacky idea (whose feasibility I haven't thought about at all) that would make things a lot cleaner. If we had something like OPT_STRING_DUP, that xstrdup()-s the value from argv when we parse it, then we could drop the "const" from clone_data.path and just free() it.
On Wed, Jul 13 2022, Glen Choo wrote: > Glen Choo <chooglen@google.com> writes: > >> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> >>> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c >>> index 73717be957c..23ab9c7e349 100644 >>> --- a/builtin/submodule--helper.c >>> +++ b/builtin/submodule--helper.c >>> @@ -1511,6 +1511,7 @@ static int module_deinit(int argc, const char **argv, const char *prefix) >>> struct module_clone_data { >>> const char *prefix; >>> const char *path; >>> + char *path_free; >>> const char *name; >>> const char *url; >>> const char *depth; >>> @@ -1527,6 +1528,11 @@ struct module_clone_data { >>> .single_branch = -1, \ >>> } >>> >>> +static void module_clone_data_release(struct module_clone_data *cd) >>> +{ >>> + free(cd->path_free); >>> +} >>> + >>> struct submodule_alternate_setup { >>> const char *submodule_name; >>> enum SUBMODULE_ALTERNATE_ERROR_MODE { >>> @@ -1651,9 +1657,9 @@ static int clone_submodule(struct module_clone_data *clone_data) >>> >>> if (!is_absolute_path(clone_data->path)) { >>> strbuf_addf(&sb, "%s/%s", get_git_work_tree(), clone_data->path); >>> - clone_data->path = strbuf_detach(&sb, NULL); >>> + clone_data->path = clone_data->path_free = strbuf_detach(&sb, NULL); >>> } else { >>> - clone_data->path = xstrdup(clone_data->path); >>> + clone_data->path = clone_data->path_free = xstrdup(clone_data->path); >>> } >> >> Hm, having .path_free doesn't seem necessary. If I'm reading the code >> correctly, >> >> - in module_clone() we set clone_data.path from argv >> - then we unconditionally call clone_submodule(), which does all of the >> hard work >> - in clone_submodule(), we always hit this conditional, which means that >> past this point, clone_data.path is always free()-able. >> >> which suggests that we don't need clone_data.path_free at all. I suspect >> that just this >> >> static void module_clone_data_release(struct module_clone_data *cd) >> { >> free(cd->path); >> } >> >> is enough to plug the leak (though admittedly, I haven't properly tested >> the leak because it's been a PITA to set up locally). >> >> That's a pretty hacky suggestion though, since we're still using the >> same member to hold free()-able and non-free()-able memory.... > > Ah, here's a wacky idea (whose feasibility I haven't thought about at > all) that would make things a lot cleaner. If we had something like > OPT_STRING_DUP, that xstrdup()-s the value from argv when we parse it, > then we could drop the "const" from clone_data.path and just free() it. I suppose so, it might make some things simpler, of course at the cost of needlessly duplicating things. But we also have various common patterns such as string_lists where some elements are dup'd, some aren't, and need to deal with that. I think just having common idioms for tracking the dupe is usually better, e.g. in the case of a string list stick the pointer to free in "util". I think in this case the patch as-is is better than your suggestions, because it's a less fragile pattern, we explicitly mark when we dup something that's sometimes a dup, and sometimes not. Whereas if we do it with the xstrdup() at the start it requires more moving things around, and if we have a new user who parses the same argument we'll bug out on that free(). What do you think?
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > On Wed, Jul 13 2022, Glen Choo wrote: > >> Glen Choo <chooglen@google.com> writes: >> >>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >>> >>>> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c >>>> index 73717be957c..23ab9c7e349 100644 >>>> --- a/builtin/submodule--helper.c >>>> +++ b/builtin/submodule--helper.c >>>> @@ -1511,6 +1511,7 @@ static int module_deinit(int argc, const char **argv, const char *prefix) >>>> struct module_clone_data { >>>> const char *prefix; >>>> const char *path; >>>> + char *path_free; >>>> const char *name; >>>> const char *url; >>>> const char *depth; >>>> @@ -1527,6 +1528,11 @@ struct module_clone_data { >>>> .single_branch = -1, \ >>>> } >>>> >>>> +static void module_clone_data_release(struct module_clone_data *cd) >>>> +{ >>>> + free(cd->path_free); >>>> +} >>>> + >>>> struct submodule_alternate_setup { >>>> const char *submodule_name; >>>> enum SUBMODULE_ALTERNATE_ERROR_MODE { >>>> @@ -1651,9 +1657,9 @@ static int clone_submodule(struct module_clone_data *clone_data) >>>> >>>> if (!is_absolute_path(clone_data->path)) { >>>> strbuf_addf(&sb, "%s/%s", get_git_work_tree(), clone_data->path); >>>> - clone_data->path = strbuf_detach(&sb, NULL); >>>> + clone_data->path = clone_data->path_free = strbuf_detach(&sb, NULL); >>>> } else { >>>> - clone_data->path = xstrdup(clone_data->path); >>>> + clone_data->path = clone_data->path_free = xstrdup(clone_data->path); >>>> } >>> >>> Hm, having .path_free doesn't seem necessary. If I'm reading the code >>> correctly, >>> >>> - in module_clone() we set clone_data.path from argv >>> - then we unconditionally call clone_submodule(), which does all of the >>> hard work >>> - in clone_submodule(), we always hit this conditional, which means that >>> past this point, clone_data.path is always free()-able. >>> >>> which suggests that we don't need clone_data.path_free at all. I suspect >>> that just this >>> >>> static void module_clone_data_release(struct module_clone_data *cd) >>> { >>> free(cd->path); >>> } >>> >>> is enough to plug the leak (though admittedly, I haven't properly tested >>> the leak because it's been a PITA to set up locally). >>> >>> That's a pretty hacky suggestion though, since we're still using the >>> same member to hold free()-able and non-free()-able memory.... >> >> Ah, here's a wacky idea (whose feasibility I haven't thought about at >> all) that would make things a lot cleaner. If we had something like >> OPT_STRING_DUP, that xstrdup()-s the value from argv when we parse it, >> then we could drop the "const" from clone_data.path and just free() it. > > I suppose so, it might make some things simpler, of course at the cost > of needlessly duplicating things. > > But we also have various common patterns such as string_lists where some > elements are dup'd, some aren't, and need to deal with that. I think > just having common idioms for tracking the dupe is usually better, > e.g. in the case of a string list stick the pointer to free in "util". Hm, sounds fair. "Sometimes dup and sometimes not" sounds like an inevitability. I'm not experienced enough to know better, and folks whose opinion I sought seem to agree with you. > I think in this case the patch as-is is better than your suggestions, > because it's a less fragile pattern, we explicitly mark when we dup > something that's sometimes a dup, and sometimes not. > > Whereas if we do it with the xstrdup() at the start it requires more > moving things around, and if we have a new user who parses the same > argument we'll bug out on that free(). > > What do you think? Frankly I'm ok with moving things around; I think the code could use a little cleaning up :) But yeah, I think my suggestion isn't so great - it's a bit weird to keep around an auto variable that exists only to be dup-ed to the thing we care about. We can forget about that. I do think that it's worth avoiding the "sometimes dup, sometimes not" pattern if we can, though (of course these are just my non-C instincts talking), and we can do that here if we just choose not to assign back to .path. Something like: struct module_clone_data { const char *prefix; - const char *path; + char *path; + const char *path_argv; ... if (!is_absolute_path(clone_data->path)) { - strbuf_addf(&sb, "%s/%s", get_git_work_tree(), clone_data->path); + strbuf_addf(&sb, "%s/%s", get_git_work_tree(), clone_data->path_argv); + clone_data->path = strbuf_detach(&sb, NULL); } else { - clone_data->path = xstrdup(clone_data->path); + clone_data->path = xstrdup(clone_data->path_argv); } would be clearer to me since the const pointer never points to something that the struct actually owns. But if the "= .to_free = " idiom is well-understood and accepted to the point that we don't need to actively avoid "sometimes dup, sometimes not", then we should drop my suggestion and just go with your patch :)
On Wed, Jul 13 2022, Glen Choo wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> [...] >> What do you think? > > Frankly I'm ok with moving things around; I think the code could use > a little cleaning up :) Yeah, but this whole part of it is something we'll be throwing away entirely anyway, so I wanted to leave it at just fixing the memory leaks as narrowly as possible for now. I.e. we invoke "clone" from submodule--helper itself, which we don't need to do if we just invoke it as a function, in which case this whole argv v.s. dynamically generated difference goes away. Well, we'll have a constant string in some cases, but we'll likely either strdup them all with a strvec, or more likely pass the arguments with custom "options" struct or something. > But yeah, I think my suggestion isn't so great - > it's a bit weird to keep around an auto variable that exists only to be > dup-ed to the thing we care about. We can forget about that. > > I do think that it's worth avoiding the "sometimes dup, sometimes not" > pattern if we can, though (of course these are just my non-C instincts > talking), and we can do that here if we just choose not to assign back > to .path. Something like: > > struct module_clone_data { > const char *prefix; > - const char *path; > + char *path; > + const char *path_argv; > > ... > > if (!is_absolute_path(clone_data->path)) { > - strbuf_addf(&sb, "%s/%s", get_git_work_tree(), clone_data->path); > + strbuf_addf(&sb, "%s/%s", get_git_work_tree(), clone_data->path_argv); > + clone_data->path = strbuf_detach(&sb, NULL); > } else { > - clone_data->path = xstrdup(clone_data->path); > + clone_data->path = xstrdup(clone_data->path_argv); > } > > would be clearer to me since the const pointer never points to something > that the struct actually owns. I think that actually makes a lot of sense, I'll probably just change it to that. I'll mull over this again when I get to re-rolling this (depending on future comments), thanks! > But if the "= .to_free = " idiom is well-understood and accepted to the > point that we don't need to actively avoid "sometimes dup, sometimes > not", then we should drop my suggestion and just go with your patch :) FWIW "git grep ' = to_free = ' finds a fair bit of them, but luckily we usually don't need to play that particular game...
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index 73717be957c..23ab9c7e349 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -1511,6 +1511,7 @@ static int module_deinit(int argc, const char **argv, const char *prefix) struct module_clone_data { const char *prefix; const char *path; + char *path_free; const char *name; const char *url; const char *depth; @@ -1527,6 +1528,11 @@ struct module_clone_data { .single_branch = -1, \ } +static void module_clone_data_release(struct module_clone_data *cd) +{ + free(cd->path_free); +} + struct submodule_alternate_setup { const char *submodule_name; enum SUBMODULE_ALTERNATE_ERROR_MODE { @@ -1651,9 +1657,9 @@ static int clone_submodule(struct module_clone_data *clone_data) if (!is_absolute_path(clone_data->path)) { strbuf_addf(&sb, "%s/%s", get_git_work_tree(), clone_data->path); - clone_data->path = strbuf_detach(&sb, NULL); + clone_data->path = clone_data->path_free = strbuf_detach(&sb, NULL); } else { - clone_data->path = xstrdup(clone_data->path); + clone_data->path = clone_data->path_free = xstrdup(clone_data->path); } if (validate_submodule_git_dir(sm_gitdir, clone_data->name) < 0) @@ -1801,6 +1807,8 @@ static int module_clone(int argc, const char **argv, const char *prefix) clone_submodule(&clone_data); list_objects_filter_release(&filter_options); + + module_clone_data_release(&clone_data); return 0; } @@ -3016,6 +3024,7 @@ static int add_submodule(const struct add_data *add_data) { char *submod_gitdir_path; struct module_clone_data clone_data = MODULE_CLONE_DATA_INIT; + int ret; /* perhaps the path already exists and is already a git repo, else clone it */ if (is_directory(add_data->sm_path)) { @@ -3077,8 +3086,10 @@ static int add_submodule(const struct add_data *add_data) if (add_data->depth >= 0) clone_data.depth = xstrfmt("%d", add_data->depth); - if (clone_submodule(&clone_data)) - return -1; + if (clone_submodule(&clone_data)) { + ret = -1; + goto cleanup; + } prepare_submodule_repo_env(&cp.env); cp.git_cmd = 1; @@ -3097,7 +3108,10 @@ static int add_submodule(const struct add_data *add_data) if (run_command(&cp)) die(_("unable to checkout submodule '%s'"), add_data->sm_path); } - return 0; + ret = 0; +cleanup: + module_clone_data_release(&clone_data); + return ret; } static int config_submodule_in_gitmodules(const char *name, const char *var, const char *value) diff --git a/t/t6008-rev-list-submodule.sh b/t/t6008-rev-list-submodule.sh index 3153a0d8910..12e67e187ef 100755 --- a/t/t6008-rev-list-submodule.sh +++ b/t/t6008-rev-list-submodule.sh @@ -8,6 +8,7 @@ test_description='git rev-list involving submodules that this repo has' GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh test_expect_success 'setup' ' diff --git a/t/t7414-submodule-mistakes.sh b/t/t7414-submodule-mistakes.sh index f2e7df59cf2..3269298197c 100755 --- a/t/t7414-submodule-mistakes.sh +++ b/t/t7414-submodule-mistakes.sh @@ -1,6 +1,8 @@ #!/bin/sh test_description='handling of common mistakes people may make with submodules' + +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh test_expect_success 'create embedded repository' ' diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh index 3fcb44767f5..f5426a8e589 100755 --- a/t/t7506-status-submodule.sh +++ b/t/t7506-status-submodule.sh @@ -2,6 +2,7 @@ test_description='git status for submodule' +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh test_create_repo_with_commit () { diff --git a/t/t7507-commit-verbose.sh b/t/t7507-commit-verbose.sh index ed2653d46fe..92462a22374 100755 --- a/t/t7507-commit-verbose.sh +++ b/t/t7507-commit-verbose.sh @@ -1,6 +1,8 @@ #!/bin/sh test_description='verbose commit template' + +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh write_script "check-for-diff" <<\EOF &&
Fix memory leaks related to the "struct module_clone_data" by creating a module_clone_data_release() function to go with the MODULE_CLONE_DATA_INIT added in a98b02c1128 (submodule--helper: refactor module_clone(), 2021-07-10). The "path" member can come from "argv" (i.e. not malloc'd), or it can be something we determine at runtime. In the latter case let's save away a pointer to free() to avoid leaking memory. Fixing this leak makes several tests pass, so let's mark them as passing with TEST_PASSES_SANITIZE_LEAK=true. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- builtin/submodule--helper.c | 24 +++++++++++++++++++----- t/t6008-rev-list-submodule.sh | 1 + t/t7414-submodule-mistakes.sh | 2 ++ t/t7506-status-submodule.sh | 1 + t/t7507-commit-verbose.sh | 2 ++ 5 files changed, 25 insertions(+), 5 deletions(-)