Message ID | baf93e4a-7f05-857c-e551-09675496c03c@web.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | am: don't pass strvec to apply_parse_options() | expand |
On Tue, Dec 13 2022, René Scharfe wrote: > apply_parse_options() passes the array of argument strings to > parse_options(), which removes recognized options. The removed strings > are not freed, though. > > Make a copy of the strvec to pass to the function to retain the pointers > of its strings, so we release them all at the end. > > Signed-off-by: René Scharfe <l.s.r@web.de> > --- > builtin/am.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/builtin/am.c b/builtin/am.c > index 30c9b3a9cd..dddf1b9af0 100644 > --- a/builtin/am.c > +++ b/builtin/am.c > @@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file) > int res, opts_left; > int force_apply = 0; > int options = 0; > + const char **apply_argv; > > if (init_apply_state(&apply_state, the_repository, NULL)) > BUG("init_apply_state() failed"); > @@ -1483,7 +1484,15 @@ static int run_apply(const struct am_state *state, const char *index_file) > strvec_push(&apply_opts, "apply"); > strvec_pushv(&apply_opts, state->git_apply_opts.v); > > - opts_left = apply_parse_options(apply_opts.nr, apply_opts.v, > + /* > + * Build a copy that apply_parse_options() can rearrange. > + * apply_opts.v keeps referencing the allocated strings for > + * strvec_clear() to release. > + */ > + ALLOC_ARRAY(apply_argv, apply_opts.nr); > + COPY_ARRAY(apply_argv, apply_opts.v, apply_opts.nr); > + > + opts_left = apply_parse_options(apply_opts.nr, apply_argv, > &apply_state, &force_apply, &options, > NULL); > > @@ -1513,6 +1522,7 @@ static int run_apply(const struct am_state *state, const char *index_file) > strvec_clear(&apply_paths); > strvec_clear(&apply_opts); > clear_apply_state(&apply_state); > + free(apply_argv); > > if (res) > return res; I don't mind this going in, but it really feels like a bit of a dirty hack. We have widespread leaks all over the place due to this idiom. I.e. parse_options() and a couple of other APIs expect that they can munge the "argv", which is fine if it arrives via main(), but not if we're editing our own strvecs. I think less of a hack is to teach the eventual parse_options() that when it munges it it should free() it. I did that for the revisions API in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that need free()-ing, 2022-08-02). What do you think?
Am 13.12.22 um 09:37 schrieb Ævar Arnfjörð Bjarmason: > > On Tue, Dec 13 2022, René Scharfe wrote: > >> apply_parse_options() passes the array of argument strings to >> parse_options(), which removes recognized options. The removed strings >> are not freed, though. >> >> Make a copy of the strvec to pass to the function to retain the pointers >> of its strings, so we release them all at the end. >> >> Signed-off-by: René Scharfe <l.s.r@web.de> >> --- >> builtin/am.c | 12 +++++++++++- >> 1 file changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/builtin/am.c b/builtin/am.c >> index 30c9b3a9cd..dddf1b9af0 100644 >> --- a/builtin/am.c >> +++ b/builtin/am.c >> @@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file) >> int res, opts_left; >> int force_apply = 0; >> int options = 0; >> + const char **apply_argv; >> >> if (init_apply_state(&apply_state, the_repository, NULL)) >> BUG("init_apply_state() failed"); >> @@ -1483,7 +1484,15 @@ static int run_apply(const struct am_state *state, const char *index_file) >> strvec_push(&apply_opts, "apply"); >> strvec_pushv(&apply_opts, state->git_apply_opts.v); >> >> - opts_left = apply_parse_options(apply_opts.nr, apply_opts.v, >> + /* >> + * Build a copy that apply_parse_options() can rearrange. >> + * apply_opts.v keeps referencing the allocated strings for >> + * strvec_clear() to release. >> + */ >> + ALLOC_ARRAY(apply_argv, apply_opts.nr); >> + COPY_ARRAY(apply_argv, apply_opts.v, apply_opts.nr); >> + >> + opts_left = apply_parse_options(apply_opts.nr, apply_argv, >> &apply_state, &force_apply, &options, >> NULL); >> >> @@ -1513,6 +1522,7 @@ static int run_apply(const struct am_state *state, const char *index_file) >> strvec_clear(&apply_paths); >> strvec_clear(&apply_opts); >> clear_apply_state(&apply_state); >> + free(apply_argv); >> >> if (res) >> return res; > > I don't mind this going in, but it really feels like a bit of a dirty > hack. > > We have widespread leaks all over the place due to this > idiom. I.e. parse_options() and a couple of other APIs expect that they > can munge the "argv", which is fine if it arrives via main(), but not if > we're editing our own strvecs. Where? A quick "git grep 'parse_options.*nr'" turns up only this place as one that passes a strvec to parse_options. > I think less of a hack is to teach the eventual parse_options() that > when it munges it it should free() it. I did that for the revisions API > in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that > need free()-ing, 2022-08-02). > > What do you think? Generating string lists and then parsing them is weird. When calls have to cross a process boundary then we have no choice, but in-process we shouldn't have to lower our request to an intermediate text format. git am does it anyway because it writes its options to a file and reads them back when it resumes with --continue, IIUC. I hope that is and will be the only place that uses parse_options() with a strvec -- and then we don't have to change that function. If this pattern is used more widely then we could package the copying done by this patch somehow, e.g. by adding a strvec_parse_options() that wraps the real thing. If we have to change parse_options() at all then I'd prefer it to not free() anything (to keep it usable with main()'s parameters), but to reorder in a non-destructive way. That would mean keeping the NULL sentinel where it is, and making sure all callers use only the returned argc to determine which arguments parse_options() didn't recognize. René
On Tue, Dec 13 2022, René Scharfe wrote: > Am 13.12.22 um 09:37 schrieb Ævar Arnfjörð Bjarmason: >> >> On Tue, Dec 13 2022, René Scharfe wrote: >> >>> apply_parse_options() passes the array of argument strings to >>> parse_options(), which removes recognized options. The removed strings >>> are not freed, though. >>> >>> Make a copy of the strvec to pass to the function to retain the pointers >>> of its strings, so we release them all at the end. >>> >>> Signed-off-by: René Scharfe <l.s.r@web.de> >>> --- >>> builtin/am.c | 12 +++++++++++- >>> 1 file changed, 11 insertions(+), 1 deletion(-) >>> >>> diff --git a/builtin/am.c b/builtin/am.c >>> index 30c9b3a9cd..dddf1b9af0 100644 >>> --- a/builtin/am.c >>> +++ b/builtin/am.c >>> @@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file) >>> int res, opts_left; >>> int force_apply = 0; >>> int options = 0; >>> + const char **apply_argv; >>> >>> if (init_apply_state(&apply_state, the_repository, NULL)) >>> BUG("init_apply_state() failed"); >>> @@ -1483,7 +1484,15 @@ static int run_apply(const struct am_state *state, const char *index_file) >>> strvec_push(&apply_opts, "apply"); >>> strvec_pushv(&apply_opts, state->git_apply_opts.v); >>> >>> - opts_left = apply_parse_options(apply_opts.nr, apply_opts.v, >>> + /* >>> + * Build a copy that apply_parse_options() can rearrange. >>> + * apply_opts.v keeps referencing the allocated strings for >>> + * strvec_clear() to release. >>> + */ >>> + ALLOC_ARRAY(apply_argv, apply_opts.nr); >>> + COPY_ARRAY(apply_argv, apply_opts.v, apply_opts.nr); >>> + >>> + opts_left = apply_parse_options(apply_opts.nr, apply_argv, >>> &apply_state, &force_apply, &options, >>> NULL); >>> >>> @@ -1513,6 +1522,7 @@ static int run_apply(const struct am_state *state, const char *index_file) >>> strvec_clear(&apply_paths); >>> strvec_clear(&apply_opts); >>> clear_apply_state(&apply_state); >>> + free(apply_argv); >>> >>> if (res) >>> return res; >> >> I don't mind this going in, but it really feels like a bit of a dirty >> hack. >> >> We have widespread leaks all over the place due to this >> idiom. I.e. parse_options() and a couple of other APIs expect that they >> can munge the "argv", which is fine if it arrives via main(), but not if >> we're editing our own strvecs. > > Where? A quick "git grep 'parse_options.*nr'" turns up only this place > as one that passes a strvec to parse_options. Sorry about the vagueness, this was from memory and I didn't have a full list handy. First, that grep isn't going to give you what you want, since it only catches cases where we're passing the "strvec" to the "parse_options()" directly. But if you look at e.g. how cmd_stash() invokes push_stash() or cmd_describe() invoking cmd_name_rev() you'll see callers where we're potentially losing the strvec due to parse_options(). "Potentially" because in both those cases we call that API, but in the latter case we've got a missing strvec_clear(), so maybe that's all that's needed (it doesn't always munge the argv). >> I think less of a hack is to teach the eventual parse_options() that >> when it munges it it should free() it. I did that for the revisions API >> in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that >> need free()-ing, 2022-08-02). >> >> What do you think? > > Generating string lists and then parsing them is weird. When calls have > to cross a process boundary then we have no choice, but in-process we > shouldn't have to lower our request to an intermediate text format. git > am does it anyway because it writes its options to a file and reads them > back when it resumes with --continue, IIUC. I agree, but making all those use a nicer API would be a much bigger change. > I hope that is and will be the only place that uses parse_options() with > a strvec -- and then we don't have to change that function. > > If this pattern is used more widely then we could package the copying > done by this patch somehow, e.g. by adding a strvec_parse_options() > that wraps the real thing. The reason I'm encouraging you to look at some of the other strvec leaks is to see if you can think of a pattern that fits these various leaky callers. So e.g. a strvec_parse_options() won't work for the ones noted above where we've lost track of it being a "struct strvec" in the first place. I have a local version somewhere where I did (pseudocode): struct strvec vec = STRVEC_INIT; struct string_list to_free = STRING_LIST_INIT_DUP; // populate vec... for (size_t i = 0; i < vec.nr; i++) string_list_append_nodup(&to_free, v[i]): // call parse_options(), or otherwise munge "vec"... free(strvec_detach(&vec)); string_list_clear(&to_free, 0); I.e. you snapshot the members of the "vec" before it's munged, and free() those via a "struct string_list". Then you don't strvec_clear(), but instead free() the container, its members being free'd by the string_list. Here's a (not yet in-tree) patch that uses that: https://lore.kernel.org/git/patch-10.10-81f138e460c-20221017T115544Z-avarab@gmail.com/ I think that's more reliable & general than the pattern you're adding here. In your version the only reason we're not segfaulting is because the parse_options() consumes all the arguments, but that's not something you can generally rely on with parse_options(). Also, and maybe I'm missing something, but wouldn't this be functionally the same as your implementation: diff --git a/builtin/am.c b/builtin/am.c index 30c9b3a9cd7..e65262cbb21 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -1476,11 +1476,13 @@ static int run_apply(const struct am_state *state, const char *index_file) int res, opts_left; int force_apply = 0; int options = 0; + char *to_free; if (init_apply_state(&apply_state, the_repository, NULL)) BUG("init_apply_state() failed"); strvec_push(&apply_opts, "apply"); + to_free = (char *)apply_opts.v[0]; strvec_pushv(&apply_opts, state->git_apply_opts.v); opts_left = apply_parse_options(apply_opts.nr, apply_opts.v, @@ -1489,6 +1491,7 @@ static int run_apply(const struct am_state *state, const char *index_file) if (opts_left != 0) die("unknown option passed through to git apply"); + free(to_free); if (index_file) { apply_state.index_file = index_file; I.e. we leak the strvec_push() of the "apply" literal, but for the rest we append the "state->git_apply_opts.v", which we already free() elsewhere. So actually, aren't we really just fumbling our way towards the "nodup" interface that the "struct string_list" has, and we should add the same to the "struct strvec". This seems to also fix all the leaks you fixed, but without any of the copying: diff --git a/builtin/am.c b/builtin/am.c index 30c9b3a9cd7..691ec1d152d 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -1471,7 +1471,7 @@ static int parse_mail_rebase(struct am_state *state, const char *mail) static int run_apply(const struct am_state *state, const char *index_file) { struct strvec apply_paths = STRVEC_INIT; - struct strvec apply_opts = STRVEC_INIT; + struct strvec apply_opts = STRVEC_INIT_NODUP; struct apply_state apply_state; int res, opts_left; int force_apply = 0; diff --git a/strvec.c b/strvec.c index 61a76ce6cb9..efdc47a5854 100644 --- a/strvec.c +++ b/strvec.c @@ -22,7 +22,9 @@ static void strvec_push_nodup(struct strvec *array, const char *value) const char *strvec_push(struct strvec *array, const char *value) { - strvec_push_nodup(array, xstrdup(value)); + const char *to_push = array->nodup_strings ? value : xstrdup(value); + + strvec_push_nodup(array, to_push); return array->v[array->nr - 1]; } @@ -31,6 +33,9 @@ const char *strvec_pushf(struct strvec *array, const char *fmt, ...) va_list ap; struct strbuf v = STRBUF_INIT; + if (array->nodup_strings) + BUG("cannot strvec_pushf() on a 'nodup' strvec"); + va_start(ap, fmt); strbuf_vaddf(&v, fmt, ap); va_end(ap); @@ -67,6 +72,9 @@ void strvec_pop(struct strvec *array) void strvec_split(struct strvec *array, const char *to_split) { + if (array->nodup_strings) + BUG("cannot strvec_split() on a 'nodup' strvec"); + while (isspace(*to_split)) to_split++; for (;;) { @@ -90,7 +98,7 @@ void strvec_clear(struct strvec *array) if (array->v != empty_strvec) { int i; for (i = 0; i < array->nr; i++) - free((char *)array->v[i]); + free(array->nodup_strings ? NULL : (char *)array->v[i]); free(array->v); } strvec_init(array); diff --git a/strvec.h b/strvec.h index 9f55c8766ba..ac6e2c04cea 100644 --- a/strvec.h +++ b/strvec.h @@ -31,12 +31,18 @@ struct strvec { const char **v; size_t nr; size_t alloc; + unsigned int nodup_strings:1; }; #define STRVEC_INIT { \ .v = empty_strvec, \ } +#define STRVEC_INIT_NODUP { \ + .v = empty_strvec, \ + .nodup_strings = 1, \ +} + /** * Initialize an array. This is no different than assigning from * `STRVEC_INIT`. In any case, for this change (and leak fixes in general), could you please also mark the now-passing leak-free tests. This fails after your patch, but passes before. The failure is a "good" one, as they're now leak-free, but should be marked as such: GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_SANITIZE_LEAK_LOG=true \ make SANITIZE=leak test T="t4256-am-format-flowed.sh t0023-crlf-am.sh t4254-am-corrupt.sh t4257-am-interactive.sh t5403-post-checkout-hook.sh t4152-am-subjects.sh" > If we have to change parse_options() at all then I'd prefer it to not > free() anything (to keep it usable with main()'s parameters),... I'm suggesting passing a flag to it to have it free() things it NULL's. Maybe that's a bad idea, but I don't see the incompatibility with main(), for those we just wouldn't pass the flag. It does have the inherent limitation that you couldn't mix strvec and non-strvec parameters, i.e. if some of your elements are dup'd but others aren't you'd need to arrange to have them all dup'd. But I don't think that's an issue in practice, either we pass the fully dup'd version, or main() parameters. > but to > reorder in a non-destructive way. That would mean keeping the NULL > sentinel where it is, and making sure all callers use only the returned > argc to determine which arguments parse_options() didn't recognize. I think this would work if parse_options() wasn't clobbering those elements in some cases, and replacing them with new ones, but doesn't it do that e.g. in parse_options_step()? It also seems like a much more fragile change to try to ensure that nothing uses the "argv" again, but only looks at the updated "argc". Both that & adding a "const" to it would be a huge change across the codebase, so I'd like to avoid them, but at least the "const" method would be helped along by the compiler.
Am 14.12.22 um 09:44 schrieb Ævar Arnfjörð Bjarmason: > > On Tue, Dec 13 2022, René Scharfe wrote: > >> Am 13.12.22 um 09:37 schrieb Ævar Arnfjörð Bjarmason: >>> >>> On Tue, Dec 13 2022, René Scharfe wrote: >>> >>>> apply_parse_options() passes the array of argument strings to >>>> parse_options(), which removes recognized options. The removed strings >>>> are not freed, though. >>>> >>>> Make a copy of the strvec to pass to the function to retain the pointers >>>> of its strings, so we release them all at the end. >>>> >>>> Signed-off-by: René Scharfe <l.s.r@web.de> >>>> --- >>>> builtin/am.c | 12 +++++++++++- >>>> 1 file changed, 11 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/builtin/am.c b/builtin/am.c >>>> index 30c9b3a9cd..dddf1b9af0 100644 >>>> --- a/builtin/am.c >>>> +++ b/builtin/am.c >>>> @@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file) >>>> int res, opts_left; >>>> int force_apply = 0; >>>> int options = 0; >>>> + const char **apply_argv; >>>> >>>> if (init_apply_state(&apply_state, the_repository, NULL)) >>>> BUG("init_apply_state() failed"); >>>> @@ -1483,7 +1484,15 @@ static int run_apply(const struct am_state *state, const char *index_file) >>>> strvec_push(&apply_opts, "apply"); >>>> strvec_pushv(&apply_opts, state->git_apply_opts.v); >>>> >>>> - opts_left = apply_parse_options(apply_opts.nr, apply_opts.v, >>>> + /* >>>> + * Build a copy that apply_parse_options() can rearrange. >>>> + * apply_opts.v keeps referencing the allocated strings for >>>> + * strvec_clear() to release. >>>> + */ >>>> + ALLOC_ARRAY(apply_argv, apply_opts.nr); >>>> + COPY_ARRAY(apply_argv, apply_opts.v, apply_opts.nr); >>>> + >>>> + opts_left = apply_parse_options(apply_opts.nr, apply_argv, >>>> &apply_state, &force_apply, &options, >>>> NULL); >>>> >>>> @@ -1513,6 +1522,7 @@ static int run_apply(const struct am_state *state, const char *index_file) >>>> strvec_clear(&apply_paths); >>>> strvec_clear(&apply_opts); >>>> clear_apply_state(&apply_state); >>>> + free(apply_argv); >>>> >>>> if (res) >>>> return res; >>> >>> I don't mind this going in, but it really feels like a bit of a dirty >>> hack. What's so dirty about it, by the way? >>> >>> We have widespread leaks all over the place due to this >>> idiom. I.e. parse_options() and a couple of other APIs expect that they >>> can munge the "argv", which is fine if it arrives via main(), but not if >>> we're editing our own strvecs. >> >> Where? A quick "git grep 'parse_options.*nr'" turns up only this place >> as one that passes a strvec to parse_options. > > Sorry about the vagueness, this was from memory and I didn't have a full > list handy. > > First, that grep isn't going to give you what you want, since it only > catches cases where we're passing the "strvec" to the "parse_options()" > directly. > > But if you look at e.g. how cmd_stash() invokes push_stash() or > cmd_describe() invoking cmd_name_rev() you'll see callers where we're > potentially losing the strvec due to parse_options(). > > "Potentially" because in both those cases we call that API, but in the > latter case we've got a missing strvec_clear(), so maybe that's all > that's needed (it doesn't always munge the argv). OK. >>> I think less of a hack is to teach the eventual parse_options() that >>> when it munges it it should free() it. I did that for the revisions API >>> in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that >>> need free()-ing, 2022-08-02). >>> >>> What do you think? >> >> Generating string lists and then parsing them is weird. When calls have >> to cross a process boundary then we have no choice, but in-process we >> shouldn't have to lower our request to an intermediate text format. git >> am does it anyway because it writes its options to a file and reads them >> back when it resumes with --continue, IIUC. > > I agree, but making all those use a nicer API would be a much bigger > change. Indeed. >> I hope that is and will be the only place that uses parse_options() with >> a strvec -- and then we don't have to change that function. >> >> If this pattern is used more widely then we could package the copying >> done by this patch somehow, e.g. by adding a strvec_parse_options() >> that wraps the real thing. > > The reason I'm encouraging you to look at some of the other strvec leaks > is to see if you can think of a pattern that fits these various leaky > callers. > > So e.g. a strvec_parse_options() won't work for the ones noted above > where we've lost track of it being a "struct strvec" in the first place. Right. It would work if we used a strvec in place of *every* argc/argv pair, but that seems a but heavy a change. > I have a local version somewhere where I did (pseudocode): > > struct strvec vec = STRVEC_INIT; > struct string_list to_free = STRING_LIST_INIT_DUP; > > // populate vec... > for (size_t i = 0; i < vec.nr; i++) > string_list_append_nodup(&to_free, v[i]): > // call parse_options(), or otherwise munge "vec"... > free(strvec_detach(&vec)); > string_list_clear(&to_free, 0); > > I.e. you snapshot the members of the "vec" before it's munged, and > free() those via a "struct string_list". > > Then you don't strvec_clear(), but instead free() the container, its > members being free'd by the string_list. So the same as my patch, just with a string_list instead of a string array and ownership juggling. > Here's a (not yet in-tree) patch that uses that: > https://lore.kernel.org/git/patch-10.10-81f138e460c-20221017T115544Z-avarab@gmail.com/ > > I think that's more reliable & general than the pattern you're adding > here. How so? > In your version the only reason we're not segfaulting is because the > parse_options() consumes all the arguments, but that's not something you > can generally rely on with parse_options(). Could you please elaborate? What do you mean with "consume"? Or could you give an example of a segfault do to partial consumption? I see one caveat: We should keep a NULL sentinel at the end when we hand the copy to arbitrary functions, as some of them don't even look at argc. Is that what you meant? > Also, and maybe I'm missing something, but wouldn't this be functionally > the same as your implementation: > > diff --git a/builtin/am.c b/builtin/am.c > index 30c9b3a9cd7..e65262cbb21 100644 > --- a/builtin/am.c > +++ b/builtin/am.c > @@ -1476,11 +1476,13 @@ static int run_apply(const struct am_state *state, const char *index_file) > int res, opts_left; > int force_apply = 0; > int options = 0; > + char *to_free; > > if (init_apply_state(&apply_state, the_repository, NULL)) > BUG("init_apply_state() failed"); > > strvec_push(&apply_opts, "apply"); > + to_free = (char *)apply_opts.v[0]; > strvec_pushv(&apply_opts, state->git_apply_opts.v); > > opts_left = apply_parse_options(apply_opts.nr, apply_opts.v, > @@ -1489,6 +1491,7 @@ static int run_apply(const struct am_state *state, const char *index_file) > > if (opts_left != 0) > die("unknown option passed through to git apply"); > + free(to_free); > > if (index_file) { > apply_state.index_file = index_file; > > I.e. we leak the strvec_push() of the "apply" literal, but for the rest > we append the "state->git_apply_opts.v", which we already free() > elsewhere. state->git_apply_opts is released elsewhere, but the copies of its elements in apply_opts are leaked here. > So actually, aren't we really just fumbling our way towards the "nodup" > interface that the "struct string_list" has, and we should add the same > to the "struct strvec". > > This seems to also fix all the leaks you fixed, but without any of the > copying: > > diff --git a/builtin/am.c b/builtin/am.c > index 30c9b3a9cd7..691ec1d152d 100644 > --- a/builtin/am.c > +++ b/builtin/am.c > @@ -1471,7 +1471,7 @@ static int parse_mail_rebase(struct am_state *state, const char *mail) > static int run_apply(const struct am_state *state, const char *index_file) > { > struct strvec apply_paths = STRVEC_INIT; > - struct strvec apply_opts = STRVEC_INIT; > + struct strvec apply_opts = STRVEC_INIT_NODUP; > struct apply_state apply_state; > int res, opts_left; > int force_apply = 0; > diff --git a/strvec.c b/strvec.c > index 61a76ce6cb9..efdc47a5854 100644 > --- a/strvec.c > +++ b/strvec.c > @@ -22,7 +22,9 @@ static void strvec_push_nodup(struct strvec *array, const char *value) > > const char *strvec_push(struct strvec *array, const char *value) > { > - strvec_push_nodup(array, xstrdup(value)); > + const char *to_push = array->nodup_strings ? value : xstrdup(value); > + > + strvec_push_nodup(array, to_push); > return array->v[array->nr - 1]; > } > > @@ -31,6 +33,9 @@ const char *strvec_pushf(struct strvec *array, const char *fmt, ...) > va_list ap; > struct strbuf v = STRBUF_INIT; > > + if (array->nodup_strings) > + BUG("cannot strvec_pushf() on a 'nodup' strvec"); > + > va_start(ap, fmt); > strbuf_vaddf(&v, fmt, ap); > va_end(ap); > @@ -67,6 +72,9 @@ void strvec_pop(struct strvec *array) > > void strvec_split(struct strvec *array, const char *to_split) > { > + if (array->nodup_strings) > + BUG("cannot strvec_split() on a 'nodup' strvec"); > + > while (isspace(*to_split)) > to_split++; > for (;;) { > @@ -90,7 +98,7 @@ void strvec_clear(struct strvec *array) > if (array->v != empty_strvec) { > int i; > for (i = 0; i < array->nr; i++) > - free((char *)array->v[i]); > + free(array->nodup_strings ? NULL : (char *)array->v[i]); > free(array->v); > } > strvec_init(array); > diff --git a/strvec.h b/strvec.h > index 9f55c8766ba..ac6e2c04cea 100644 > --- a/strvec.h > +++ b/strvec.h > @@ -31,12 +31,18 @@ struct strvec { > const char **v; > size_t nr; > size_t alloc; > + unsigned int nodup_strings:1; > }; > > #define STRVEC_INIT { \ > .v = empty_strvec, \ > } > > +#define STRVEC_INIT_NODUP { \ > + .v = empty_strvec, \ > + .nodup_strings = 1, \ > +} > + > /** > * Initialize an array. This is no different than assigning from > * `STRVEC_INIT`. We can do it, but it's quite complicated overall. strvec is with its "duplicate everything" stance is easy to understand and reason about. Adding a nodup variant will remove that feature. > In any case, for this change (and leak fixes in general), could you > please also mark the now-passing leak-free tests. This fails after your > patch, but passes before. > > The failure is a "good" one, as they're now leak-free, but should be > marked as such: > > GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_SANITIZE_LEAK_LOG=true \ > make SANITIZE=leak test T="t4256-am-format-flowed.sh t0023-crlf-am.sh t4254-am-corrupt.sh t4257-am-interactive.sh t5403-post-checkout-hook.sh t4152-am-subjects.sh" > Well, it might be my laziness talking, but can we avoid that? There are lots of leaks present in tests and plugging the last one for a particular test script holds no particular significance. Wouldn't it be better to update the annotations at the end of the release cycle, similar to translations? Or automate it via some kind of CI job? In any case: Requiring to run the test suite with LeakSanitizer for leak fixes would be a hurdle that I'd jump only reluctantly. (LeakSanitizer is not supported on my development system, an M1 Mac.) >> If we have to change parse_options() at all then I'd prefer it to not >> free() anything (to keep it usable with main()'s parameters),... > > I'm suggesting passing a flag to it to have it free() things it > NULL's. > > Maybe that's a bad idea, but I don't see the incompatibility with > main(), for those we just wouldn't pass the flag. > > It does have the inherent limitation that you couldn't mix strvec and > non-strvec parameters, i.e. if some of your elements are dup'd but > others aren't you'd need to arrange to have them all dup'd. > > But I don't think that's an issue in practice, either we pass the fully > dup'd version, or main() parameters. Yes, but your examples at the top show that sometimes the parse_options() call is hidden by several layers of other functions. We better have a more general solution. >> but to >> reorder in a non-destructive way. That would mean keeping the NULL >> sentinel where it is, and making sure all callers use only the returned >> argc to determine which arguments parse_options() didn't recognize. > > I think this would work if parse_options() wasn't clobbering those > elements in some cases, and replacing them with new ones, but doesn't it > do that e.g. in parse_options_step()? You mean the "fake a short option thing", right? That would have to be addressed somehow. > It also seems like a much more fragile change to try to ensure that > nothing uses the "argv" again, but only looks at the updated > "argc". > > Both that & adding a "const" to it would be a huge change across the > codebase, so I'd like to avoid them, but at least the "const" method > would be helped along by the compiler. Agreed. René
On Tue, Dec 13, 2022 at 07:31:13PM +0100, René Scharfe wrote: > > I think less of a hack is to teach the eventual parse_options() that > > when it munges it it should free() it. I did that for the revisions API > > in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that > > need free()-ing, 2022-08-02). > > > > What do you think? > > Generating string lists and then parsing them is weird. When calls have > to cross a process boundary then we have no choice, but in-process we > shouldn't have to lower our request to an intermediate text format. git > am does it anyway because it writes its options to a file and reads them > back when it resumes with --continue, IIUC. The argument has been made[1] in the past that the public API for the revision machinery is not poking bits in the rev_info struct yourself, but passing the appropriate options to setup_revisions(). And I can see the point; if option "--foo" requires twiddling both revs.foo and revs.bar, then we have no way to enforce that callers have remembered to do both. But if the only code which handles this is the parser for "--foo", then we're OK. In a non-C language, we'd probably mark "foo" and "bar" as private and provide a method to enable the option. We could provide a function, but we'd have no compiler help to ensure that it was used over fiddling the bits. Possibly calling them "foo_private" would be sufficient to make people think twice about tweaking them, though. [1] Apologies for the passive voice; I didn't want to muddle the paragraph by discussing who and when. But I know this is an argument Junio has made; I don't know if he still stands by it these days (there is quite a lot of field-twiddling of rev_info by now). I'm not sure to what degree I agree with it myself, but I thought it was worth mentioning here. > If we have to change parse_options() at all then I'd prefer it to not > free() anything (to keep it usable with main()'s parameters), but to > reorder in a non-destructive way. That would mean keeping the NULL > sentinel where it is, and making sure all callers use only the returned > argc to determine which arguments parse_options() didn't recognize. My findings with -Wunused-parameter show that there are quite a lot of places that take argv/argc but assume the NULL-termination of the argv. If we are just re-ordering argv, though, it feels like this could still work with a strvec. Right now a strvec with "nr" items will free items 0 through nr-1, assuming that v[nr] is its NULL invariant. But it could free v[nr], too, in case the NULL was swapped into an earlier position. It's a little weird already, because that swap has violated the invariant, so trying to strvec_push() onto it would cause confusing results. But if the general use case is to pass the strvec to parse_options(), get reordered, and then clear() it, it should work. If you wanted to get really fancy, push() et al could double-check the invariant and BUG(). -Peff
Am 17.12.22 um 14:24 schrieb Jeff King: > On Tue, Dec 13, 2022 at 07:31:13PM +0100, René Scharfe wrote: > >>> I think less of a hack is to teach the eventual parse_options() that >>> when it munges it it should free() it. I did that for the revisions API >>> in f92dbdbc6a8 (revisions API: don't leak memory on argv elements that >>> need free()-ing, 2022-08-02). >>> >>> What do you think? >> >> Generating string lists and then parsing them is weird. When calls have >> to cross a process boundary then we have no choice, but in-process we >> shouldn't have to lower our request to an intermediate text format. git >> am does it anyway because it writes its options to a file and reads them >> back when it resumes with --continue, IIUC. > > The argument has been made[1] in the past that the public API for the > revision machinery is not poking bits in the rev_info struct yourself, > but passing the appropriate options to setup_revisions(). My point was that we are stuck with the way it's done in git am specifically because of its on-disk parameter store. It forces us to be able to handle a parameter list that we allocate ourselves instead of being given one by the OS like argc/argv in main(). And this justifies a bespoke solution instead of changing parse_options(). In the meantime Ævar showed enough examples to convince me that a more general solution is indeed needed, but I still think we should keep parse_options() unchanged and add something like strvec_clone_nodup() instead. Regarding using textual interfaces internally in general: It's the easiest method, as we have to provide it for users anyway. It's not simple, though -- generating strings and managing their ownership adds overhead like this patch, which we wouldn't have in an struct interface. > And I can see the point; if option "--foo" requires twiddling both > revs.foo and revs.bar, then we have no way to enforce that callers have > remembered to do both. But if the only code which handles this is the > parser for "--foo", then we're OK. Depends on the specifics. Perhaps .bar is redundant and could be inferred from .foo. Or they could be replaced by an enum. > In a non-C language, we'd probably mark "foo" and "bar" as private and > provide a method to enable the option. We could provide a function, but > we'd have no compiler help to ensure that it was used over fiddling the > bits. Possibly calling them "foo_private" would be sufficient to make > people think twice about tweaking them, though. Or normalize the struct to avoid dependencies between fields. > [1] Apologies for the passive voice; I didn't want to muddle the > paragraph by discussing who and when. But I know this is an argument > Junio has made; I don't know if he still stands by it these days > (there is quite a lot of field-twiddling of rev_info by now). I'm > not sure to what degree I agree with it myself, but I thought it was > worth mentioning here. > >> If we have to change parse_options() at all then I'd prefer it to not >> free() anything (to keep it usable with main()'s parameters), but to >> reorder in a non-destructive way. That would mean keeping the NULL >> sentinel where it is, and making sure all callers use only the returned >> argc to determine which arguments parse_options() didn't recognize. > > My findings with -Wunused-parameter show that there are quite a lot of > places that take argv/argc but assume the NULL-termination of the argv. Right. > If we are just re-ordering argv, though, it feels like this could still > work with a strvec. Right now a strvec with "nr" items will free items 0 > through nr-1, assuming that v[nr] is its NULL invariant. But it could > free v[nr], too, in case the NULL was swapped into an earlier position. > > It's a little weird already, because that swap has violated the > invariant, so trying to strvec_push() onto it would cause confusing > results. But if the general use case is to pass the strvec to > parse_options(), get reordered, and then clear() it, it should work. If > you wanted to get really fancy, push() et al could double-check the > invariant and BUG(). Yes, parse_options() and strvec are not fitting perfectly. Changing the former to reorder the elements and keeping them all would require checking that all callers use the return value. Feels like a lot of work. A variant that takes a strvec and removes and frees parsed items from it would be clean, but would require refactoring indirect callers like apply_parse_options() users. Busywork. Making a shallow copy to give to parse_options() in callers that currently pass a strvec directly or indirectly seems like the simplest solution to me for now. René
On Sat, Dec 17, 2022 at 05:07:12PM +0100, René Scharfe wrote: > > If we are just re-ordering argv, though, it feels like this could still > > work with a strvec. Right now a strvec with "nr" items will free items 0 > > through nr-1, assuming that v[nr] is its NULL invariant. But it could > > free v[nr], too, in case the NULL was swapped into an earlier position. > > > > It's a little weird already, because that swap has violated the > > invariant, so trying to strvec_push() onto it would cause confusing > > results. But if the general use case is to pass the strvec to > > parse_options(), get reordered, and then clear() it, it should work. If > > you wanted to get really fancy, push() et al could double-check the > > invariant and BUG(). > > Yes, parse_options() and strvec are not fitting perfectly. Changing the > former to reorder the elements and keeping them all would require > checking that all callers use the return value. Feels like a lot of work. I think we're already munging the strvec arrays in the option parser, though. I'm just suggesting that parse_options() swap arguments to the end instead of overwriting a NULL (actually, I'm not even sure it doesn't do that already), and strvec_clear() checking the final element. The end state is not necessarily safe, but it's no worse than what we have today. That said... > A variant that takes a strvec and removes and frees parsed items from it > would be clean, but would require refactoring indirect callers like > apply_parse_options() users. Busywork. > > Making a shallow copy to give to parse_options() in callers that currently > pass a strvec directly or indirectly seems like the simplest solution to > me for now. Yes, I thought your original patch actually got to the root of the problem. strvec owns the array and its elements, and parse-options wants to munge the array itself (but not the elements). Making a shallow copy is eliminates the conflict over ownership. -Peff
Jeff King <peff@peff.net> writes: >> Making a shallow copy to give to parse_options() in callers that currently >> pass a strvec directly or indirectly seems like the simplest solution to >> me for now. > > Yes, I thought your original patch actually got to the root of the > problem. strvec owns the array and its elements, and parse-options wants > to munge the array itself (but not the elements). Making a shallow copy > is eliminates the conflict over ownership. Yup, it did look very sensible to me.
René Scharfe <l.s.r@web.de> writes: > Depends on the specifics. Perhaps .bar is redundant and could be > inferred from .foo. Or they could be replaced by an enum. > ... > Or normalize the struct to avoid dependencies between fields. I think the example Peff brought up was mostly from the revisions API, where there are two vastly different traversal behaviour (i.e. limited and !limited). With .limited bit set, we first fully enumerate the potential commits, before showing a single entry in the output, and without, we can stream the output. In general, we prefer !limited traversal because that would give us better interactive latency, but some options only can work with the limited bit set (e.g. --ancestry-path, --cherry). We _could_ probably have do without the .limited bit and instead wrote each and every check we currently make to the revs->limited bit as something like #define is_limited(revs) (revs->cherry_mark || \ revs->ancestry_path || \ ...) so that the "struct members" API users do not have to know that they have to set the .limit bit when they set .cherry_mark bit. There are many others even in the revisions API alone. IOW, yes, this depends on the specifics, and "normalize" is easier said than done, unfortunately.
diff --git a/builtin/am.c b/builtin/am.c index 30c9b3a9cd..dddf1b9af0 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -1476,6 +1476,7 @@ static int run_apply(const struct am_state *state, const char *index_file) int res, opts_left; int force_apply = 0; int options = 0; + const char **apply_argv; if (init_apply_state(&apply_state, the_repository, NULL)) BUG("init_apply_state() failed"); @@ -1483,7 +1484,15 @@ static int run_apply(const struct am_state *state, const char *index_file) strvec_push(&apply_opts, "apply"); strvec_pushv(&apply_opts, state->git_apply_opts.v); - opts_left = apply_parse_options(apply_opts.nr, apply_opts.v, + /* + * Build a copy that apply_parse_options() can rearrange. + * apply_opts.v keeps referencing the allocated strings for + * strvec_clear() to release. + */ + ALLOC_ARRAY(apply_argv, apply_opts.nr); + COPY_ARRAY(apply_argv, apply_opts.v, apply_opts.nr); + + opts_left = apply_parse_options(apply_opts.nr, apply_argv, &apply_state, &force_apply, &options, NULL); @@ -1513,6 +1522,7 @@ static int run_apply(const struct am_state *state, const char *index_file) strvec_clear(&apply_paths); strvec_clear(&apply_opts); clear_apply_state(&apply_state); + free(apply_argv); if (res) return res;
apply_parse_options() passes the array of argument strings to parse_options(), which removes recognized options. The removed strings are not freed, though. Make a copy of the strvec to pass to the function to retain the pointers of its strings, so we release them all at the end. Signed-off-by: René Scharfe <l.s.r@web.de> --- builtin/am.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) -- 2.38.2