mbox series

[v2,00/24] Memory leak fixes (pt.3)

Message ID cover.1722499961.git.ps@pks.im (mailing list archive)
Headers show
Series Memory leak fixes (pt.3) | expand

Message

Patrick Steinhardt Aug. 1, 2024, 10:38 a.m. UTC
Hi,

this is the second version of my patch series that plugs another round
of memory leaks in Git. Changes compared to v2:

  - Various typo fixes.

  - Adopted a patch by Rubén that plugs a git-submodule--helper(1) leak
    in a neater way by using an integer to track the clone depth instead
    of using a (possibly allocated) string.

  - Refactored git-ls-remote(1) to use a `struct strvec` to track
    patterns instead of manually populating an array, which makes the
    code easier to reason about while still plugging the memory leak.
    This was suggested by Taylor.

Thanks for all the reviews so far!

Patrick

Patrick Steinhardt (23):
  builtin/replay: plug leaking `advance_name` variable
  builtin/log: fix leaking branch name when creating cover letters
  builtin/describe: fix memory leak with `--contains=`
  builtin/describe: fix leaking array when running diff-index
  builtin/describe: fix trivial memory leak when describing blob
  builtin/name-rev: fix various trivial memory leaks
  builtin/submodule--helper: fix leaking buffer in `is_tip_reachable`
  builtin/ls-remote: fix leaking `pattern` strings
  builtin/remote: fix leaking strings in `branch_list`
  builtin/remote: fix various trivial memory leaks
  builtin/stash: fix various trivial memory leaks
  builtin/rev-parse: fix memory leak with `--parseopt`
  builtin/show-branch: fix several memory leaks
  builtin/credential-store: fix leaking credential
  builtin/rerere: fix various trivial memory leaks
  builtin/shortlog: fix various trivial memory leaks
  builtin/worktree: fix leaking derived branch names
  builtin/credential-cache: fix trivial leaks
  t/test-repository: fix leaking repository
  object-name: fix leaking commit list items
  entry: fix leaking pathnames during delayed checkout
  convert: fix leaking config strings
  commit-reach: fix trivial memory leak when computing reachability

Rubén Justico (1):
  builtin/submodule--helper: fix leaking clone depth parameter

 builtin/credential-cache.c              |  9 ++++-
 builtin/credential-store.c              |  1 +
 builtin/describe.c                      | 25 ++++++++++--
 builtin/log.c                           |  4 +-
 builtin/ls-remote.c                     | 24 +++++-------
 builtin/name-rev.c                      |  6 ++-
 builtin/remote.c                        | 44 ++++++++++++++++-----
 builtin/replay.c                        | 20 +++++++---
 builtin/rerere.c                        |  8 +++-
 builtin/rev-parse.c                     |  5 ++-
 builtin/shortlog.c                      |  1 +
 builtin/show-branch.c                   | 52 +++++++++++++++++--------
 builtin/stash.c                         | 18 ++++++++-
 builtin/submodule--helper.c             | 20 ++++++----
 builtin/worktree.c                      |  7 ++--
 commit-reach.c                          |  1 +
 convert.c                               | 14 +++++--
 entry.c                                 |  4 +-
 object-name.c                           | 26 ++++++++-----
 rerere.c                                |  9 ++++-
 t/helper/test-repository.c              |  4 +-
 t/t0021-conversion.sh                   |  1 +
 t/t0301-credential-cache.sh             |  2 +
 t/t0302-credential-store.sh             |  2 +
 t/t0303-credential-external.sh          |  1 +
 t/t1502-rev-parse-parseopt.sh           |  2 +
 t/t1511-rev-parse-caret.sh              |  1 +
 t/t2030-unresolve-info.sh               |  1 +
 t/t2080-parallel-checkout-basics.sh     |  1 +
 t/t2082-parallel-checkout-attributes.sh |  1 +
 t/t2400-worktree-add.sh                 |  1 +
 t/t2501-cwd-empty.sh                    |  1 +
 t/t3201-branch-contains.sh              |  1 +
 t/t3202-show-branch.sh                  |  1 +
 t/t3206-range-diff.sh                   |  1 +
 t/t3650-replay-basics.sh                |  1 +
 t/t3903-stash.sh                        |  1 +
 t/t3904-stash-patch.sh                  |  2 +
 t/t3905-stash-include-untracked.sh      |  1 +
 t/t4200-rerere.sh                       |  1 +
 t/t4201-shortlog.sh                     |  1 +
 t/t5318-commit-graph.sh                 |  2 +
 t/t5512-ls-remote.sh                    |  1 +
 t/t5514-fetch-multiple.sh               |  1 +
 t/t5520-pull.sh                         |  1 +
 t/t5528-push-default.sh                 |  1 +
 t/t5535-fetch-push-symref.sh            |  1 +
 t/t5543-atomic-push.sh                  |  1 +
 t/t5570-git-daemon.sh                   |  1 +
 t/t6007-rev-list-cherry-pick-file.sh    |  1 +
 t/t6010-merge-base.sh                   |  1 +
 t/t6120-describe.sh                     |  1 +
 t/t6133-pathspec-rev-dwim.sh            |  2 +
 t/t7064-wtstatus-pv2.sh                 |  1 +
 t/t7400-submodule-basic.sh              |  1 +
 t/t9902-completion.sh                   |  1 +
 t/t9903-bash-prompt.sh                  |  1 +
 57 files changed, 256 insertions(+), 88 deletions(-)

Range-diff against v1:
 1:  dd044eacc2 =  1:  b5c8a9495a builtin/replay: plug leaking `advance_name` variable
 2:  4daae88877 !  2:  73a16fff43 builtin/log: fix leaking branch name when creating cover letters
    @@ Metadata
      ## Commit message ##
         builtin/log: fix leaking branch name when creating cover letters
     
    -    When calling `make_cover_letter()` without a branch name, then we try to
    +    When calling `make_cover_letter()` without a branch name, we try to
         derive the branch name by calling `find_branch_name()`. But while this
         function returns an allocated string, we never free the result and thus
         have a memory leak. Fix this.
 -:  ---------- >  3:  65027d13b7 builtin/describe: fix memory leak with `--contains=`
 4:  e8d86c01cf !  4:  19ca97e33a builtin/describe: fix leaking array when running diff-index
    @@ Commit message
         builtin/describe: fix leaking array when running diff-index
     
         When running git-describe(1) with `--dirty`, we will set up a `struct
    -    rev_info` with arguments for git-diff-index(1). The way we assmble the
    +    rev_info` with arguments for git-diff-index(1). The way we assemble the
         arguments it causes two memory leaks though:
     
           - We never release the `struct strvec`.
 5:  8aba7434bd =  5:  6deacb8667 builtin/describe: fix trivial memory leak when describing blob
 6:  088f730572 =  6:  e38e9d1b62 builtin/name-rev: fix various trivial memory leaks
 3:  08a12be13c !  7:  d3eb0372bd builtin/describe: fix memory leak with `--contains=`
    @@
      ## Metadata ##
    -Author: Patrick Steinhardt <ps@pks.im>
    +Author: Rubén Justico <rjusto@gmail.com>
     
      ## Commit message ##
    -    builtin/describe: fix memory leak with `--contains=`
    +    builtin/submodule--helper: fix leaking clone depth parameter
     
    -    When calling `git describe --contains=`, we end up invoking
    -    `cmd_name_rev()` with some munged argv array. This array may contain
    -    allocated strings and furthermore will likely be modified by the called
    -    function. This results in two memory leaks:
    +    The submodule helper supports a `--depth` parameter for both its "add"
    +    and "clone" subcommands, which in both cases end up being forwarded to
    +    git-clone(1). But while the former subcommand uses an `OPT_INTEGER()` to
    +    parse the depth, the latter uses `OPT_STRING()`. Consequently, it is
    +    possible to pass non-integer input to "--depth" when calling the "clone"
    +    subcommand, where the value will then ultimately cause git-clone(1) to
    +    bail out.
     
    -      - First, we leak the array that we use to assemble the arguments.
    +    Besides the fact that the parameter verification should happen earlier,
    +    the submodule helper infrastructure also internally tracks the depth via
    +    a string. This requires us to convert the integer in the "add"
    +    subcommand into an allocated string, and this string ultimately leaks.
     
    -      - Second, we leak the allocated strings that we may have put into the
    -        array.
    -
    -    Fix those leaks by creating a separate copy of the array that we can
    -    hand over to `cmd_name_rev()`. This allows us to free all strings
    -    contained in the `strvec`, as the original vector will not be modified
    -    anymore.
    -
    -    Furthermore, free both the `strvec` and the copied array to fix the
    -    first memory leak.
    +    Refactor the code to consistently track the clone depth as an integer.
    +    This plugs the memory leak, simplifies the code and allows us to use
    +    `OPT_INTEGER()` instead of `OPT_STRING()`, validating the input before
    +    we shell out to git--clone(1).
     
    +    Original-patch-by: Rubén Justo <rjusto@gmail.com>
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
    - ## builtin/describe.c ##
    -@@ builtin/describe.c: int cmd_describe(int argc, const char **argv, const char *prefix)
    - 	if (contains) {
    - 		struct string_list_item *item;
    - 		struct strvec args;
    -+		const char **argv_copy;
    -+		int ret;
    + ## builtin/submodule--helper.c ##
    +@@ builtin/submodule--helper.c: struct module_clone_data {
    + 	const char *path;
    + 	const char *name;
    + 	const char *url;
    +-	const char *depth;
    ++	int depth;
    + 	struct list_objects_filter_options *filter_options;
    + 	unsigned int quiet: 1;
    + 	unsigned int progress: 1;
    +@@ builtin/submodule--helper.c: static int clone_submodule(const struct module_clone_data *clone_data,
    + 			strvec_push(&cp.args, "--quiet");
    + 		if (clone_data->progress)
    + 			strvec_push(&cp.args, "--progress");
    +-		if (clone_data->depth && *(clone_data->depth))
    +-			strvec_pushl(&cp.args, "--depth", clone_data->depth, NULL);
    ++		if (clone_data->depth > 0)
    ++			strvec_pushf(&cp.args, "--depth=%d", clone_data->depth);
    + 		if (reference->nr) {
    + 			struct string_list_item *item;
      
    - 		strvec_init(&args);
    - 		strvec_pushl(&args, "name-rev",
    -@@ builtin/describe.c: int cmd_describe(int argc, const char **argv, const char *prefix)
    - 			strvec_pushv(&args, argv);
    - 		else
    - 			strvec_push(&args, "HEAD");
    --		return cmd_name_rev(args.nr, args.v, prefix);
    -+
    -+		/*
    -+		 * `cmd_name_rev()` modifies the array, so we'd link its
    -+		 * contained strings if we didn't do a copy here.
    -+		 */
    -+		ALLOC_ARRAY(argv_copy, args.nr + 1);
    -+		for (size_t i = 0; i < args.nr; i++)
    -+			argv_copy[i] = args.v[i];
    -+		argv_copy[args.nr] = NULL;
    -+
    -+		ret = cmd_name_rev(args.nr, argv_copy, prefix);
    -+
    -+		strvec_clear(&args);
    -+		free(argv_copy);
    -+		return ret;
    - 	}
    +@@ builtin/submodule--helper.c: static int module_clone(int argc, const char **argv, const char *prefix)
    + 			   N_("reference repository")),
    + 		OPT_BOOL(0, "dissociate", &dissociate,
    + 			   N_("use --reference only while cloning")),
    +-		OPT_STRING(0, "depth", &clone_data.depth,
    +-			   N_("string"),
    ++		OPT_INTEGER(0, "depth", &clone_data.depth,
    + 			   N_("depth for shallow clones")),
    + 		OPT__QUIET(&quiet, "suppress output for cloning a submodule"),
    + 		OPT_BOOL(0, "progress", &progress,
    +@@ builtin/submodule--helper.c: static int add_submodule(const struct add_data *add_data)
    + 		}
    + 		clone_data.dissociate = add_data->dissociate;
    + 		if (add_data->depth >= 0)
    +-			clone_data.depth = xstrfmt("%d", add_data->depth);
    ++			clone_data.depth = add_data->depth;
      
    - 	hashmap_init(&names, commit_name_neq, NULL, 0);
    + 		if (clone_submodule(&clone_data, &reference))
    + 			goto cleanup;
 7:  5220c91bda !  8:  49c156cebb builtin/submodule--helper: fix various trivial memory leaks
    @@ Metadata
     Author: Patrick Steinhardt <ps@pks.im>
     
      ## Commit message ##
    -    builtin/submodule--helper: fix various trivial memory leaks
    +    builtin/submodule--helper: fix leaking buffer in `is_tip_reachable`
     
    -    There are multiple trivial memory leaks in the submodule helper. Fix
    -    those.
    +    The `rev` buffer in `is_tip_reachable()` is being populated with the
    +    output of git-rev-list(1) -- if either the command fails or the buffer
    +    contains any data, then the input commit is not reachable.
    +
    +    The buffer isn't used for anything else, but neither do we free it,
    +    causing a memory leak. Fix this.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
    @@ builtin/submodule--helper.c: static int is_tip_reachable(const char *path, const
      }
      
      static int fetch_in_submodule(const char *module_path, int depth, int quiet,
    -@@ builtin/submodule--helper.c: static void append_fetch_remotes(struct strbuf *msg, const char *git_dir_path)
    - static int add_submodule(const struct add_data *add_data)
    - {
    - 	char *submod_gitdir_path;
    -+	char *depth_to_free = NULL;
    - 	struct module_clone_data clone_data = MODULE_CLONE_DATA_INIT;
    - 	struct string_list reference = STRING_LIST_INIT_NODUP;
    - 	int ret = -1;
    -@@ builtin/submodule--helper.c: static int add_submodule(const struct add_data *add_data)
    - 		}
    - 		clone_data.dissociate = add_data->dissociate;
    - 		if (add_data->depth >= 0)
    --			clone_data.depth = xstrfmt("%d", add_data->depth);
    -+			clone_data.depth = depth_to_free = xstrfmt("%d", add_data->depth);
    - 
    - 		if (clone_submodule(&clone_data, &reference))
    - 			goto cleanup;
     @@ builtin/submodule--helper.c: static int add_submodule(const struct add_data *add_data)
      			die(_("unable to checkout submodule '%s'"), add_data->sm_path);
      	}
      	ret = 0;
     +
      cleanup:
    -+	free(depth_to_free);
      	string_list_clear(&reference, 1);
      	return ret;
    - }
     
      ## t/t7400-submodule-basic.sh ##
     @@ t/t7400-submodule-basic.sh: subcommands of git submodule.
 8:  d42152654b !  9:  4330c80905 builtin/ls-remote: fix leaking `pattern` strings
    @@ Commit message
         and prefix them with "*/", but never free either the array nor the
         allocated strings.
     
    -    Fix those leaks.
    +    Refactor the code to use a `struct strvec` instead of manually tracking
    +    the strings in an array. Like this, we can easily use `strvec_clear()`
    +    to release both the vector and the contained string for us, plugging the
    +    leak.
     
    +    Helped-by: Taylor Blau <me@ttaylorr.com>
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## builtin/ls-remote.c ##
    +@@ builtin/ls-remote.c: static const char * const ls_remote_usage[] = {
    +  * Is there one among the list of patterns that match the tail part
    +  * of the path?
    +  */
    +-static int tail_match(const char **pattern, const char *path)
    ++static int tail_match(const struct strvec *pattern, const char *path)
    + {
    +-	const char *p;
    + 	char *pathbuf;
    + 
    +-	if (!pattern)
    ++	if (!pattern->nr)
    + 		return 1; /* no restriction */
    + 
    + 	pathbuf = xstrfmt("/%s", path);
    +-	while ((p = *(pattern++)) != NULL) {
    +-		if (!wildmatch(p, pathbuf, 0)) {
    ++	for (size_t i = 0; i < pattern->nr; i++) {
    ++		if (!wildmatch(pattern->v[i], pathbuf, 0)) {
    + 			free(pathbuf);
    + 			return 1;
    + 		}
     @@ builtin/ls-remote.c: int cmd_ls_remote(int argc, const char **argv, const char *prefix)
      	int status = 0;
      	int show_symref_target = 0;
      	const char *uploadpack = NULL;
     -	const char **pattern = NULL;
    -+	char **pattern = NULL;
    ++	struct strvec pattern = STRVEC_INIT;
      	struct transport_ls_refs_options transport_options =
      		TRANSPORT_LS_REFS_OPTIONS_INIT;
      	int i;
     @@ builtin/ls-remote.c: int cmd_ls_remote(int argc, const char **argv, const char *prefix)
    - 	if (argc > 1) {
    - 		int i;
    - 		CALLOC_ARRAY(pattern, argc);
    + 
    + 	packet_trace_identity("ls-remote");
    + 
    +-	if (argc > 1) {
    +-		int i;
    +-		CALLOC_ARRAY(pattern, argc);
     -		for (i = 1; i < argc; i++) {
    -+		for (i = 1; i < argc; i++)
    - 			pattern[i - 1] = xstrfmt("*/%s", argv[i]);
    +-			pattern[i - 1] = xstrfmt("*/%s", argv[i]);
     -		}
    - 	}
    +-	}
    ++	for (int i = 1; i < argc; i++)
    ++		strvec_pushf(&pattern, "*/%s", argv[i]);
      
      	if (flags & REF_TAGS)
    + 		strvec_push(&transport_options.ref_prefixes, "refs/tags/");
     @@ builtin/ls-remote.c: int cmd_ls_remote(int argc, const char **argv, const char *prefix)
      		struct ref_array_item *item;
      		if (!check_ref_type(ref, flags))
      			continue;
     -		if (!tail_match(pattern, ref->name))
    -+		if (!tail_match((const char **) pattern, ref->name))
    ++		if (!tail_match(&pattern, ref->name))
      			continue;
      		item = ref_array_push(&ref_array, ref->name, &ref->old_oid);
      		item->symref = xstrdup_or_null(ref->symref);
    @@ builtin/ls-remote.c: int cmd_ls_remote(int argc, const char **argv, const char *
      		status = 1;
      	transport_ls_refs_options_release(&transport_options);
     +
    -+	for (i = 1; i < argc; i++)
    -+		free(pattern[i - 1]);
    -+	free(pattern);
    ++	strvec_clear(&pattern);
      	return status;
      }
     
 9:  6952fb2ff2 = 10:  2e3f76b428 builtin/remote: fix leaking strings in `branch_list`
10:  8bedcfdad8 = 11:  384203c34b builtin/remote: fix various trivial memory leaks
11:  a4b3ca49c9 = 12:  b7c0671797 builtin/stash: fix various trivial memory leaks
12:  e277222bd2 ! 13:  034c416d46 builtin/rev-parse: fix memory leak with `--parseopt`
    @@ Commit message
         which is of course wrong.
     
         Fix this by freeing all option's help fields as well as their `argh`
    -    fielids to plug this memory leak.
    +    fields to plug this memory leak.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
13:  fd857efa9f = 14:  20a40e2fd4 builtin/show-branch: fix several memory leaks
14:  ba4a883ae1 = 15:  b553f1ffb9 builtin/credential-store: fix leaking credential
15:  6d49645c0f = 16:  76d81c8ae1 builtin/rerere: fix various trivial memory leaks
16:  778e87221a = 17:  d52ac1a550 builtin/shortlog: fix various trivial memory leaks
17:  8a649bf7ed = 18:  dccaf695a9 builtin/worktree: fix leaking derived branch names
18:  2c7a369490 = 19:  788f0bee7b builtin/credential-cache: fix trivial leaks
19:  7564911d2a = 20:  eb58826b14 t/test-repository: fix leaking repository
20:  e3dbf9116b = 21:  fa95a67ccd object-name: fix leaking commit list items
21:  e426fd7ec5 = 22:  8cae4c44f4 entry: fix leaking pathnames during delayed checkout
22:  8c2a19994f = 23:  5d2bf0f0f3 convert: fix leaking config strings
23:  60aef16aef = 24:  6db1829c79 commit-reach: fix trivial memory leak when computing reachability

base-commit: d19b6cd2dd72dc811f19df4b32c7ed223256c3ee

Comments

Taylor Blau Aug. 1, 2024, 5:17 p.m. UTC | #1
On Thu, Aug 01, 2024 at 12:38:06PM +0200, Patrick Steinhardt wrote:
> Range-diff against v1:

All looks reasonable to me. Thanks for incorporating mine and others'
suggestions, and again for working on this!

Thanks,
Taylor