diff mbox series

[v3,2/2] setup: fix memory leaks with `struct repository_format`

Message ID f8b021033b887923662eb9fa63f6df1677ebbbb5.1548186510.git.martin.agren@gmail.com (mailing list archive)
State New, archived
Headers show
Series setup: fix memory leaks with `struct repository_format` | expand

Commit Message

Martin Ågren Jan. 22, 2019, 9:45 p.m. UTC
After we set up a `struct repository_format`, it owns various pieces of
allocated memory. We then either use those members, because we decide we
want to use the "candidate" repository format, or we discard the
candidate / scratch space. In the first case, we transfer ownership of
the memory to a few global variables. In the latter case, we just
silently drop the struct and end up leaking memory.

Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a
function `clear_repository_format()`, to be used on each side of
`read_repository_format()`. To have a clear and simple memory ownership,
let all users of `struct repository_format` duplicate the strings that
they take from it, rather than stealing the pointers.

Call `clear_...()` at the start of `read_...()` instead of just zeroing
the struct, since we sometimes enter the function multiple times. This
means that it is important to initialize the struct before calling
`read_...()`, so document that. Teach `read_...()` to clear the struct
upon an error, so that it is reset to a safe state, and document this.

About that very last point: In `setup_git_directory_gently()`, we look
at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we
weren't actually supposed to do per the API. After this commit, that's
ok.

We inherit the existing code's combining "error" and "no version found".
Both are signalled through `version == -1` and now both cause us to
clear any partial configuration we have picked up. For "extensions.*",
that's fine, since they require a positive version number. For
"core.bare" and "core.worktree", we're already verifying that we have a
non-negative version number before using them.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 cache.h           | 27 ++++++++++++++++++++++++---
 builtin/init-db.c |  3 ++-
 repository.c      |  3 ++-
 setup.c           | 32 ++++++++++++++++++++------------
 worktree.c        |  4 +++-
 5 files changed, 51 insertions(+), 18 deletions(-)

Comments

Jeff King Jan. 23, 2019, 5:57 a.m. UTC | #1
On Tue, Jan 22, 2019 at 10:45:48PM +0100, Martin Ågren wrote:

> Call `clear_...()` at the start of `read_...()` instead of just zeroing
> the struct, since we sometimes enter the function multiple times. This
> means that it is important to initialize the struct before calling
> `read_...()`, so document that.

This part is a little counter-intuitive to me. Is anybody ever going to
pass in anything except a struct initialized to REPOSITORY_FORMAT_INIT?

If so, might it be kinder for read_...() to not assume anything about
the incoming struct, and initialize it from scratch? I.e., not to use
clear() but just do the initialization step?

A caller which calls read_() multiple times would presumably have an
intervening clear (either their own, or the one done on an error return
from the read function).

Other than that minor nit, I like the overall shape of this.

One interesting tidbit:

> +/*
> + * Always use this to initialize a `struct repository_format`
> + * to a well-defined, default state before calling
> + * `read_repository()`.
> + */
> +#define REPOSITORY_FORMAT_INIT \
> +{ \
> +	.version = -1, \
> +	.is_bare = -1, \
> +	.hash_algo = GIT_HASH_SHA1, \
> +	.unknown_extensions = STRING_LIST_INIT_DUP, \
> +}
> [...]
> +	struct repository_format candidate = REPOSITORY_FORMAT_INIT;

This uses designated initializers, which is a C99-ism, but one we've
used previously and feel confident in. But...

> +void clear_repository_format(struct repository_format *format)
> +{
> +	string_list_clear(&format->unknown_extensions, 0);
> +	free(format->work_tree);
> +	free(format->partial_clone);
> +	*format = (struct repository_format)REPOSITORY_FORMAT_INIT;
> +}

...this uses that expression not as an initializer, but as a compound
literal. That's also C99, but AFAIK it's the first usage in our code
base. I don't know if it will cause problems or not.

The "old" way to do it is:

  struct repository_format foo = REPOSITORY_FORMAT_INIT;
  memcpy(format, &foo, sizeof(foo));

Given how simple it is to fix if it turns out to be a problem, I'm OK
including it as a weather balloon.

-Peff
brian m. carlson Jan. 24, 2019, 12:14 a.m. UTC | #2
On Wed, Jan 23, 2019 at 12:57:05AM -0500, Jeff King wrote:
> This uses designated initializers, which is a C99-ism, but one we've
> used previously and feel confident in. But...
> 
> > +void clear_repository_format(struct repository_format *format)
> > +{
> > +	string_list_clear(&format->unknown_extensions, 0);
> > +	free(format->work_tree);
> > +	free(format->partial_clone);
> > +	*format = (struct repository_format)REPOSITORY_FORMAT_INIT;
> > +}
> 
> ...this uses that expression not as an initializer, but as a compound
> literal. That's also C99, but AFAIK it's the first usage in our code
> base. I don't know if it will cause problems or not.
> 
> The "old" way to do it is:
> 
>   struct repository_format foo = REPOSITORY_FORMAT_INIT;
>   memcpy(format, &foo, sizeof(foo));
> 
> Given how simple it is to fix if it turns out to be a problem, I'm OK
> including it as a weather balloon.

It's my understanding that MSVC doesn't support this construct. If we
care about supporting MSVC, then we need to write it without the
compound literal. MSVC doesn't support any C99 feature that is not also
in C++, unfortunately.
Martin Ågren Jan. 25, 2019, 7:24 p.m. UTC | #3
On Wed, 23 Jan 2019 at 06:57, Jeff King <peff@peff.net> wrote:
>
> On Tue, Jan 22, 2019 at 10:45:48PM +0100, Martin Ågren wrote:
>
> > Call `clear_...()` at the start of `read_...()` instead of just zeroing
> > the struct, since we sometimes enter the function multiple times. This
> > means that it is important to initialize the struct before calling
> > `read_...()`, so document that.
>
> This part is a little counter-intuitive to me. Is anybody ever going to
> pass in anything except a struct initialized to REPOSITORY_FORMAT_INIT?

I do update all users in git.git, but yeah, out-of-tree users and
in-flight topics would segfault.

> If so, might it be kinder for read_...() to not assume anything about
> the incoming struct, and initialize it from scratch? I.e., not to use
> clear() but just do the initialization step?

I have some vague memory from going down that route and giving up. Now
that I'm looking at it again, I think we can at least try to do
something. We can make sure that "external" users that call into setup.c
are fine (they'll leak, but won't crash). Out-of-tree users inside
setup.c will still be able to trip on this. I don't have much spare time
over the next few days, but I'll get to this.

Or we could accept that we may leak when we end up calling `read()`
multiple times (I could catch all leaks now, but new ones might sneak in
after that) and come back to this after X months, when we can perhaps
afford to be a bit more aggressive.

I guess we could just rename the struct to have the compiler catch
out-of-tree users...

> A caller which calls read_() multiple times would presumably have an
> intervening clear (either their own, or the one done on an error return
> from the read function).
>
> Other than that minor nit, I like the overall shape of this.

Thank you.

Martin
Martin Ågren Jan. 25, 2019, 7:25 p.m. UTC | #4
On Thu, 24 Jan 2019 at 01:15, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On Wed, Jan 23, 2019 at 12:57:05AM -0500, Jeff King wrote:
> > > +void clear_repository_format(struct repository_format *format)
> > > +{
> > > +   string_list_clear(&format->unknown_extensions, 0);
> > > +   free(format->work_tree);
> > > +   free(format->partial_clone);
> > > +   *format = (struct repository_format)REPOSITORY_FORMAT_INIT;
> > > +}
> >
> > ...this uses that expression not as an initializer, but as a compound
> > literal. That's also C99, but AFAIK it's the first usage in our code
> > base. I don't know if it will cause problems or not.

> > Given how simple it is to fix if it turns out to be a problem, I'm OK
> > including it as a weather balloon.

Thanks for pointing out this potential problem.

> It's my understanding that MSVC doesn't support this construct. If we
> care about supporting MSVC, then we need to write it without the
> compound literal. MSVC doesn't support any C99 feature that is not also
> in C++, unfortunately.

Ok, better play it safe then. Thanks.

Martin
Jeff King Jan. 25, 2019, 7:51 p.m. UTC | #5
On Fri, Jan 25, 2019 at 08:24:35PM +0100, Martin Ågren wrote:

> On Wed, 23 Jan 2019 at 06:57, Jeff King <peff@peff.net> wrote:
> >
> > On Tue, Jan 22, 2019 at 10:45:48PM +0100, Martin Ågren wrote:
> >
> > > Call `clear_...()` at the start of `read_...()` instead of just zeroing
> > > the struct, since we sometimes enter the function multiple times. This
> > > means that it is important to initialize the struct before calling
> > > `read_...()`, so document that.
> >
> > This part is a little counter-intuitive to me. Is anybody ever going to
> > pass in anything except a struct initialized to REPOSITORY_FORMAT_INIT?
> 
> I do update all users in git.git, but yeah, out-of-tree users and
> in-flight topics would segfault.
> 
> > If so, might it be kinder for read_...() to not assume anything about
> > the incoming struct, and initialize it from scratch? I.e., not to use
> > clear() but just do the initialization step?
> 
> I have some vague memory from going down that route and giving up. Now
> that I'm looking at it again, I think we can at least try to do
> something. We can make sure that "external" users that call into setup.c
> are fine (they'll leak, but won't crash). Out-of-tree users inside
> setup.c will still be able to trip on this. I don't have much spare time
> over the next few days, but I'll get to this.
> 
> Or we could accept that we may leak when we end up calling `read()`
> multiple times (I could catch all leaks now, but new ones might sneak in
> after that) and come back to this after X months, when we can perhaps
> afford to be a bit more aggressive.
> 
> I guess we could just rename the struct to have the compiler catch
> out-of-tree users...

I'm less worried about out-of-tree users, and more concerned with just
having a calling convention that matches usual conventions (and is
harder to get wrong).

It's a pretty minor point, though, so I can live with it either way.

-Peff
Martin Ågren Feb. 25, 2019, 7:21 p.m. UTC | #6
On Fri, 25 Jan 2019 at 20:51, Jeff King <peff@peff.net> wrote:
>
> > On Wed, 23 Jan 2019 at 06:57, Jeff King <peff@peff.net> wrote:
> > >
> > > On Tue, Jan 22, 2019 at 10:45:48PM +0100, Martin Ågren wrote:
> > >
> > > > Call `clear_...()` at the start of `read_...()` instead of just zeroing
> > > > the struct, since we sometimes enter the function multiple times. This
> > > > means that it is important to initialize the struct before calling
> > > > `read_...()`, so document that.
> > >
> > > This part is a little counter-intuitive to me. Is anybody ever going to
> > > pass in anything except a struct initialized to REPOSITORY_FORMAT_INIT?

> > > If so, might it be kinder for read_...() to not assume anything about
> > > the incoming struct, and initialize it from scratch? I.e., not to use
> > > clear() but just do the initialization step?
>
> I'm less worried about out-of-tree users, and more concerned with just
> having a calling convention that matches usual conventions (and is
> harder to get wrong).
>
> It's a pretty minor point, though, so I can live with it either way.

It's time to resurrect this thread. I've reworked this patch to avoid
the compound literal when re-initing a struct, and I've been going back
and forth on this point about having to initialize to `..._INIT` or risk
crashing. And I keep coming back to thinking that it's not *that*
different from how `STRBUF_INIT` works.

There's the obvious difference that there aren't as many functions to
call, so there's certainly a difference in scale. And you'd think that
you'll always start with `read_...()`, but another potential first
function to call is `clear_...()` (see builtin/init-db.c), in which case
you better have used `..._INIT` first.

I'm tempted to address this point by documenting as good as I can in the
.h-file that one has to use this initializer macro. I'll obviously
convert all users, so copy-paste programming should work fine...

How does that sound to you?

Martin
Jeff King Feb. 26, 2019, 5:46 p.m. UTC | #7
On Mon, Feb 25, 2019 at 08:21:07PM +0100, Martin Ågren wrote:

> It's time to resurrect this thread. I've reworked this patch to avoid
> the compound literal when re-initing a struct, and I've been going back
> and forth on this point about having to initialize to `..._INIT` or risk
> crashing. And I keep coming back to thinking that it's not *that*
> different from how `STRBUF_INIT` works.
> 
> There's the obvious difference that there aren't as many functions to
> call, so there's certainly a difference in scale. And you'd think that
> you'll always start with `read_...()`, but another potential first
> function to call is `clear_...()` (see builtin/init-db.c), in which case
> you better have used `..._INIT` first.
> 
> I'm tempted to address this point by documenting as good as I can in the
> .h-file that one has to use this initializer macro. I'll obviously
> convert all users, so copy-paste programming should work fine...
> 
> How does that sound to you?

It sounds pretty good.

-Peff
diff mbox series

Patch

diff --git a/cache.h b/cache.h
index ca36b44ee0..ec7c043412 100644
--- a/cache.h
+++ b/cache.h
@@ -972,14 +972,35 @@  struct repository_format {
 	struct string_list unknown_extensions;
 };
 
+/*
+ * Always use this to initialize a `struct repository_format`
+ * to a well-defined, default state before calling
+ * `read_repository()`.
+ */
+#define REPOSITORY_FORMAT_INIT \
+{ \
+	.version = -1, \
+	.is_bare = -1, \
+	.hash_algo = GIT_HASH_SHA1, \
+	.unknown_extensions = STRING_LIST_INIT_DUP, \
+}
+
 /*
  * Read the repository format characteristics from the config file "path" into
- * "format" struct. Returns the numeric version. On error, -1 is returned,
- * format->version is set to -1, and all other fields in the struct are
- * undefined.
+ * "format" struct. Returns the numeric version. On error, or if no version is
+ * found in the configuration, -1 is returned, format->version is set to -1,
+ * and all other fields in the struct are set to the default configuration
+ * (REPOSITORY_FORMAT_INIT). Always initialize the struct using
+ * REPOSITORY_FORMAT_INIT before calling this function.
  */
 int read_repository_format(struct repository_format *format, const char *path);
 
+/*
+ * Free the memory held onto by `format`, but not the struct itself.
+ * (No need to use this after `read_repository_format()` fails.)
+ */
+void clear_repository_format(struct repository_format *format);
+
 /*
  * Verify that the repository described by repository_format is something we
  * can read. If it is, return 0. Otherwise, return -1, and "err" will describe
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 41faffd28d..04c60eaad5 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -96,7 +96,7 @@  static void copy_templates(const char *template_dir)
 	struct strbuf path = STRBUF_INIT;
 	struct strbuf template_path = STRBUF_INIT;
 	size_t template_len;
-	struct repository_format template_format;
+	struct repository_format template_format = REPOSITORY_FORMAT_INIT;
 	struct strbuf err = STRBUF_INIT;
 	DIR *dir;
 	char *to_free = NULL;
@@ -148,6 +148,7 @@  static void copy_templates(const char *template_dir)
 	free(to_free);
 	strbuf_release(&path);
 	strbuf_release(&template_path);
+	clear_repository_format(&template_format);
 }
 
 static int git_init_db_config(const char *k, const char *v, void *cb)
diff --git a/repository.c b/repository.c
index 7b02e1dffa..df88705574 100644
--- a/repository.c
+++ b/repository.c
@@ -148,7 +148,7 @@  int repo_init(struct repository *repo,
 	      const char *gitdir,
 	      const char *worktree)
 {
-	struct repository_format format;
+	struct repository_format format = REPOSITORY_FORMAT_INIT;
 	memset(repo, 0, sizeof(*repo));
 
 	repo->objects = raw_object_store_new();
@@ -165,6 +165,7 @@  int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	clear_repository_format(&format);
 	return 0;
 
 error:
diff --git a/setup.c b/setup.c
index bb633942bb..14c2f8ce24 100644
--- a/setup.c
+++ b/setup.c
@@ -477,7 +477,7 @@  static int check_repository_format_gently(const char *gitdir, struct repository_
 	}
 
 	repository_format_precious_objects = candidate->precious_objects;
-	repository_format_partial_clone = candidate->partial_clone;
+	repository_format_partial_clone = xstrdup_or_null(candidate->partial_clone);
 	repository_format_worktree_config = candidate->worktree_config;
 	string_list_clear(&candidate->unknown_extensions, 0);
 
@@ -500,11 +500,9 @@  static int check_repository_format_gently(const char *gitdir, struct repository_
 		}
 		if (candidate->work_tree) {
 			free(git_work_tree_cfg);
-			git_work_tree_cfg = candidate->work_tree;
+			git_work_tree_cfg = xstrdup(candidate->work_tree);
 			inside_work_tree = -1;
 		}
-	} else {
-		free(candidate->work_tree);
 	}
 
 	return 0;
@@ -512,15 +510,21 @@  static int check_repository_format_gently(const char *gitdir, struct repository_
 
 int read_repository_format(struct repository_format *format, const char *path)
 {
-	memset(format, 0, sizeof(*format));
-	format->version = -1;
-	format->is_bare = -1;
-	format->hash_algo = GIT_HASH_SHA1;
-	string_list_init(&format->unknown_extensions, 1);
+	clear_repository_format(format);
 	git_config_from_file(check_repo_format, path, format);
+	if (format->version == -1)
+		clear_repository_format(format);
 	return format->version;
 }
 
+void clear_repository_format(struct repository_format *format)
+{
+	string_list_clear(&format->unknown_extensions, 0);
+	free(format->work_tree);
+	free(format->partial_clone);
+	*format = (struct repository_format)REPOSITORY_FORMAT_INIT;
+}
+
 int verify_repository_format(const struct repository_format *format,
 			     struct strbuf *err)
 {
@@ -1008,7 +1012,7 @@  int discover_git_directory(struct strbuf *commondir,
 	struct strbuf dir = STRBUF_INIT, err = STRBUF_INIT;
 	size_t gitdir_offset = gitdir->len, cwd_len;
 	size_t commondir_offset = commondir->len;
-	struct repository_format candidate;
+	struct repository_format candidate = REPOSITORY_FORMAT_INIT;
 
 	if (strbuf_getcwd(&dir))
 		return -1;
@@ -1045,9 +1049,11 @@  int discover_git_directory(struct strbuf *commondir,
 		strbuf_release(&err);
 		strbuf_setlen(commondir, commondir_offset);
 		strbuf_setlen(gitdir, gitdir_offset);
+		clear_repository_format(&candidate);
 		return -1;
 	}
 
+	clear_repository_format(&candidate);
 	return 0;
 }
 
@@ -1056,7 +1062,7 @@  const char *setup_git_directory_gently(int *nongit_ok)
 	static struct strbuf cwd = STRBUF_INIT;
 	struct strbuf dir = STRBUF_INIT, gitdir = STRBUF_INIT;
 	const char *prefix;
-	struct repository_format repo_fmt;
+	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
 
 	/*
 	 * We may have read an incomplete configuration before
@@ -1146,6 +1152,7 @@  const char *setup_git_directory_gently(int *nongit_ok)
 
 	strbuf_release(&dir);
 	strbuf_release(&gitdir);
+	clear_repository_format(&repo_fmt);
 
 	return prefix;
 }
@@ -1203,9 +1210,10 @@  int git_config_perm(const char *var, const char *value)
 
 void check_repository_format(void)
 {
-	struct repository_format repo_fmt;
+	struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
 	check_repository_format_gently(get_git_dir(), &repo_fmt, NULL);
 	startup_info->have_repository = 1;
+	clear_repository_format(&repo_fmt);
 }
 
 /*
diff --git a/worktree.c b/worktree.c
index d6a0ee7f73..b45bfeb9d3 100644
--- a/worktree.c
+++ b/worktree.c
@@ -444,7 +444,7 @@  int submodule_uses_worktrees(const char *path)
 	DIR *dir;
 	struct dirent *d;
 	int ret = 0;
-	struct repository_format format;
+	struct repository_format format = REPOSITORY_FORMAT_INIT;
 
 	submodule_gitdir = git_pathdup_submodule(path, "%s", "");
 	if (!submodule_gitdir)
@@ -462,8 +462,10 @@  int submodule_uses_worktrees(const char *path)
 	read_repository_format(&format, sb.buf);
 	if (format.version != 0) {
 		strbuf_release(&sb);
+		clear_repository_format(&format);
 		return 1;
 	}
+	clear_repository_format(&format);
 
 	/* Replace config by worktrees. */
 	strbuf_setlen(&sb, sb.len - strlen("config"));