mbox series

[0/12] git_config_string() considered harmful

Message ID 20240407005656.GA436890@coredump.intra.peff.net (mailing list archive)
Headers show
Series git_config_string() considered harmful | expand

Message

Jeff King April 7, 2024, 12:56 a.m. UTC
On Sat, Apr 06, 2024 at 11:11:12AM -0700, Junio C Hamano wrote:

> The excludes_file variable is marked "const char *", but all the
> assignments to it are made with a piece of memory allocated just
> for it, and the variable is responsible for owning it.
> 
> When "core.excludesfile" is read, the code just lost the previous
> value, leaking memory.  Plug it.
> 
> The real problem is that the variable is mistyped; our convention
> is to never make a variable that owns the piece of memory pointed
> by it as "const".  Fixing that would reduce the chance of this kind
> of bug happening, and also would make it unnecessary to cast the
> constness away while free()ing it, but that would be a much larger
> follow-up effort.

As you noticed in your follow-up, this is just the tip of the iceberg.
And it's not just git_config_pathname(), but really git_config_string(),
and it is a potential problem for almost every call.

I have a series that I started a few months ago to try to improve this,
but I never sent it in because I didn't have a good solution for the
long tail of variables where we assign a string literal as the default.

But that doesn't mean we can't incrementally make things better. So I
polished it up a bit, and will send the result in a minute.

Looking at your sketch, I think I glossed over the parse-options
OPT_FILENAME_DUP() issue. In practice it's OK because we wouldn't
generally re-read the config after parsing the options. But leaving it
does seem rather ugly, and your solution looks reasonable. I'm not sure
if there's an easy way to get the compiler to point to spots which need
it; the type information is all lost when parse-options passes
everything through a void pointer.

(I remember a while ago looking at retaining type annotations for
parse-options; this could be another use case for that).

I think it would also be useful if we could enable -Wwrite-strings to
catch cases where string literals are assigned to non-const pointers.
But there's some cleanup/refactoring to get that to compile cleanly.

  [01/11]: config: make sequencer.c's git_config_string_dup() public
  [02/11]: config: add git_config_pathname_dup()
  [03/11]: config: prefer git_config_string_dup() for temp variables
  [04/11]: config: use git_config_string_dup() for open-coded equivalents
  [05/11]: config: use git_config_string_dup() to fix leaky open coding
  [06/11]: config: use git_config_string() in easy cases
  [07/11]: config: use git_config_pathname_dup() in easy cases
  [08/11]: http: use git_config_string_dup()
  [09/11]: merge: use git_config_string_dup() for pull strategies
  [10/11]: userdiff: use git_config_string_dup() when we can
  [11/11]: blame: use "dup" string_list for ignore-revs files

 alias.c                |   3 +-
 archive-tar.c          |  10 ++--
 attr.c                 |   2 +-
 attr.h                 |   2 +-
 builtin/blame.c        |   7 +--
 builtin/commit.c       |   8 ++--
 builtin/config.c       |   6 +--
 builtin/help.c         |   7 +--
 builtin/log.c          |  16 +++----
 builtin/merge.c        |  12 ++---
 builtin/receive-pack.c |  10 ++--
 builtin/repack.c       |  16 +++----
 compat/mingw.c         |   7 +--
 config.c               |  48 ++++++++++++-------
 config.h               |  19 ++++++++
 convert.c              |  12 ++---
 delta-islands.c        |   4 +-
 diff.c                 |  12 ++---
 environment.c          |  14 +++---
 environment.h          |  14 +++---
 fetch-pack.c           |   6 +--
 fsck.c                 |   6 +--
 gpg-interface.c        |   9 ++--
 http.c                 | 105 +++++++++++++++++++----------------------
 imap-send.c            |  20 ++++----
 mailmap.c              |   4 +-
 mailmap.h              |   4 +-
 merge-ll.c             |  17 +++----
 pager.c                |   4 +-
 promisor-remote.c      |   2 +-
 promisor-remote.h      |   2 +-
 remote.c               |  45 +++++++++---------
 remote.h               |   8 ++--
 sequencer.c            |  12 +----
 setup.c                |  11 ++---
 upload-pack.c          |   4 +-
 userdiff.c             |   6 +--
 userdiff.h             |   6 +--
 38 files changed, 251 insertions(+), 249 deletions(-)

-Peff

Comments

Jeff King April 7, 2024, 1:08 a.m. UTC | #1
On Sat, Apr 06, 2024 at 08:56:56PM -0400, Jeff King wrote:

> Subject: Re: [PATCH 0/12] git_config_string() considered harmful
> [...]
>   [01/11]: config: make sequencer.c's git_config_string_dup() public
>   [02/11]: config: add git_config_pathname_dup()
>   [03/11]: config: prefer git_config_string_dup() for temp variables
>   [04/11]: config: use git_config_string_dup() for open-coded equivalents
>   [05/11]: config: use git_config_string_dup() to fix leaky open coding
>   [06/11]: config: use git_config_string() in easy cases
>   [07/11]: config: use git_config_pathname_dup() in easy cases
>   [08/11]: http: use git_config_string_dup()
>   [09/11]: merge: use git_config_string_dup() for pull strategies
>   [10/11]: userdiff: use git_config_string_dup() when we can
>   [11/11]: blame: use "dup" string_list for ignore-revs files

The cover letter subject is wrong, of course. :) I wrote it manually and
ended up squashing two of the patches at the last minute.

-Peff
Rubén Justo April 7, 2024, 5:58 p.m. UTC | #2
On Sat, Apr 06, 2024 at 08:56:56PM -0400, Jeff King wrote:

> And it's not just git_config_pathname(), but really git_config_string(),

Indeed.

After Junio's series and yours, I'm on the fence now, but my envision was
to introduce:

--- >8 ---
diff --git a/config.c b/config.c
index eebce8c7e0..7322bdfb94 100644
--- a/config.c
+++ b/config.c
@@ -1345,6 +1345,15 @@ int git_config_string(const char **dest, const char *var, const char *value)
 	return 0;
 }
 
+int git_config_strbuf(struct strbuf *dest, const char *var, const char *value)
+{
+	if (!value)
+		return config_error_nonbool(var);
+	strbuf_reset(dest);
+	strbuf_addstr(dest, value);
+	return 0;
+}
+
 int git_config_pathname(const char **dest, const char *var, const char *value)
 {
 	if (!value)
diff --git a/config.h b/config.h
index f4966e3749..46e3137612 100644
--- a/config.h
+++ b/config.h
@@ -282,6 +282,12 @@ int git_config_bool(const char *, const char *);
  */
 int git_config_string(const char **, const char *, const char *);
 
+/**
+ * Copies the value string into the `dest` parameter; if no
+ * string is given, prints an error message and returns -1.
+ */
+int git_config_strbuf(struct strbuf *, const char *, const char *);
+
 /**
  * Similar to `git_config_string`, but expands `~` or `~user` into the
  * user's home directory when found at the beginning of the path.
--- 8< ---

To allow uses like:

--- >8 ---
diff --git a/config.c b/config.c
index 7322bdfb94..03884fa782 100644
--- a/config.c
+++ b/config.c
@@ -1572,7 +1572,7 @@ static int git_default_core_config(const char *var, const char *value,
 	}
 
 	if (!strcmp(var, "core.editor"))
-		return git_config_string(&editor_program, var, value);
+		return git_config_strbuf(&editor_program, var, value);
 
 	if (!strcmp(var, "core.commentchar") ||
 	    !strcmp(var, "core.commentstring")) {
diff --git a/editor.c b/editor.c
index b67b802ddf..618c193249 100644
--- a/editor.c
+++ b/editor.c
@@ -27,8 +27,8 @@ const char *git_editor(void)
 	const char *editor = getenv("GIT_EDITOR");
 	int terminal_is_dumb = is_terminal_dumb();
 
-	if (!editor && editor_program)
-		editor = editor_program;
+	if (!editor && editor_program.len)
+		editor = editor_program.buf;
 	if (!editor && !terminal_is_dumb)
 		editor = getenv("VISUAL");
 	if (!editor)
diff --git a/environment.c b/environment.c
index a73ba9c12c..b5073ff972 100644
--- a/environment.c
+++ b/environment.c
@@ -58,7 +58,7 @@ size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
 unsigned long big_file_threshold = 512 * 1024 * 1024;
-const char *editor_program;
+struct strbuf editor_program = STRBUF_INIT;
 const char *askpass_program;
 const char *excludes_file;
 enum auto_crlf auto_crlf = AUTO_CRLF_FALSE;
diff --git a/environment.h b/environment.h
index 05fd94d7be..c20898345e 100644
--- a/environment.h
+++ b/environment.h
@@ -220,7 +220,7 @@ const char *get_commit_output_encoding(void);
 extern const char *git_commit_encoding;
 extern const char *git_log_output_encoding;
 
-extern const char *editor_program;
+extern struct strbuf editor_program;
 extern const char *askpass_program;
 extern const char *excludes_file;
Jeff King April 8, 2024, 8:55 p.m. UTC | #3
On Sun, Apr 07, 2024 at 07:58:02PM +0200, Rubén Justo wrote:

> After Junio's series and yours, I'm on the fence now, but my envision was
> to introduce:
> 
> --- >8 ---
> diff --git a/config.c b/config.c
> index eebce8c7e0..7322bdfb94 100644
> --- a/config.c
> +++ b/config.c
> @@ -1345,6 +1345,15 @@ int git_config_string(const char **dest, const char *var, const char *value)
>  	return 0;
>  }
>  
> +int git_config_strbuf(struct strbuf *dest, const char *var, const char *value)
> +{
> +	if (!value)
> +		return config_error_nonbool(var);
> +	strbuf_reset(dest);
> +	strbuf_addstr(dest, value);
> +	return 0;
> +}
> +
>  int git_config_pathname(const char **dest, const char *var, const char *value)
>  {
>  	if (!value)

Hmm. I think that is nice in some ways, because it is a much bigger
signal about memory ownership than just dropping "const" from the
pointer.

But it is less nice in other ways. Every _user_ of the value now needs
to care that it is a strbuf, and use foo.buf consistently (which you
obviously noticed). Likewise, any downstream writers of the variable
need to treat it like a strbuf, too. So the parse-options OPT_FILENAME()
macro, for example, needs to be replaced with a strbuf-aware variant
(though arguably that is an improvement, as using the wrong one would
fail catastrophically, whereas using a non-const pointer with
OPT_FILENAME() creates a subtle bug).

I'm also not sure what the solution is for setting default values, like:

  const char *my_config = "default";

Of course that is a problem with my solution, too. Perhaps in the long
run we need to accept that they should always default to NULL, and
readers of the variable need to fill in the default when they look at it
(possibly with an accessor function).

Or I guess the alternative is to stop using bare pointers, and carry a
bit which tells us if something is heap-allocated. Which starts to look
an awful lot like a strbuf. ;)

I think in the past we've talked about being able to initialize a strbuf
like:

  struct strbuf foo = STRBUF_INIT_VAL("bar");

That would use "bar" instead of the usual slopbuf, but we can still tell
it's not a heap buffer by the fact that foo.alloc is 0.

-Peff
Rubén Justo April 11, 2024, 11:36 p.m. UTC | #4
On Mon, Apr 08, 2024 at 04:55:11PM -0400, Jeff King wrote:
> On Sun, Apr 07, 2024 at 07:58:02PM +0200, Rubén Justo wrote:
> 
> > After Junio's series and yours, I'm on the fence now, but my envision was
> > to introduce:
> > 
> > --- >8 ---
> > diff --git a/config.c b/config.c
> > index eebce8c7e0..7322bdfb94 100644
> > --- a/config.c
> > +++ b/config.c
> > @@ -1345,6 +1345,15 @@ int git_config_string(const char **dest, const char *var, const char *value)
> >  	return 0;
> >  }
> >  
> > +int git_config_strbuf(struct strbuf *dest, const char *var, const char *value)
> > +{
> > +	if (!value)
> > +		return config_error_nonbool(var);
> > +	strbuf_reset(dest);
> > +	strbuf_addstr(dest, value);
> > +	return 0;
> > +}
> > +
> >  int git_config_pathname(const char **dest, const char *var, const char *value)
> >  {
> >  	if (!value)
> 
> Hmm. I think that is nice in some ways, because it is a much bigger
> signal about memory ownership than just dropping "const" from the
> pointer.
> 
> But it is less nice in other ways. Every _user_ of the value now needs
> to care that it is a strbuf, and use foo.buf consistently (which you
> obviously noticed). Likewise, any downstream writers of the variable
> need to treat it like a strbuf, too. So the parse-options OPT_FILENAME()
> macro, for example, needs to be replaced with a strbuf-aware variant
> (though arguably that is an improvement, as using the wrong one would
> fail catastrophically, whereas using a non-const pointer with
> OPT_FILENAME() creates a subtle bug).
> 
> I'm also not sure what the solution is for setting default values, like:
> 
>   const char *my_config = "default";
> 
> Of course that is a problem with my solution, too. Perhaps in the long
> run we need to accept that they should always default to NULL, and
> readers of the variable need to fill in the default when they look at it
> (possibly with an accessor function).
> 
> Or I guess the alternative is to stop using bare pointers, and carry a
> bit which tells us if something is heap-allocated. Which starts to look
> an awful lot like a strbuf. ;)
> 
> I think in the past we've talked about being able to initialize a strbuf
> like:
> 
>   struct strbuf foo = STRBUF_INIT_VAL("bar");
> 
> That would use "bar" instead of the usual slopbuf, but we can still tell
> it's not a heap buffer by the fact that foo.alloc is 0.
> 
> -Peff

Hi Peff.

Thanks for your ideas.

For the globals we have in environment.h, maybe we can keep them const
and avoid the other inconveniences, doing something like:

diff --git a/config.c b/config.c
index 146856567a..ead3565c27 100644
--- a/config.c
+++ b/config.c
@@ -1671,8 +1671,13 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
                return 0;
        }

-       if (!strcmp(var, "core.editor"))
-               return git_config_string(&editor_program, var, value);
+       if (!strcmp(var, "core.editor")) {
+               static struct strbuf editor_program_ = STRBUF_INIT;
+               if (git_config_strbuf(&editor_program_, var, value))
+                       return -1;
+               editor_program = editor_program_.buf;
+               return 0;
+       }

        if (!strcmp(var, "core.commentchar")) {
                if (!value)