diff mbox series

[1/2] log: UNLEAK rev to silence a large number of leaks

Message ID 6d54bc264e2f9ce519f32c0673167a00bab55573.1631972978.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Squash leaks in t0000 | expand

Commit Message

Andrzej Hunt Sept. 18, 2021, 1:49 p.m. UTC
From: Andrzej Hunt <ajrhunt@google.com>

cmd_show puts a lot of data into rev, and doesn't clean it up before
returning. That's reasonable - we use most if not all of rev up until
cmd_show is finished - there's not much value in doing a proper cleanup.
Therefore we take the easy way out and UNLEAK rev.

The UNLEAK has to be performed early on, as cmd_show might return via
cmd_log_walk() in the next few lines, or it might continue to the
no-walk implementation below.

This patch silences the following leaks which were found when running
t0000 against LSAN:

Direct leak of 41 byte(s) in 1 object(s) allocated from:
    #0 0x486834 in strdup /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0x9ab168 in xstrdup /home/ahunt/oss-fuzz/git/wrapper.c:29:14
    #2 0x83cced in add_object_array_with_path /home/ahunt/oss-fuzz/git/object.c:349:17
    #3 0x8f4f5a in add_pending_object_with_path /home/ahunt/oss-fuzz/git/revision.c:329:2
    #4 0x8eb2b6 in handle_revision_arg_1 /home/ahunt/oss-fuzz/git/revision.c:2082:2
    #5 0x8eadad in handle_revision_arg /home/ahunt/oss-fuzz/git/revision.c:2089:12
    #6 0x8eea99 in setup_revisions /home/ahunt/oss-fuzz/git/revision.c:2756:7
    #7 0x59c024 in cmd_log_init_finish /home/ahunt/oss-fuzz/git/builtin/log.c:206:9
    #8 0x5998d8 in cmd_log_init /home/ahunt/oss-fuzz/git/builtin/log.c:275:2
    #9 0x599f9b in cmd_show /home/ahunt/oss-fuzz/git/builtin/log.c:641:2
    #10 0x4cd92d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
    #11 0x4cb5fa in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
    #12 0x4ccf57 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
    #13 0x4caf49 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
    #14 0x69ce3e in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
    #15 0x7f7c56197349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x49a9d2 in calloc /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
    #1 0x9ab4c2 in xcalloc /home/ahunt/oss-fuzz/git/wrapper.c:140:8
    #2 0x59c269 in cmd_log_init_finish /home/ahunt/oss-fuzz/git/builtin/log.c:233:18
    #3 0x5998d8 in cmd_log_init /home/ahunt/oss-fuzz/git/builtin/log.c:275:2
    #4 0x599f9b in cmd_show /home/ahunt/oss-fuzz/git/builtin/log.c:641:2
    #5 0x4cd92d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
    #6 0x4cb5fa in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
    #7 0x4ccf57 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
    #8 0x4caf49 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
    #9 0x69ce3e in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
    #10 0x7f7c56197349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 41 byte(s) in 1 object(s) allocated from:
    #0 0x486834 in strdup /home/abuild/rpmbuild/BUILD/llvm-11.0.0.src/build/../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0x9ab168 in xstrdup /home/ahunt/oss-fuzz/git/wrapper.c:29:14
    #2 0x8f5e30 in add_rev_cmdline /home/ahunt/oss-fuzz/git/revision.c:1482:23
    #3 0x8eb26d in handle_revision_arg_1 /home/ahunt/oss-fuzz/git/revision.c:2081:2
    #4 0x8eadad in handle_revision_arg /home/ahunt/oss-fuzz/git/revision.c:2089:12
    #5 0x8eea99 in setup_revisions /home/ahunt/oss-fuzz/git/revision.c:2756:7
    #6 0x59c024 in cmd_log_init_finish /home/ahunt/oss-fuzz/git/builtin/log.c:206:9
    #7 0x5998d8 in cmd_log_init /home/ahunt/oss-fuzz/git/builtin/log.c:275:2
    #8 0x599f9b in cmd_show /home/ahunt/oss-fuzz/git/builtin/log.c:641:2
    #9 0x4cd92d in run_builtin /home/ahunt/oss-fuzz/git/git.c:453:11
    #10 0x4cb5fa in handle_builtin /home/ahunt/oss-fuzz/git/git.c:704:3
    #11 0x4ccf57 in run_argv /home/ahunt/oss-fuzz/git/git.c:771:4
    #12 0x4caf49 in cmd_main /home/ahunt/oss-fuzz/git/git.c:902:19
    #13 0x69ce3e in main /home/ahunt/oss-fuzz/git/common-main.c:52:11
    #14 0x7fc4b3f06349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
---
 builtin/log.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Carlo Marcelo Arenas Belón Sept. 18, 2021, 8:06 p.m. UTC | #1
My equivalent version for these fixes is obviously more verbose but IMHO
not that ugly (and as safe)

It avoids the need to UNLEAK early by changing the program flow also for
the early return so the cleanup could be centralized in one single
function.

Both, the cmdline and mailmap arrays (and the objects they accumulate)
are cleaned in a "reusable" way.

Note that the cleaning of the "name" in the cmdline item throws a warning
as shown below which I intentionally didn't fix, as it would seem that
either the use of const there or the need to strdup is wrong.  So hope
someone that knows this code better could chime in.

Carlo
------ >8 ------
Subject: [PATCH] builtin/log: leaks from `git show` in t0000

obviously not ready, since the following will need to be corrected:

  revision.c:1496:8: warning: passing 'const char *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
                  free(info->rev[i].name);
                       ^~~~~~~~~~~~~~~~~

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 builtin/log.c |  8 ++++++--
 revision.c    | 20 ++++++++++++++++++++
 revision.h    |  5 +++++
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index f75d87e8d7..1b1c1f53f4 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -645,8 +645,10 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 	opt.tweak = show_setup_revisions_tweak;
 	cmd_log_init(argc, argv, prefix, &rev, &opt);
 
-	if (!rev.no_walk)
-		return cmd_log_walk(&rev);
+	if (!rev.no_walk) {
+		ret = cmd_log_walk(&rev);
+		goto done;
+	}
 
 	count = rev.pending.nr;
 	objects = rev.pending.objects;
@@ -702,6 +704,8 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 		}
 	}
 	free(objects);
+done:
+	repo_clear_revisions(&rev);
 	return ret;
 }
 
diff --git a/revision.c b/revision.c
index 0dabb5a0bc..ce62192dd8 100644
--- a/revision.c
+++ b/revision.c
@@ -1487,6 +1487,18 @@ static void add_rev_cmdline(struct rev_info *revs,
 	info->nr++;
 }
 
+static void clear_rev_cmdline(struct rev_info *revs)
+{
+	struct rev_cmdline_info *info = &revs->cmdline;
+	size_t i, nr = info->nr;
+
+	for (i = 0; i < nr; i++)
+		free(info->rev[i].name);
+
+	FREE_AND_NULL(info->rev);
+	info->nr = info->alloc = 0;
+}
+
 static void add_rev_cmdline_list(struct rev_info *revs,
 				 struct commit_list *commit_list,
 				 int whence,
@@ -1845,6 +1857,14 @@ void repo_init_revisions(struct repository *r,
 	init_display_notes(&revs->notes_opt);
 }
 
+void repo_clear_revisions(struct rev_info *revs)
+{
+	if (revs->mailmap)
+		clear_mailmap(revs->mailmap);
+	FREE_AND_NULL(revs->mailmap);
+	clear_rev_cmdline(revs);
+}
+
 static void add_pending_commit_list(struct rev_info *revs,
 				    struct commit_list *commit_list,
 				    unsigned int flags)
diff --git a/revision.h b/revision.h
index 0c65a760ee..f695c41cee 100644
--- a/revision.h
+++ b/revision.h
@@ -358,6 +358,11 @@ void repo_init_revisions(struct repository *r,
 			 struct rev_info *revs,
 			 const char *prefix);
 
+/*
+ * Free all structures dynamically allocated for the provided rev_info
+ */
+void repo_clear_revisions(struct rev_info *revs);
+
 /**
  * Parse revision information, filling in the `rev_info` structure, and
  * removing the used arguments from the argument list. Returns the number
Andrzej Hunt Sept. 19, 2021, 3:51 p.m. UTC | #2
On 18/09/2021 22:06, Carlo Marcelo Arenas Belón wrote:
> My equivalent version for these fixes is obviously more verbose but IMHO
> not that ugly (and as safe)
> 
> It avoids the need to UNLEAK early by changing the program flow also for
> the early return so the cleanup could be centralized in one single
> function.
> 
> Both, the cmdline and mailmap arrays (and the objects they accumulate)
> are cleaned in a "reusable" way.
> 
> Note that the cleaning of the "name" in the cmdline item throws a warning
> as shown below which I intentionally didn't fix, as it would seem that
> either the use of const there or the need to strdup is wrong.  So hope
> someone that knows this code better could chime in.
> 
> Carlo
> ------ >8 ------
> Subject: [PATCH] builtin/log: leaks from `git show` in t0000
> 
> obviously not ready, since the following will need to be corrected:
> 
>    revision.c:1496:8: warning: passing 'const char *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
>                    free(info->rev[i].name);
>                         ^~~~~~~~~~~~~~~~~
> 

Casting the pointer a la "free((void *) ...)" seems to be a common 
pattern in git, and seems like a reasonable option here. AFAIUI the 
const is still needed because clients  of rev_cmdline_info shouldn't be 
changing name. But since we own and created rev_cmdline_info, we also 
know it's safe to clean it up. For comparison, here's an example of 
submodule_entry being cleaned up - all members end up needing a cast:

static void free_one_config(struct submodule_entry *entry)
{
	free((void *) entry->config->path);
	free((void *) entry->config->name);
	free((void *) entry->config->branch);
	free((void *) entry->config->update_strategy.command);
	free(entry->config);
}

> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>   builtin/log.c |  8 ++++++--
>   revision.c    | 20 ++++++++++++++++++++
>   revision.h    |  5 +++++
>   3 files changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/builtin/log.c b/builtin/log.c
> index f75d87e8d7..1b1c1f53f4 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -645,8 +645,10 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>   	opt.tweak = show_setup_revisions_tweak;
>   	cmd_log_init(argc, argv, prefix, &rev, &opt);
>   
> -	if (!rev.no_walk)
> -		return cmd_log_walk(&rev);
> +	if (!rev.no_walk) {
> +		ret = cmd_log_walk(&rev);
> +		goto done;
> +	}
>   
>   	count = rev.pending.nr;
>   	objects = rev.pending.objects;
> @@ -702,6 +704,8 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>   		}
>   	}
>   	free(objects);
> +done:
> +	repo_clear_revisions(&rev);
>   	return ret;
>   }
>   
> diff --git a/revision.c b/revision.c
> index 0dabb5a0bc..ce62192dd8 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1487,6 +1487,18 @@ static void add_rev_cmdline(struct rev_info *revs,
>   	info->nr++;
>   }
>   
> +static void clear_rev_cmdline(struct rev_info *revs)
> +{
> +	struct rev_cmdline_info *info = &revs->cmdline;
> +	size_t i, nr = info->nr;
> +
> +	for (i = 0; i < nr; i++)
> +		free(info->rev[i].name);
> +
> +	FREE_AND_NULL(info->rev);
> +	info->nr = info->alloc = 0;
> +}
> +
>   static void add_rev_cmdline_list(struct rev_info *revs,
>   				 struct commit_list *commit_list,
>   				 int whence,
> @@ -1845,6 +1857,14 @@ void repo_init_revisions(struct repository *r,
>   	init_display_notes(&revs->notes_opt);
>   }
>   
> +void repo_clear_revisions(struct rev_info *revs)
> +{
> +	if (revs->mailmap)
> +		clear_mailmap(revs->mailmap);
> +	FREE_AND_NULL(revs->mailmap);
> +	clear_rev_cmdline(revs);
> +}
> +
>   static void add_pending_commit_list(struct rev_info *revs,
>   				    struct commit_list *commit_list,
>   				    unsigned int flags)
> diff --git a/revision.h b/revision.h
> index 0c65a760ee..f695c41cee 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -358,6 +358,11 @@ void repo_init_revisions(struct repository *r,
>   			 struct rev_info *revs,
>   			 const char *prefix);
>   
> +/*
> + * Free all structures dynamically allocated for the provided rev_info
> + */
> +void repo_clear_revisions(struct rev_info *revs);
> +
>   /**
>    * Parse revision information, filling in the `rev_info` structure, and
>    * removing the used arguments from the argument list. Returns the number
> 

This patch looks good to me (modulo adding the cast as discussed above), 
and is obviously much better than my approach of using an UNLEAK!

ATB,

Andrzej
Ævar Arnfjörð Bjarmason Sept. 19, 2021, 4:13 p.m. UTC | #3
On Sat, Sep 18 2021, Carlo Marcelo Arenas Belón wrote:

> My equivalent version for these fixes is obviously more verbose but IMHO
> not that ugly (and as safe)
>
> It avoids the need to UNLEAK early by changing the program flow also for
> the early return so the cleanup could be centralized in one single
> function.
>
> Both, the cmdline and mailmap arrays (and the objects they accumulate)
> are cleaned in a "reusable" way.
>
> Note that the cleaning of the "name" in the cmdline item throws a warning
> as shown below which I intentionally didn't fix, as it would seem that
> either the use of const there or the need to strdup is wrong.  So hope
> someone that knows this code better could chime in.

It should just be a "char *", I got that wrong in my version posted in
the side-thread[1] & mentioned in the side-reply[2].

(I think I got it right in some earlier version days ago, it should be a
'char *' like anyting we malloc, but brainfart when re-doing/re-basing
those changes).

Yours here below has a bug where you free() rev_cmdline_info items, you
need to use release_revisions_cmdline_rev().

I should have said in [2], but thanks a lot to you and Andrzej for
following up on the mess in t0000-basic.sh addressed by my v7
re-roll. It'll be really nice to get some of these leaks fixed & tested
for.

I think I was rather curt in [2] after a long debugging session, just
saying I appreciate it. Hopefully we can figure out some plan for mostly
pulling in the same direction with regards to the way forward.

1. https://github.com/git/git/commit/06380cd4f56f4c542685eb7aa79e28285fe02c55
2. https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/

> Carlo
> ------ >8 ------
> Subject: [PATCH] builtin/log: leaks from `git show` in t0000
>
> obviously not ready, since the following will need to be corrected:
>
>   revision.c:1496:8: warning: passing 'const char *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
>                   free(info->rev[i].name);
>                        ^~~~~~~~~~~~~~~~~
>
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>  builtin/log.c |  8 ++++++--
>  revision.c    | 20 ++++++++++++++++++++
>  revision.h    |  5 +++++
>  3 files changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/log.c b/builtin/log.c
> index f75d87e8d7..1b1c1f53f4 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -645,8 +645,10 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  	opt.tweak = show_setup_revisions_tweak;
>  	cmd_log_init(argc, argv, prefix, &rev, &opt);
>  
> -	if (!rev.no_walk)
> -		return cmd_log_walk(&rev);
> +	if (!rev.no_walk) {
> +		ret = cmd_log_walk(&rev);
> +		goto done;
> +	}
>  
>  	count = rev.pending.nr;
>  	objects = rev.pending.objects;
> @@ -702,6 +704,8 @@ int cmd_show(int argc, const char **argv, const char *prefix)
>  		}
>  	}
>  	free(objects);
> +done:
> +	repo_clear_revisions(&rev);
>  	return ret;
>  }
>  
> diff --git a/revision.c b/revision.c
> index 0dabb5a0bc..ce62192dd8 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1487,6 +1487,18 @@ static void add_rev_cmdline(struct rev_info *revs,
>  	info->nr++;
>  }
>  
> +static void clear_rev_cmdline(struct rev_info *revs)
> +{
> +	struct rev_cmdline_info *info = &revs->cmdline;
> +	size_t i, nr = info->nr;
> +
> +	for (i = 0; i < nr; i++)
> +		free(info->rev[i].name);
> +
> +	FREE_AND_NULL(info->rev);
> +	info->nr = info->alloc = 0;
> +}
> +
>  static void add_rev_cmdline_list(struct rev_info *revs,
>  				 struct commit_list *commit_list,
>  				 int whence,
> @@ -1845,6 +1857,14 @@ void repo_init_revisions(struct repository *r,
>  	init_display_notes(&revs->notes_opt);
>  }
>  
> +void repo_clear_revisions(struct rev_info *revs)
> +{
> +	if (revs->mailmap)
> +		clear_mailmap(revs->mailmap);
> +	FREE_AND_NULL(revs->mailmap);
> +	clear_rev_cmdline(revs);
> +}
> +
>  static void add_pending_commit_list(struct rev_info *revs,
>  				    struct commit_list *commit_list,
>  				    unsigned int flags)
> diff --git a/revision.h b/revision.h
> index 0c65a760ee..f695c41cee 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -358,6 +358,11 @@ void repo_init_revisions(struct repository *r,
>  			 struct rev_info *revs,
>  			 const char *prefix);
>  
> +/*
> + * Free all structures dynamically allocated for the provided rev_info
> + */
> +void repo_clear_revisions(struct rev_info *revs);
> +
>  /**
>   * Parse revision information, filling in the `rev_info` structure, and
>   * removing the used arguments from the argument list. Returns the number
Carlo Marcelo Arenas Belón Sept. 19, 2021, 9:34 p.m. UTC | #4
On Sun, Sep 19, 2021 at 06:13:43PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sat, Sep 18 2021, Carlo Marcelo Arenas Belón wrote:
> 
> > Note that the cleaning of the "name" in the cmdline item throws a warning
> > as shown below which I intentionally didn't fix, as it would seem that
> > either the use of const there or the need to strdup is wrong.  So hope
> > someone that knows this code better could chime in.
> 
> It should just be a "char *", I got that wrong in my version posted in
> the side-thread[1] & mentioned in the side-reply[2].

I was instead leaning towards keeping it as a "const char *" and removing
the strdup as shown in the patch below (obviously the last hunk not relevant
to your series).

This object doesn't hold or even manipulate, the objects it contains, so
it might be also a cleaner API to ensure it only keeps references and
doesn't own any in the more CS sense (note I am not a CS guy, so maybe I
get the concept here wrong).

Ironically the original patch that added the strdup was because of leak
related work, but I think that in this case might had gotten it backwards.

Even if we start holding pointers to names that are not static, I would
expect whoever created those buffers to own the data anyway.

Carlo

CC Michael for advise as the original author
------ >8 ------
Subject: [PATCH] revision: remove dup() of name in add_rev_cmdline()

df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
2013-05-25) adds it, probably introducing a leak.

All names we will ever get will either come from the commandline
or be pointers to a static buffer in hex.c, so it is safe not to
xstrdup and clean them up (just like the struct object *item).

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 revision.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/revision.c b/revision.c
index ce62192dd8..b20bc58ccd 100644
--- a/revision.c
+++ b/revision.c
@@ -1468,7 +1468,6 @@ static int limit_list(struct rev_info *revs)
 
 /*
  * Add an entry to refs->cmdline with the specified information.
- * *name is copied.
  */
 static void add_rev_cmdline(struct rev_info *revs,
 			    struct object *item,
@@ -1481,7 +1480,7 @@ static void add_rev_cmdline(struct rev_info *revs,
 
 	ALLOC_GROW(info->rev, nr + 1, info->alloc);
 	info->rev[nr].item = item;
-	info->rev[nr].name = xstrdup(name);
+	info->rev[nr].name = name;
 	info->rev[nr].whence = whence;
 	info->rev[nr].flags = flags;
 	info->nr++;
@@ -1490,10 +1489,6 @@ static void add_rev_cmdline(struct rev_info *revs,
 static void clear_rev_cmdline(struct rev_info *revs)
 {
 	struct rev_cmdline_info *info = &revs->cmdline;
-	size_t i, nr = info->nr;
-
-	for (i = 0; i < nr; i++)
-		free(info->rev[i].name);
 
 	FREE_AND_NULL(info->rev);
 	info->nr = info->alloc = 0;
Eric Sunshine Sept. 20, 2021, 6:06 a.m. UTC | #5
On Sun, Sep 19, 2021 at 5:34 PM Carlo Marcelo Arenas Belón
<carenas@gmail.com> wrote:
> Subject: [PATCH] revision: remove dup() of name in add_rev_cmdline()
>
> df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
> 2013-05-25) adds it, probably introducing a leak.
>
> All names we will ever get will either come from the commandline
> or be pointers to a static buffer in hex.c, so it is safe not to
> xstrdup and clean them up (just like the struct object *item).

I haven't been following this thread closely, but the mention of the
static buffer in hex.c invalidates the premise of this patch, as far
as I can tell. The "static buffer" is actually a ring of four buffers
which oid_to_hex() uses, one after another, into which it formats an
OID as hex. This allows a caller to format up to -- and only up to --
four OIDs without worrying about allocating its own memory for the hex
result. Beyond four, the caller can't use oid_to_hex() without doing
some sort of memory management itself, whether that be duplicating the
result of oid_to_hex() or by allocating its own buffers and calling
oid_to_hex_r() instead.

In this particular case, one of the callers of add_rev_cmdline() is
add_rev_cmdline_list(), which does this:

    while (commit_list) {
        ...
        add_rev_cmdline(..., oid_to_hex(...), ...);
        ...
    }

which may call add_rev_cmdline() any number of times, quite possibly
more than four.

Therefore (if I'm reading this correctly), it is absolutely correct
for add_rev_cmdline() to be duplicating that string to ensure that the
hexified OID value remains valid, and incorrect for this patch to be
removing the call to xstrdup().

> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
> diff --git a/revision.c b/revision.c
> @@ -1481,7 +1480,7 @@ static void add_rev_cmdline(struct rev_info *revs,
>         info->rev[nr].item = item;
> -       info->rev[nr].name = xstrdup(name);
> +       info->rev[nr].name = name;
>         info->rev[nr].whence = whence;
> @@ -1490,10 +1489,6 @@ static void add_rev_cmdline(struct rev_info *revs,
>  static void clear_rev_cmdline(struct rev_info *revs)
>  {
>         struct rev_cmdline_info *info = &revs->cmdline;
> -       size_t i, nr = info->nr;
> -
> -       for (i = 0; i < nr; i++)
> -               free(info->rev[i].name);
>
>         FREE_AND_NULL(info->rev);
Carlo Marcelo Arenas Belón Sept. 20, 2021, 9:39 p.m. UTC | #6
On Mon, Sep 20, 2021 at 02:06:01AM -0400, Eric Sunshine wrote:
> On Sun, Sep 19, 2021 at 5:34 PM Carlo Marcelo Arenas Belón
> <carenas@gmail.com> wrote:
> > Subject: [PATCH] revision: remove dup() of name in add_rev_cmdline()
> >
> > df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
> > 2013-05-25) adds it, probably introducing a leak.
> >
> > All names we will ever get will either come from the commandline
> > or be pointers to a static buffer in hex.c, so it is safe not to
> > xstrdup and clean them up (just like the struct object *item).
> 
> I haven't been following this thread closely, but the mention of the
> static buffer in hex.c invalidates the premise of this patch, as far
> as I can tell. The "static buffer" is actually a ring of four buffers
> which oid_to_hex() uses, one after another, into which it formats an
> OID as hex. This allows a caller to format up to -- and only up to --
> four OIDs without worrying about allocating its own memory for the hex
> result. Beyond four, the caller can't use oid_to_hex() without doing
> some sort of memory management itself, whether that be duplicating the
> result of oid_to_hex() or by allocating its own buffers and calling
> oid_to_hex_r() instead.

Thanks; this then explains why as I was suspecting add_rev_cmdline_list()
was indeed buggy, and might had even triggered the workaround of doing the
strdup.

> Therefore (if I'm reading this correctly), it is absolutely correct
> for add_rev_cmdline() to be duplicating that string to ensure that the
> hexified OID value remains valid, and incorrect for this patch to be
> removing the call to xstrdup().

Indeed, but the values that are being strdup were never used anyway, so
I suspect the original code might had just put it as a logical default.

We might do better instead as shown in the following patch (again, second
hunk not relevant for the current code); I suspect if we were to land this,
the last hunks probably should be done first in an independent patch, as
well.

Carlo
-------- >8 --------
Subject: [PATCH] revision: remove xstrdup() of name in add_rev_cmdline()

a765499a08 (revision.c: treat A...B merge bases as if manually
specified, 2013-05-13) adds calls to this function in a loop,
abusing oid_to_hex (at that time called sha1_to_hex).

df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
2013-05-25) adds the strdup, introducing a leak.

All names we will ever get should come from the commandline or be
constant values, so it is safe not to xstrdup and clean them up.

Just like the struct object *item, that is referenced in the same
struct, name isn't owned or managed so correct both issues by making
sure all entries are indeed constant or valid global pointers (from
the real command line) and remove the leak.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 revision.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/revision.c b/revision.c
index ce62192dd8..829af28658 100644
--- a/revision.c
+++ b/revision.c
@@ -1468,7 +1468,6 @@ static int limit_list(struct rev_info *revs)
 
 /*
  * Add an entry to refs->cmdline with the specified information.
- * *name is copied.
  */
 static void add_rev_cmdline(struct rev_info *revs,
 			    struct object *item,
@@ -1481,7 +1480,7 @@ static void add_rev_cmdline(struct rev_info *revs,
 
 	ALLOC_GROW(info->rev, nr + 1, info->alloc);
 	info->rev[nr].item = item;
-	info->rev[nr].name = xstrdup(name);
+	info->rev[nr].name = name;
 	info->rev[nr].whence = whence;
 	info->rev[nr].flags = flags;
 	info->nr++;
@@ -1490,10 +1489,6 @@ static void add_rev_cmdline(struct rev_info *revs,
 static void clear_rev_cmdline(struct rev_info *revs)
 {
 	struct rev_cmdline_info *info = &revs->cmdline;
-	size_t i, nr = info->nr;
-
-	for (i = 0; i < nr; i++)
-		free(info->rev[i].name);
 
 	FREE_AND_NULL(info->rev);
 	info->nr = info->alloc = 0;
@@ -1504,10 +1499,10 @@ static void add_rev_cmdline_list(struct rev_info *revs,
 				 int whence,
 				 unsigned flags)
 {
+	static const char *synthetic = ".synthetic";
 	while (commit_list) {
 		struct object *object = &commit_list->item->object;
-		add_rev_cmdline(revs, object, oid_to_hex(&object->oid),
-				whence, flags);
+		add_rev_cmdline(revs, object, synthetic, whence, flags);
 		commit_list = commit_list->next;
 	}
 }
@@ -1753,7 +1748,7 @@ struct add_alternate_refs_data {
 static void add_one_alternate_ref(const struct object_id *oid,
 				  void *vdata)
 {
-	const char *name = ".alternate";
+	static const char *name = ".alternate";
 	struct add_alternate_refs_data *data = vdata;
 	struct object *obj;
 
@@ -1940,7 +1935,7 @@ static int handle_dotdot_1(const char *arg, char *dotdot,
 			   struct object_context *a_oc,
 			   struct object_context *b_oc)
 {
-	const char *a_name, *b_name;
+	static const char *a_name, *b_name;
 	struct object_id a_oid, b_oid;
 	struct object *a_obj, *b_obj;
 	unsigned int a_flags, b_flags;
Jeff King Sept. 21, 2021, 3:09 a.m. UTC | #7
On Mon, Sep 20, 2021 at 02:39:12PM -0700, Carlo Marcelo Arenas Belón wrote:

> > Therefore (if I'm reading this correctly), it is absolutely correct
> > for add_rev_cmdline() to be duplicating that string to ensure that the
> > hexified OID value remains valid, and incorrect for this patch to be
> > removing the call to xstrdup().
> 
> Indeed, but the values that are being strdup were never used anyway, so
> I suspect the original code might had just put it as a logical default.

We do look at them in a few cases, like "fast-export", but only if they
are not marked UNINTERESTING. And add_rev_cmdline_list(), the variant
that writes the hex values, only ever gets called with the UNINTERESTING
flag.

So I think you're right that these ones would never be seen. I did
wonder if we'd ever show them with "log --source" or similar, but that
pulls the name from object_array_entry, I think.

> -------- >8 --------
> Subject: [PATCH] revision: remove xstrdup() of name in add_rev_cmdline()
> 
> a765499a08 (revision.c: treat A...B merge bases as if manually
> specified, 2013-05-13) adds calls to this function in a loop,
> abusing oid_to_hex (at that time called sha1_to_hex).
> 
> df835d3a0c (add_rev_cmdline(): make a copy of the name argument,
> 2013-05-25) adds the strdup, introducing a leak.
> 
> All names we will ever get should come from the commandline or be
> constant values, so it is safe not to xstrdup and clean them up.

This last paragraph is questionable, I think. We feed the argv from
setup_revisions() here, but that is not always coming from the actual
command line. Most cases seem to finish with the traversal before what
they've passed in goes out of scope, but not all. The call in
bisect_rev_setup() intentionally leaks the strvec (even though it
doesn't need to do so with the current code). The one in
cmd__fast_rebase() does clear its strvec after setup_revisions() but
before the actual traversal. I wouldn't be surprised if your patch
triggered memory problems there.

> @@ -1504,10 +1499,10 @@ static void add_rev_cmdline_list(struct rev_info *revs,
>  				 int whence,
>  				 unsigned flags)
>  {
> +	static const char *synthetic = ".synthetic";

I don't think there's any point in making this static. It's not an
array, but rather a pointer to a string literal. That string literal
will remain valid regardless (the standard does not guarantee we get the
_same_ string literal every time, but that doesn't matter for our
purposes. And in practice it will be the same one).

> @@ -1753,7 +1748,7 @@ struct add_alternate_refs_data {
>  static void add_one_alternate_ref(const struct object_id *oid,
>  				  void *vdata)
>  {
> -	const char *name = ".alternate";
> +	static const char *name = ".alternate";
>  	struct add_alternate_refs_data *data = vdata;
>  	struct object *obj;

Ditto here.

> @@ -1940,7 +1935,7 @@ static int handle_dotdot_1(const char *arg, char *dotdot,
>  			   struct object_context *a_oc,
>  			   struct object_context *b_oc)
>  {
> -	const char *a_name, *b_name;
> +	static const char *a_name, *b_name;
>  	struct object_id a_oid, b_oid;
>  	struct object *a_obj, *b_obj;
>  	unsigned int a_flags, b_flags;

And I don't see how this changes anything at all. Our pointers will live
on, but the memory they point to is not affected.

-Peff
diff mbox series

Patch

diff --git a/builtin/log.c b/builtin/log.c
index f75d87e8d7f..6faaddf17a6 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -644,6 +644,7 @@  int cmd_show(int argc, const char **argv, const char *prefix)
 	opt.def = "HEAD";
 	opt.tweak = show_setup_revisions_tweak;
 	cmd_log_init(argc, argv, prefix, &rev, &opt);
+	UNLEAK(rev);
 
 	if (!rev.no_walk)
 		return cmd_log_walk(&rev);