diff mbox series

[v2,1/2] merge-ort: fix small memory leak in detect_and_process_renames()

Message ID f1f7fc97fe2fe5079365bb91c71fb7033378995d.1645320592.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 8d60e9d2010d34f8c8ca65967ed02a2b06d74dc5
Headers show
Series Fix a couple small leaks in merge-ort | expand

Commit Message

Elijah Newren Feb. 20, 2022, 1:29 a.m. UTC
From: Elijah Newren <newren@gmail.com>

detect_and_process_renames() detects renames on both sides of history
and then combines these into a single diff_queue_struct.  The combined
diff_queue_struct needs to be able to hold the renames found on either
side, and since it knows the (maximum) size it needs, it pre-emptively
grows the array to the appropriate size:

	ALLOC_GROW(combined.queue,
		   renames->pairs[1].nr + renames->pairs[2].nr,
		   combined.alloc);

It then collects the items from each side:

	collect_renames(opt, &combined, MERGE_SIDE1, ...)
	collect_renames(opt, &combined, MERGE_SIDE2, ...)

Note, though, that collect_renames() sometimes determines that some
pairs are unnecessary and does not include them in the combined array.
When it is done, detect_and_process_renames() frees this memory:

	if (combined.nr) {
                ...
		free(combined.queue);
        }

The problem is that sometimes even when there are pairs, none of them
are necessary.  Instead of checking combined.nr, just remove the
if-check; free() knows to skip NULL pointers.  This change fixes the
following memory leak, as reported by valgrind:

==PID== 192 bytes in 1 blocks are definitely lost in loss record 107 of 134
==PID==    at 0xADDRESS: malloc
==PID==    by 0xADDRESS: realloc
==PID==    by 0xADDRESS: xrealloc (wrapper.c:126)
==PID==    by 0xADDRESS: detect_and_process_renames (merge-ort.c:3134)
==PID==    by 0xADDRESS: merge_ort_nonrecursive_internal (merge-ort.c:4610)
==PID==    by 0xADDRESS: merge_ort_internal (merge-ort.c:4709)
==PID==    by 0xADDRESS: merge_incore_recursive (merge-ort.c:4760)
==PID==    by 0xADDRESS: merge_ort_recursive (merge-ort-wrappers.c:57)
==PID==    by 0xADDRESS: try_merge_strategy (merge.c:753)
==PID==    by 0xADDRESS: cmd_merge (merge.c:1676)
==PID==    by 0xADDRESS: run_builtin (git.c:461)
==PID==    by 0xADDRESS: handle_builtin (git.c:713)
==PID==    by 0xADDRESS: run_argv (git.c:780)
==PID==    by 0xADDRESS: cmd_main (git.c:911)
==PID==    by 0xADDRESS: main (common-main.c:52)

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

Comments

Taylor Blau Feb. 21, 2022, 2:35 a.m. UTC | #1
On Sun, Feb 20, 2022 at 01:29:50AM +0000, Elijah Newren via GitGitGadget wrote:
>  merge-ort.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)

Both versions of this patch look good to me (in fact, I appreciated
seeing both v1 and v2, since v1 makes it more obvious what is changing,
but v2 does the whole thing in a little bit of a cleaner way).

> diff --git a/merge-ort.c b/merge-ort.c
> index d85b1cd99e9..3d7f9feb6f7 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -3086,12 +3086,11 @@ static int detect_and_process_renames(struct merge_options *opt,
>  				      struct tree *side1,
>  				      struct tree *side2)
>  {
> -	struct diff_queue_struct combined;
> +	struct diff_queue_struct combined = { 0 };
>  	struct rename_info *renames = &opt->priv->renames;
> -	int need_dir_renames, s, clean = 1;
> +	int need_dir_renames, s, i, clean = 1;

And this entire patch looks good to me, but I did wonder about why "i"
is an int here. Shouldn't it be a size_t instead? Looking at
diff_queue_struct's definition, both "alloc" and "nr" are signed ints,
when they should almost certainly be unsigned to avoid overflow.

Thanks,
Taylor
Elijah Newren Feb. 23, 2022, 7:57 a.m. UTC | #2
On Sun, Feb 20, 2022 at 6:35 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Sun, Feb 20, 2022 at 01:29:50AM +0000, Elijah Newren via GitGitGadget wrote:
> >  merge-ort.c | 15 +++++----------
> >  1 file changed, 5 insertions(+), 10 deletions(-)
>
> Both versions of this patch look good to me (in fact, I appreciated
> seeing both v1 and v2, since v1 makes it more obvious what is changing,
> but v2 does the whole thing in a little bit of a cleaner way).

Thanks for taking a look!

> > diff --git a/merge-ort.c b/merge-ort.c
> > index d85b1cd99e9..3d7f9feb6f7 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -3086,12 +3086,11 @@ static int detect_and_process_renames(struct merge_options *opt,
> >                                     struct tree *side1,
> >                                     struct tree *side2)
> >  {
> > -     struct diff_queue_struct combined;
> > +     struct diff_queue_struct combined = { 0 };
> >       struct rename_info *renames = &opt->priv->renames;
> > -     int need_dir_renames, s, clean = 1;
> > +     int need_dir_renames, s, i, clean = 1;
>
> And this entire patch looks good to me, but I did wonder about why "i"
> is an int here. Shouldn't it be a size_t instead? Looking at
> diff_queue_struct's definition, both "alloc" and "nr" are signed ints,
> when they should almost certainly be unsigned to avoid overflow.

You may be right, but I'm not sure we're too worried right now about
folks having billions of paths involved in renames; such a repo would
make even the Microsoft monorepos look miniscule.  ;-)
Ævar Arnfjörð Bjarmason June 1, 2022, 10 a.m. UTC | #3
On Sun, Feb 20 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> detect_and_process_renames() detects renames on both sides of history
> and then combines these into a single diff_queue_struct.  The combined
> diff_queue_struct needs to be able to hold the renames found on either
> side, and since it knows the (maximum) size it needs, it pre-emptively
> grows the array to the appropriate size:
>
> 	ALLOC_GROW(combined.queue,
> 		   renames->pairs[1].nr + renames->pairs[2].nr,
> 		   combined.alloc);
>
> It then collects the items from each side:
>
> 	collect_renames(opt, &combined, MERGE_SIDE1, ...)
> 	collect_renames(opt, &combined, MERGE_SIDE2, ...)
>
> Note, though, that collect_renames() sometimes determines that some
> pairs are unnecessary and does not include them in the combined array.
> When it is done, detect_and_process_renames() frees this memory:
>
> 	if (combined.nr) {
>                 ...
> 		free(combined.queue);
>         }
>
> The problem is that sometimes even when there are pairs, none of them
> are necessary.  Instead of checking combined.nr, just remove the
> if-check; free() knows to skip NULL pointers.  This change fixes the
> following memory leak, as reported by valgrind:
>
> ==PID== 192 bytes in 1 blocks are definitely lost in loss record 107 of 134
> ==PID==    at 0xADDRESS: malloc
> ==PID==    by 0xADDRESS: realloc
> ==PID==    by 0xADDRESS: xrealloc (wrapper.c:126)
> ==PID==    by 0xADDRESS: detect_and_process_renames (merge-ort.c:3134)
> ==PID==    by 0xADDRESS: merge_ort_nonrecursive_internal (merge-ort.c:4610)
> ==PID==    by 0xADDRESS: merge_ort_internal (merge-ort.c:4709)
> ==PID==    by 0xADDRESS: merge_incore_recursive (merge-ort.c:4760)
> ==PID==    by 0xADDRESS: merge_ort_recursive (merge-ort-wrappers.c:57)
> ==PID==    by 0xADDRESS: try_merge_strategy (merge.c:753)
> ==PID==    by 0xADDRESS: cmd_merge (merge.c:1676)
> ==PID==    by 0xADDRESS: run_builtin (git.c:461)
> ==PID==    by 0xADDRESS: handle_builtin (git.c:713)
> ==PID==    by 0xADDRESS: run_argv (git.c:780)
> ==PID==    by 0xADDRESS: cmd_main (git.c:911)
> ==PID==    by 0xADDRESS: main (common-main.c:52)
>
> Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-ort.c | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index d85b1cd99e9..3d7f9feb6f7 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -3086,12 +3086,11 @@ static int detect_and_process_renames(struct merge_options *opt,
>  				      struct tree *side1,
>  				      struct tree *side2)
>  {
> -	struct diff_queue_struct combined;
> +	struct diff_queue_struct combined = { 0 };
>  	struct rename_info *renames = &opt->priv->renames;
> -	int need_dir_renames, s, clean = 1;
> +	int need_dir_renames, s, i, clean = 1;
>  	unsigned detection_run = 0;
>  
> -	memset(&combined, 0, sizeof(combined));
>  	if (!possible_renames(renames))
>  		goto cleanup;
>  
> @@ -3175,13 +3174,9 @@ simple_cleanup:
>  		free(renames->pairs[s].queue);
>  		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
>  	}
> -	if (combined.nr) {
> -		int i;
> -		for (i = 0; i < combined.nr; i++)
> -			pool_diff_free_filepair(&opt->priv->pool,
> -						combined.queue[i]);
> -		free(combined.queue);
> -	}
> +	for (i = 0; i < combined.nr; i++)
> +		pool_diff_free_filepair(&opt->priv->pool, combined.queue[i]);
> +	free(combined.queue);
>  
>  	return clean;
>  }
Ævar Arnfjörð Bjarmason June 1, 2022, 10:09 a.m. UTC | #4
On Sun, Feb 20 2022, Elijah Newren via GitGitGadget wrote:

[I fat-fingered an empty E-Mail reply to this earlier in
https://lore.kernel.org/git/220601.86h7541yqj.gmgdl@evledraar.gmail.com/,
sorry!]

> From: Elijah Newren <newren@gmail.com>
>
> detect_and_process_renames() detects renames on both sides of history
> and then combines these into a single diff_queue_struct.  The combined
> diff_queue_struct needs to be able to hold the renames found on either
> side, and since it knows the (maximum) size it needs, it pre-emptively
> grows the array to the appropriate size:
> [...]
> diff --git a/merge-ort.c b/merge-ort.c
> index d85b1cd99e9..3d7f9feb6f7 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -3086,12 +3086,11 @@ static int detect_and_process_renames(struct merge_options *opt,
>  				      struct tree *side1,
>  				      struct tree *side2)
>  {
> -	struct diff_queue_struct combined;
> +	struct diff_queue_struct combined = { 0 };
>  	struct rename_info *renames = &opt->priv->renames;
> -	int need_dir_renames, s, clean = 1;
> +	int need_dir_renames, s, i, clean = 1;
>  	unsigned detection_run = 0;
>  
> -	memset(&combined, 0, sizeof(combined));
>  	if (!possible_renames(renames))
>  		goto cleanup;
>  
> @@ -3175,13 +3174,9 @@ simple_cleanup:
>  		free(renames->pairs[s].queue);
>  		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
>  	}
> -	if (combined.nr) {
> -		int i;
> -		for (i = 0; i < combined.nr; i++)
> -			pool_diff_free_filepair(&opt->priv->pool,
> -						combined.queue[i]);
> -		free(combined.queue);
> -	}
> +	for (i = 0; i < combined.nr; i++)
> +		pool_diff_free_filepair(&opt->priv->pool, combined.queue[i]);
> +	free(combined.queue);
>  
>  	return clean;
>  }

I haven't been able to find whether this actually causes anything bad,
but when I started digging into SANITIZE=leak failures I found that this
change made t6435-merge-sparse.sh flaky in the v2.36.0 release when
compiled with SANITIZE=leak.

I.e. it "should" fail, but with 8d60e9d2010 (merge-ort: fix small memory
leak in detect_and_process_renames(), 2022-02-20) it will sometimes
succeed.

I bisected it between 2.35.0..2.36.0 with:

	git bisect run sh -c 'make SANITIZE=leak && (cd t && ! (for i in $(seq 1 20); do ./t6435-merge-sparse.sh >/dev/null && echo $? || echo $?; done | grep -q 0))'

I.e. "fail if we succeed" (there's some redundancy in the ad-hoc
one-liner). Manually debugging it with:
	
	diff --git a/dir.c b/dir.c
	index d91295f2bcd..3e860769c26 100644
	--- a/dir.c
	+++ b/dir.c
	@@ -878,6 +878,8 @@ void add_pattern(const char *string, const char *base,
	 	unsigned flags;
	 	int nowildcardlen;
	 
	+	fprintf(stderr, "adding pattern %s\n", string);
	+
	 	parse_path_pattern(&string, &patternlen, &flags, &nowildcardlen);
	 	if (flags & PATTERN_FLAG_MUSTBEDIR) {
	 		FLEXPTR_ALLOC_MEM(pattern, pattern, string, patternlen);

Turns up this odd case, i.e. we "should" fail at the end of the "setup"
in this bit of test code:

	[...]
        test_commit_this ours &&
        git config core.sparseCheckout true &&
        echo "/checked-out" >.git/info/sparse-checkout &&
        git reset --hard &&
        test_must_fail git merge their

Here we leak in "git reset", so we "should" get a leak like:
	
	[...]	
	+ git tag ours
	+ git config core.sparseCheckout true
	+ echo /checked-out
	+ git reset --hard
	adding pattern /checked-out
	adding pattern /checked-out
	adding pattern /checked-out
	HEAD is now at 3dfe889 ours
	
	=================================================================
	==25385==ERROR: LeakSanitizer: detected memory leaks
	
	Indirect leak of 512 byte(s) in 1 object(s) allocated from:
	    #0 0x7f9652ef50aa in __interceptor_calloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:90
	    #1 0x6ab6dc in xcalloc /home/avar/g/git/wrapper.c:140
	    #2 0x58e5ac in alloc_table /home/avar/g/git/hashmap.c:79
	    #3 0x58e8e9 in hashmap_init /home/avar/g/git/hashmap.c:168
	    #4 0x569808 in add_patterns_from_buffer /home/avar/g/git/dir.c:1136
	    #5 0x5697a1 in add_patterns /home/avar/g/git/dir.c:1124
	    #6 0x56996a in add_patterns_from_file_to_list /home/avar/g/git/dir.c:1164
	    #7 0x56e0b2 in get_sparse_checkout_patterns /home/avar/g/git/dir.c:3273
	    #8 0x56a240 in init_sparse_checkout_patterns /home/avar/g/git/dir.c:1451
	[...]
	
That's consistent with what I see before, but sometimes it'll succeed
like this:

	[...]		
	+ git tag ours
	+ git config core.sparseCheckout true
	+ echo /checked-out
	+ git reset --hard
	adding pattern /checked-out   
	HEAD is now at 3dfe889 ours   
        [...]

I.e. for some reason the same "reset --hard" now adds one, not three
patterns (which before were all identical).

Now, the "flaky success" with SANITIZE=leak does appear to be new in
8d60e9d2010, but before that running this in a loop reveals that the 2nd
test sometimes succeeds, so perhaps the underlying "issue" isn't new.

I say "issue" because I haven't dug enough to see if this has any impact
on anything, and the failure I discovered doesn't per-se matter now.

But perhaps this observed indeterminism is pointing the way to some
deeper bug here? Or maybe it isn't...
diff mbox series

Patch

diff --git a/merge-ort.c b/merge-ort.c
index d85b1cd99e9..3d7f9feb6f7 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -3086,12 +3086,11 @@  static int detect_and_process_renames(struct merge_options *opt,
 				      struct tree *side1,
 				      struct tree *side2)
 {
-	struct diff_queue_struct combined;
+	struct diff_queue_struct combined = { 0 };
 	struct rename_info *renames = &opt->priv->renames;
-	int need_dir_renames, s, clean = 1;
+	int need_dir_renames, s, i, clean = 1;
 	unsigned detection_run = 0;
 
-	memset(&combined, 0, sizeof(combined));
 	if (!possible_renames(renames))
 		goto cleanup;
 
@@ -3175,13 +3174,9 @@  simple_cleanup:
 		free(renames->pairs[s].queue);
 		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
 	}
-	if (combined.nr) {
-		int i;
-		for (i = 0; i < combined.nr; i++)
-			pool_diff_free_filepair(&opt->priv->pool,
-						combined.queue[i]);
-		free(combined.queue);
-	}
+	for (i = 0; i < combined.nr; i++)
+		pool_diff_free_filepair(&opt->priv->pool, combined.queue[i]);
+	free(combined.queue);
 
 	return clean;
 }