diff mbox series

[15/23] pseudo-merge: fix leaking strmap keys

Message ID 9e1161d55f96253bfeb5cddd1bbd381e2dad8a71.1727687410.git.ps@pks.im (mailing list archive)
State Accepted
Commit d0ab6630a710c8b3f4cf7e1a3f630a685c5d5721
Headers show
Series Memory leak fixes (pt.8) | expand

Commit Message

Patrick Steinhardt Sept. 30, 2024, 9:13 a.m. UTC
When creating a new pseudo-merge group we collect a set of matchnig
commits and put them into a string map. This strmap is initialized such
that it does not allocate its keys, and instead we try to pass ownership
of the keys to it via `strmap_put()`. This isn't how it works though:
the strmap will never try to release these keys, and consequently they
end up leaking.

Fix this leak by initializing the strmap as duplicating its keys and not
trying to hand over ownership.

The leak is exposed by t5333, but plugging it does not yet make the full
test suite pass.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 pseudo-merge.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Taylor Blau Sept. 30, 2024, 9:22 p.m. UTC | #1
On Mon, Sep 30, 2024 at 11:13:53AM +0200, Patrick Steinhardt wrote:
> When creating a new pseudo-merge group we collect a set of matchnig

s/matchnig/matching/

> commits and put them into a string map. This strmap is initialized such
> that it does not allocate its keys, and instead we try to pass ownership
> of the keys to it via `strmap_put()`. This isn't how it works though:
> the strmap will never try to release these keys, and consequently they
> end up leaking.
>
> Fix this leak by initializing the strmap as duplicating its keys and not
> trying to hand over ownership.

Hmm. I think I am probably splitting hairs, since a few xstrdup() calls
between friends is unlikely to matter here, but... ;-)

It does seem wasteful if we can avoid it. We already allocated heap
space for the matches via the underlying strbuf, and we really do want
to hand ownership over if we can.

Would doing something like this on top of your previous patch do the
trick?

--- >8 ---
diff --git a/pseudo-merge.c b/pseudo-merge.c
index 28782a31c6..6b6605d3dc 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -110,6 +110,7 @@ void pseudo_merge_group_release(struct pseudo_merge_group *group)
 		free(matches->stable);
 		free(matches->unstable);
 		free(matches);
+		free((char*)e->key);
 	}
 	strmap_clear(&group->matches, 0);
--- 8< ---

That introduces an ugly const-cast, but I think it's tolerable (and may
be moreso with a comment explaining why it's safe).

> diff --git a/pseudo-merge.c b/pseudo-merge.c
> index 28782a31c6..bb59965ed2 100644
> --- a/pseudo-merge.c
> +++ b/pseudo-merge.c
> @@ -87,7 +87,7 @@ static void pseudo_merge_group_init(struct pseudo_merge_group *group)
>  {
>  	memset(group, 0, sizeof(struct pseudo_merge_group));
>
> -	strmap_init_with_options(&group->matches, NULL, 0);
> +	strmap_init_with_options(&group->matches, NULL, 1);
>
>  	group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
>  	group->max_merges = DEFAULT_PSEUDO_MERGE_MAX_MERGES;
> @@ -275,7 +275,7 @@ static int find_pseudo_merge_group_for_ref(const char *refname,
>  		matches = strmap_get(&group->matches, group_name.buf);
>  		if (!matches) {
>  			matches = xcalloc(1, sizeof(*matches));
> -			strmap_put(&group->matches, strbuf_detach(&group_name, NULL),
> +			strmap_put(&group->matches, group_name.buf,
>  				   matches);
>  		}

Otherwise, I think what's written here looks good to me.

Thanks,
Taylor
Patrick Steinhardt Oct. 7, 2024, 9:41 a.m. UTC | #2
On Mon, Sep 30, 2024 at 05:22:28PM -0400, Taylor Blau wrote:
> On Mon, Sep 30, 2024 at 11:13:53AM +0200, Patrick Steinhardt wrote:
> > When creating a new pseudo-merge group we collect a set of matchnig
> 
> s/matchnig/matching/
> 
> > commits and put them into a string map. This strmap is initialized such
> > that it does not allocate its keys, and instead we try to pass ownership
> > of the keys to it via `strmap_put()`. This isn't how it works though:
> > the strmap will never try to release these keys, and consequently they
> > end up leaking.
> >
> > Fix this leak by initializing the strmap as duplicating its keys and not
> > trying to hand over ownership.
> 
> Hmm. I think I am probably splitting hairs, since a few xstrdup() calls
> between friends is unlikely to matter here, but... ;-)
> 
> It does seem wasteful if we can avoid it. We already allocated heap
> space for the matches via the underlying strbuf, and we really do want
> to hand ownership over if we can.
> 
> Would doing something like this on top of your previous patch do the
> trick?
> 
> --- >8 ---
> diff --git a/pseudo-merge.c b/pseudo-merge.c
> index 28782a31c6..6b6605d3dc 100644
> --- a/pseudo-merge.c
> +++ b/pseudo-merge.c
> @@ -110,6 +110,7 @@ void pseudo_merge_group_release(struct pseudo_merge_group *group)
>  		free(matches->stable);
>  		free(matches->unstable);
>  		free(matches);
> +		free((char*)e->key);
>  	}
>  	strmap_clear(&group->matches, 0);
> --- 8< ---
> 
> That introduces an ugly const-cast, but I think it's tolerable (and may
> be moreso with a comment explaining why it's safe).

Well, to me this is a tradeoff between complexity and performance. If
the performance benefit outweighs the additional complexity that this
shared ownership of keys brings along with it then I'm okay with the
code being intimate with each others lifetime requirements.

But as far as I can see the number of entries is determined by the group
pattern, and I expect that in most cases it's going to be quite limited.
So it does not feel like this should lead to all that many extra
allocations, and thus I tend to prefer the simpler solution.

But please let me know in case you disagree with that assessment.

Patrick
Jeff King Oct. 8, 2024, 8:54 a.m. UTC | #3
On Mon, Oct 07, 2024 at 11:41:16AM +0200, Patrick Steinhardt wrote:

> > Hmm. I think I am probably splitting hairs, since a few xstrdup() calls
> > between friends is unlikely to matter here, but... ;-)
> > 
> > It does seem wasteful if we can avoid it. We already allocated heap
> > space for the matches via the underlying strbuf, and we really do want
> > to hand ownership over if we can.
> > 
> > Would doing something like this on top of your previous patch do the
> > trick?
> > 
> > --- >8 ---
> > diff --git a/pseudo-merge.c b/pseudo-merge.c
> > index 28782a31c6..6b6605d3dc 100644
> > --- a/pseudo-merge.c
> > +++ b/pseudo-merge.c
> > @@ -110,6 +110,7 @@ void pseudo_merge_group_release(struct pseudo_merge_group *group)
> >  		free(matches->stable);
> >  		free(matches->unstable);
> >  		free(matches);
> > +		free((char*)e->key);
> >  	}
> >  	strmap_clear(&group->matches, 0);
> > --- 8< ---
> > 
> > That introduces an ugly const-cast, but I think it's tolerable (and may
> > be moreso with a comment explaining why it's safe).
> 
> Well, to me this is a tradeoff between complexity and performance. If
> the performance benefit outweighs the additional complexity that this
> shared ownership of keys brings along with it then I'm okay with the
> code being intimate with each others lifetime requirements.
> 
> But as far as I can see the number of entries is determined by the group
> pattern, and I expect that in most cases it's going to be quite limited.
> So it does not feel like this should lead to all that many extra
> allocations, and thus I tend to prefer the simpler solution.

I'd tend to agree with you that the allocations aren't a big deal here.
But I think we've run into this kind of strbuf-detaching thing before,
and there's another pattern that is efficient without getting too
intimate between modules. Like this (plus your change to set the
strmap's strdup_strings mode):

diff --git a/pseudo-merge.c b/pseudo-merge.c
index 10ebd9a4e9..fb1164edfa 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -210,6 +210,7 @@ static int find_pseudo_merge_group_for_ref(const char *refname,
 	struct bitmap_writer *writer = _data;
 	struct object_id peeled;
 	struct commit *c;
+	struct strbuf group_name = STRBUF_INIT;
 	uint32_t i;
 	int has_bitmap;
 
@@ -227,10 +228,11 @@ static int find_pseudo_merge_group_for_ref(const char *refname,
 	for (i = 0; i < writer->pseudo_merge_groups.nr; i++) {
 		struct pseudo_merge_group *group;
 		struct pseudo_merge_matches *matches;
-		struct strbuf group_name = STRBUF_INIT;
 		regmatch_t captures[16];
 		size_t j;
 
+		strbuf_reset(&group_name);
+
 		group = writer->pseudo_merge_groups.items[i].util;
 		if (regexec(group->pattern, refname, ARRAY_SIZE(captures),
 			    captures, 0))
@@ -256,8 +258,7 @@ static int find_pseudo_merge_group_for_ref(const char *refname,
 		matches = strmap_get(&group->matches, group_name.buf);
 		if (!matches) {
 			matches = xcalloc(1, sizeof(*matches));
-			strmap_put(&group->matches, strbuf_detach(&group_name, NULL),
-				   matches);
+			strmap_put(&group->matches, group_name.buf, matches);
 		}
 
 		if (c->date <= group->stable_threshold) {
@@ -270,9 +271,9 @@ static int find_pseudo_merge_group_for_ref(const char *refname,
 			matches->unstable[matches->unstable_nr++] = c;
 		}
 
-		strbuf_release(&group_name);
 	}
 
+	strbuf_release(&group_name);
 	return 0;
 }
 

Now we skip the duplicate allocations because we are reusing the
group_name scratch buffer in the loop over and over. But wait, there's
more! We're actually _more_ efficient than the original which did one
allocation per entry, because:

  1. We can allocate the correct number of bytes for each name, rather
     than using the over-estimated buffer made by strbuf.

  2. In strdup_strings mode, strmap is smart enough to use a FLEXPTR to
     stick the name inside the struct. So we've actually reduced the
     number of allocations per entry by 1.

Now possibly it is not even worth doing this optimization, because this
is not really a hot path. But it doesn't violate any module boundaries,
and I think the "loop over a reusable strbuf" trick is a general idiom
for solving these kinds of allocation problems. So a good thing to keep
in our toolbox.

-Peff
Patrick Steinhardt Oct. 9, 2024, 7:06 a.m. UTC | #4
On Tue, Oct 08, 2024 at 04:54:23AM -0400, Jeff King wrote:
> On Mon, Oct 07, 2024 at 11:41:16AM +0200, Patrick Steinhardt wrote:
> 
> > > Hmm. I think I am probably splitting hairs, since a few xstrdup() calls
> > > between friends is unlikely to matter here, but... ;-)
> > > 
> > > It does seem wasteful if we can avoid it. We already allocated heap
> > > space for the matches via the underlying strbuf, and we really do want
> > > to hand ownership over if we can.
> > > 
> > > Would doing something like this on top of your previous patch do the
> > > trick?
> > > 
> > > --- >8 ---
> > > diff --git a/pseudo-merge.c b/pseudo-merge.c
> > > index 28782a31c6..6b6605d3dc 100644
> > > --- a/pseudo-merge.c
> > > +++ b/pseudo-merge.c
> > > @@ -110,6 +110,7 @@ void pseudo_merge_group_release(struct pseudo_merge_group *group)
> > >  		free(matches->stable);
> > >  		free(matches->unstable);
> > >  		free(matches);
> > > +		free((char*)e->key);
> > >  	}
> > >  	strmap_clear(&group->matches, 0);
> > > --- 8< ---
> > > 
> > > That introduces an ugly const-cast, but I think it's tolerable (and may
> > > be moreso with a comment explaining why it's safe).
> > 
> > Well, to me this is a tradeoff between complexity and performance. If
> > the performance benefit outweighs the additional complexity that this
> > shared ownership of keys brings along with it then I'm okay with the
> > code being intimate with each others lifetime requirements.
> > 
> > But as far as I can see the number of entries is determined by the group
> > pattern, and I expect that in most cases it's going to be quite limited.
> > So it does not feel like this should lead to all that many extra
> > allocations, and thus I tend to prefer the simpler solution.
> 
> I'd tend to agree with you that the allocations aren't a big deal here.
> But I think we've run into this kind of strbuf-detaching thing before,
> and there's another pattern that is efficient without getting too
> intimate between modules. Like this (plus your change to set the
> strmap's strdup_strings mode):
> 
> diff --git a/pseudo-merge.c b/pseudo-merge.c
> index 10ebd9a4e9..fb1164edfa 100644
> --- a/pseudo-merge.c
> +++ b/pseudo-merge.c
> @@ -210,6 +210,7 @@ static int find_pseudo_merge_group_for_ref(const char *refname,
>  	struct bitmap_writer *writer = _data;
>  	struct object_id peeled;
>  	struct commit *c;
> +	struct strbuf group_name = STRBUF_INIT;
>  	uint32_t i;
>  	int has_bitmap;
>  
> @@ -227,10 +228,11 @@ static int find_pseudo_merge_group_for_ref(const char *refname,
>  	for (i = 0; i < writer->pseudo_merge_groups.nr; i++) {
>  		struct pseudo_merge_group *group;
>  		struct pseudo_merge_matches *matches;
> -		struct strbuf group_name = STRBUF_INIT;
>  		regmatch_t captures[16];
>  		size_t j;
>  
> +		strbuf_reset(&group_name);
> +
>  		group = writer->pseudo_merge_groups.items[i].util;
>  		if (regexec(group->pattern, refname, ARRAY_SIZE(captures),
>  			    captures, 0))
> @@ -256,8 +258,7 @@ static int find_pseudo_merge_group_for_ref(const char *refname,
>  		matches = strmap_get(&group->matches, group_name.buf);
>  		if (!matches) {
>  			matches = xcalloc(1, sizeof(*matches));
> -			strmap_put(&group->matches, strbuf_detach(&group_name, NULL),
> -				   matches);
> +			strmap_put(&group->matches, group_name.buf, matches);
>  		}
>  
>  		if (c->date <= group->stable_threshold) {
> @@ -270,9 +271,9 @@ static int find_pseudo_merge_group_for_ref(const char *refname,
>  			matches->unstable[matches->unstable_nr++] = c;
>  		}
>  
> -		strbuf_release(&group_name);
>  	}
>  
> +	strbuf_release(&group_name);
>  	return 0;
>  }
>  
> 
> Now we skip the duplicate allocations because we are reusing the
> group_name scratch buffer in the loop over and over. But wait, there's
> more! We're actually _more_ efficient than the original which did one
> allocation per entry, because:
> 
>   1. We can allocate the correct number of bytes for each name, rather
>      than using the over-estimated buffer made by strbuf.
> 
>   2. In strdup_strings mode, strmap is smart enough to use a FLEXPTR to
>      stick the name inside the struct. So we've actually reduced the
>      number of allocations per entry by 1.
> 
> Now possibly it is not even worth doing this optimization, because this
> is not really a hot path. But it doesn't violate any module boundaries,
> and I think the "loop over a reusable strbuf" trick is a general idiom
> for solving these kinds of allocation problems. So a good thing to keep
> in our toolbox.

Nice, thanks for digging! I see that Junio has already merged the topic
to `next` though -- is this a mere "Let this cook before the next cycle
starts" or will it stay in next?

If the latter then I'll leave this as-is, otherwise I'll send a revised
version to make this a bit more efficient.

Thanks!

Patrick
diff mbox series

Patch

diff --git a/pseudo-merge.c b/pseudo-merge.c
index 28782a31c6..bb59965ed2 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -87,7 +87,7 @@  static void pseudo_merge_group_init(struct pseudo_merge_group *group)
 {
 	memset(group, 0, sizeof(struct pseudo_merge_group));
 
-	strmap_init_with_options(&group->matches, NULL, 0);
+	strmap_init_with_options(&group->matches, NULL, 1);
 
 	group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
 	group->max_merges = DEFAULT_PSEUDO_MERGE_MAX_MERGES;
@@ -275,7 +275,7 @@  static int find_pseudo_merge_group_for_ref(const char *refname,
 		matches = strmap_get(&group->matches, group_name.buf);
 		if (!matches) {
 			matches = xcalloc(1, sizeof(*matches));
-			strmap_put(&group->matches, strbuf_detach(&group_name, NULL),
+			strmap_put(&group->matches, group_name.buf,
 				   matches);
 		}