
clear_pattern_list(): clear embedded hashmaps

Message ID 20200814111049.GA4101811@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 8dc3156373f4e02c1b1f657350ffae8ee94cbf44

Commit Message

Jeff King Aug. 14, 2020, 11:10 a.m. UTC
Commit 96cc8ab531 (sparse-checkout: use hashmaps for cone patterns,
2019-11-21) added some auxiliary hashmaps to the pattern_list struct,
but they're leaked when clear_pattern_list() is called.

Signed-off-by: Jeff King <peff@peff.net>
---
I have no idea how often this leak triggers in practice. I just noticed
it while poking at LSan output (which we remain depressingly far
from getting a clean run on).

 dir.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Derrick Stolee Aug. 14, 2020, 12:13 p.m. UTC | #1
On 8/14/2020 7:10 AM, Jeff King wrote:
> Commit 96cc8ab531 (sparse-checkout: use hashmaps for cone patterns,
> 2019-11-21) added some auxiliary hashmaps to the pattern_list struct,
> but they're leaked when clear_pattern_list() is called.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I have no idea how often this leak triggers in practice. I just noticed
> it while poking at LSan output (which we remain depressingly far
> from getting a clean run on).

Good find. The impact of the leak is likely low since we don't often
create multiple pattern_list structs (with these hashmaps) per process.
The sparse-checkout builtin is likely the only place where more than
one could be instantiated at the same time.

I also double-checked that hashmap_free_entries() handles a NULL
hashmap pointer or a zero-initialized hashmap that was never set up,
which is what happens when cone mode is not enabled _or_ the
pattern_list corresponds to something like a .gitignore file.
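
To illustrate why the unconditional calls are safe, here is a minimal
sketch of the guard pattern. It is a toy model, not the code in
hashmap.c or dir.c: toy_map, toy_pattern_list, toy_map_init(),
toy_map_release() and toy_clear_pattern_list() are all hypothetical
names, and the only assumption carried over is that a map that was
never set up has a NULL table pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a hashmap; only the bucket array matters here. */
struct toy_map {
	void **table;      /* NULL until the map is initialized */
	size_t tablesize;
};

/* Hypothetical stand-in for a pattern list with two embedded maps. */
struct toy_pattern_list {
	int use_cone_patterns;
	struct toy_map recursive_map;
	struct toy_map parent_map;
};

static void toy_map_init(struct toy_map *map, size_t size)
{
	assert(size > 0);
	map->table = calloc(size, sizeof(*map->table));
	assert(map->table); /* toy code: abort on allocation failure */
	map->tablesize = size;
}

/* Returns 1 if a table was freed, 0 if the map was NULL or never set up. */
static int toy_map_release(struct toy_map *map)
{
	if (!map || !map->table)
		return 0;
	free(map->table);
	memset(map, 0, sizeof(*map));
	return 1;
}

/* Releases both maps unconditionally; returns how many held a table. */
static int toy_clear_pattern_list(struct toy_pattern_list *pl)
{
	int released = toy_map_release(&pl->recursive_map) +
		       toy_map_release(&pl->parent_map);

	memset(pl, 0, sizeof(*pl));
	return released;
}
```

With this shape, a zeroed list (the non-cone or .gitignore case) goes
through the same clear function as a cone-mode list, and clearing twice
is harmless.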

Thanks,
-Stolee

>  dir.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/dir.c b/dir.c
> index fe64be30ed..9411b94e9b 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -916,6 +916,8 @@ void clear_pattern_list(struct pattern_list *pl)
>  		free(pl->patterns[i]);
>  	free(pl->patterns);
>  	free(pl->filebuf);
> +	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
> +	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
>  
>  	memset(pl, 0, sizeof(*pl));
>  }
>
Elijah Newren Aug. 17, 2020, 4:55 p.m. UTC | #2
Hi,

On Fri, Aug 14, 2020 at 5:23 AM Jeff King <peff@peff.net> wrote:
>
> Commit 96cc8ab531 (sparse-checkout: use hashmaps for cone patterns,
> 2019-11-21) added some auxiliary hashmaps to the pattern_list struct,
> but they're leaked when clear_pattern_list() is called.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I have no idea how often this leak triggers in practice. I just noticed
> it while poking at LSan output (which we remain depressingly far
> from getting a clean run on).
>
>  dir.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/dir.c b/dir.c
> index fe64be30ed..9411b94e9b 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -916,6 +916,8 @@ void clear_pattern_list(struct pattern_list *pl)
>                 free(pl->patterns[i]);
>         free(pl->patterns);
>         free(pl->filebuf);
> +       hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
> +       hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);

This clears up the hash entries, but continues to leak the hash table.
Since you submitted first, can you fix this to use hashmap_free_()
instead, as per
https://lore.kernel.org/git/932741d7598ca2934dbca40f715ba2d3819fcc51.1597561152.git.gitgitgadget@gmail.com/?
 Then I'll rebase my series on yours and drop my first patch (since
it'll then be identical).

Thanks,
Elijah
Elijah Newren Aug. 17, 2020, 5:22 p.m. UTC | #3
On Mon, Aug 17, 2020 at 9:55 AM Elijah Newren <newren@gmail.com> wrote:
>
> Hi,
>
> On Fri, Aug 14, 2020 at 5:23 AM Jeff King <peff@peff.net> wrote:
> >
> > Commit 96cc8ab531 (sparse-checkout: use hashmaps for cone patterns,
> > 2019-11-21) added some auxiliary hashmaps to the pattern_list struct,
> > but they're leaked when clear_pattern_list() is called.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > I have no idea how often this leak triggers in practice. I just noticed
> > it while poking at LSan output (which we remain depressingly far
> > from getting a clean run on).
> >
> >  dir.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/dir.c b/dir.c
> > index fe64be30ed..9411b94e9b 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -916,6 +916,8 @@ void clear_pattern_list(struct pattern_list *pl)
> >                 free(pl->patterns[i]);
> >         free(pl->patterns);
> >         free(pl->filebuf);
> > +       hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
> > +       hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
>
> This clears up the hash entries, but continues to leak the hash table.
> Since you submitted first, can you fix this to use hashmap_free_()
> instead, as per
> https://lore.kernel.org/git/932741d7598ca2934dbca40f715ba2d3819fcc51.1597561152.git.gitgitgadget@gmail.com/?
>  Then I'll rebase my series on yours and drop my first patch (since
> it'll then be identical).

Never mind, I got confused once again by the name.
hashmap_free_entries() doesn't mean just free the entries, it means
free what hashmap_free() would plus all the entries, i.e. do what
hashmap_free() *should* *have* *been* defined to do.  Such a confusing
API.  And hashmap_free() really perplexes me -- it seems like a
function that can't possibly be useful; its sole purpose seems to be
a trap for the unwary.
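
For anyone else tripped up by the names, the distinction can be
sketched with a toy chained map. This is a simplified model and not
git's implementation: toy_map, toy_entry, item, toy_free() and
toy_free_entries() are hypothetical names. The table-only variant
releases the bucket array and leaves the entries to the caller; the
"entries" variant releases the bucket array and every struct that
contains an entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct toy_entry {
	struct toy_entry *next;
	unsigned hash;
};

struct toy_map {
	struct toy_entry **table;
	size_t tablesize;
};

/* User struct with the entry embedded, deliberately not at the front. */
struct item {
	int value;
	struct toy_entry ent;
};

static void toy_init(struct toy_map *m, size_t size)
{
	assert(size > 0);
	m->table = calloc(size, sizeof(*m->table));
	assert(m->table); /* toy code: abort on allocation failure */
	m->tablesize = size;
}

static void toy_add(struct toy_map *m, struct toy_entry *e)
{
	size_t bucket = e->hash % m->tablesize;

	e->next = m->table[bucket];
	m->table[bucket] = e;
}

/*
 * entry_offset < 0: free only the table.
 * entry_offset >= 0: also free each struct containing an entry.
 * Returns the number of containing structs that were freed.
 */
static size_t toy_free_(struct toy_map *m, ptrdiff_t entry_offset)
{
	size_t i, freed = 0;

	if (!m || !m->table)
		return 0;
	if (entry_offset >= 0) {
		for (i = 0; i < m->tablesize; i++) {
			struct toy_entry *e = m->table[i];
			while (e) {
				struct toy_entry *next = e->next;
				free((char *)e - entry_offset);
				freed++;
				e = next;
			}
		}
	}
	free(m->table);
	memset(m, 0, sizeof(*m));
	return freed;
}

#define toy_free(m) toy_free_((m), -1)
#define toy_free_entries(m, type, member) \
	toy_free_((m), (ptrdiff_t)offsetof(type, member))
```

In this model, calling toy_free() on a map whose items were malloc'd
and are referenced nowhere else leaks every item, which is the trap.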
Jeff King Aug. 17, 2020, 6:48 p.m. UTC | #4
On Mon, Aug 17, 2020 at 10:22:27AM -0700, Elijah Newren wrote:

> > > +       hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
> > > +       hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
> >
> > This clears up the hash entries, but continues to leak the hash table.
> > Since you submitted first, can you fix this to use hashmap_free_()
> > instead, as per
> > https://lore.kernel.org/git/932741d7598ca2934dbca40f715ba2d3819fcc51.1597561152.git.gitgitgadget@gmail.com/?
> >  Then I'll rebase my series on yours and drop my first patch (since
> > it'll then be identical).
> 
> Never mind, I got confused once again by the name.
> hashmap_free_entries() doesn't mean just free the entries, it means
> free what hashmap_free() would plus all the entries, i.e. do what
> hashmap_free() *should* *have* *been* defined to do.  Such a confusing
> API.  And hashmap_free() really perplexes me -- it seems like a
> function that can't possibly be useful; its sole purpose seems to be
> a trap for the unwary.

There used to be an "also free entries" flag, but that got complicated
by the loosening of the "hashmap_entry must be at the front of the
struct to be freed" rule.

With this kind of embedded-entry data structure (and list.h is in the
same boat) it _is_ sometimes useful to be part of a data structure
without giving up ownership of the memory. But I agree that the more
normal case is to free items when the hashmap is destroyed.
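
As a rough illustration of the "member without ownership" case, here is
a small intrusive-list sketch. The names (link, job, push(),
sum_by_priority(), sum_by_owner()) are made up for this example, and
the container_of() macro is a plain offsetof-based version written
here, not taken from git's headers. The same struct sits in two lists
at once while its storage stays with the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive link, embedded in any struct that wants listing. */
struct link {
	struct link *next;
};

/* Recover the struct that contains `member` from a pointer to the member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct job {
	int id;
	struct link by_priority; /* not at the front of the struct */
	struct link by_owner;    /* same job, second list */
};

static void push(struct link **head, struct link *node)
{
	assert(node);
	node->next = *head;
	*head = node;
}

static int sum_by_priority(struct link *head)
{
	int sum = 0;

	for (; head; head = head->next)
		sum += container_of(head, struct job, by_priority)->id;
	return sum;
}

static int sum_by_owner(struct link *head)
{
	int sum = 0;

	for (; head; head = head->next)
		sum += container_of(head, struct job, by_owner)->id;
	return sum;
}
```

If the jobs live in an array on the caller's stack, neither list may
free them, so a teardown that also frees the containing structs would
be wrong for this use.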

Likewise, the whole "you have to define a struct that contains the map
entry" thing is flexible and efficient, but a pain to use.

I generally find khash's "map this type to that type, the hash owns the
memory" much more natural. And it doesn't lose efficiency (and indeed
sometimes even gains it) because it uses macros to store concrete types.
But of course macros create their own headaches. :)
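
To make the macro approach concrete, here is a toy sketch of the idea.
It is not khash's API: DEFINE_INT_MAP and the generated
name_init/put/get/release functions are invented for illustration, and
the table is a fixed-capacity linear-probing one with no deletion or
resizing. The point is only that the macro stamps out a map for a
concrete value type, and the map owns the storage for keys and values.

```c
#include <assert.h>
#include <stdlib.h>

/* Generate a fixed-capacity int -> val_t map named `name`. */
#define DEFINE_INT_MAP(name, val_t) \
struct name { \
	size_t cap; \
	size_t len; \
	unsigned char *used; \
	int *keys; \
	val_t *vals; /* values stored inline, by concrete type */ \
}; \
static int name##_init(struct name *m, size_t cap) \
{ \
	assert(cap > 0); \
	m->cap = cap; \
	m->len = 0; \
	m->used = calloc(cap, 1); \
	m->keys = calloc(cap, sizeof(int)); \
	m->vals = calloc(cap, sizeof(val_t)); \
	return (m->used && m->keys && m->vals) ? 0 : -1; \
} \
/* Insert or overwrite; returns -1 when the table is full. */ \
static int name##_put(struct name *m, int key, val_t val) \
{ \
	size_t i = (unsigned)key % m->cap, n; \
	for (n = 0; n < m->cap; n++, i = (i + 1) % m->cap) { \
		if (m->used[i] && m->keys[i] != key) \
			continue; \
		if (!m->used[i]) { \
			m->used[i] = 1; \
			m->keys[i] = key; \
			m->len++; \
		} \
		m->vals[i] = val; \
		return 0; \
	} \
	return -1; \
} \
/* Returns a pointer to the stored value, or NULL if the key is absent. */ \
static val_t *name##_get(struct name *m, int key) \
{ \
	size_t i = (unsigned)key % m->cap, n; \
	for (n = 0; n < m->cap && m->used[i]; n++, i = (i + 1) % m->cap) \
		if (m->keys[i] == key) \
			return &m->vals[i]; \
	return NULL; \
} \
/* The map owns keys and values, so one call releases everything. */ \
static void name##_release(struct name *m) \
{ \
	free(m->used); \
	free(m->keys); \
	free(m->vals); \
	m->used = NULL; \
	m->keys = NULL; \
	m->vals = NULL; \
	m->cap = m->len = 0; \
}

/* Instantiate one concrete map type: int keys, double values. */
DEFINE_INT_MAP(int_double_map, double)
```

Callers never define an entry struct and never free individual items,
at the cost of debugging through generated code.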

Anyway, I'm definitely open to renaming to something more sensible. I
already mentioned the free/clear thing earlier, but
hashmap_clear_entries() ends up _very_ confusing. Because it's clearing
the hashmap but freeing the entries. hashmap_clear_and_free_entries() is
kind of long, but a lot more descriptive.

-Peff

Patch

diff --git a/dir.c b/dir.c
index fe64be30ed..9411b94e9b 100644
--- a/dir.c
+++ b/dir.c
@@ -916,6 +916,8 @@  void clear_pattern_list(struct pattern_list *pl)
 		free(pl->patterns[i]);
 	free(pl->patterns);
 	free(pl->filebuf);
+	hashmap_free_entries(&pl->recursive_hashmap, struct pattern_entry, ent);
+	hashmap_free_entries(&pl->parent_hashmap, struct pattern_entry, ent);
 
 	memset(pl, 0, sizeof(*pl));
 }