diff mbox series

[v2,01/10] hashmap: add usage documentation explaining hashmap_free[_entries]()

Message ID af6b6fcb460be900d3fffeb743a42f3f87ce6b7f.1602549650.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v2,01/10] hashmap: add usage documentation explaining hashmap_free[_entries]() | expand

Commit Message

Elijah Newren Oct. 13, 2020, 12:40 a.m. UTC
From: Elijah Newren <newren@gmail.com>

The existence of hashmap_free() and hashmap_free_entries() confused me,
and the docs weren't clear enough.  We are dealing with a map table,
entries in that table, and possibly also things each of those entries
point to.  I had to consult other source code examples and the
implementation.  Add a brief note to clarify the differences.  This will
become even more important once we introduce a new
hashmap_partial_clear() function which will add the question of whether
the table itself has been freed.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 hashmap.h | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

Comments

Jeff King Oct. 30, 2020, 12:50 p.m. UTC | #1
On Tue, Oct 13, 2020 at 12:40:41AM +0000, Elijah Newren via GitGitGadget wrote:

> The existence of hashmap_free() and hashmap_free_entries() confused me,
> and the docs weren't clear enough.  We are dealing with a map table,
> entries in that table, and possibly also things each of those entries
> point to.  I had to consult other source code examples and the
> implementation.  Add a brief note to clarify the differences.  This will
> become even more important once we introduce a new
> hashmap_partial_clear() function which will add the question of whether
> the table itself has been freed.

This is a definite improvement, and I don't see any inaccuracies in the
descriptions. I do think some re-naming would help in the long run,
though. E.g.:

  - hashmap_clear() - remove all entries and de-allocate any
    hashmap-specific data, but be ready for reuse

  - hashmap_clear_and_free() - ditto, but free the entries themselves

  - hashmap_partial_clear() - remove all entries but don't deallocate
    table

  - hashmap_partial_clear_and_free() - ditto, but free the entries

So always call it "clear", but allow options in two dimensions (partial
or not, free entries or not).

Those could be parameters to a single function, but I think it gets a
little ugly because "and_free" requires passing in the type of the
entries in order to find the pointers.

The "not" cases are implied in the names, but hashmap_clear_full() would
be OK with me, too.

But I think in the current scheme that "free" is somewhat overloaded,
and if we end with a "clear" and a "free" that seems confusing to me.

-Peff
Elijah Newren Oct. 30, 2020, 7:55 p.m. UTC | #2
On Fri, Oct 30, 2020 at 5:51 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Oct 13, 2020 at 12:40:41AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > The existence of hashmap_free() and hashmap_free_entries() confused me,
> > and the docs weren't clear enough.  We are dealing with a map table,
> > entries in that table, and possibly also things each of those entries
> > point to.  I had to consult other source code examples and the
> > implementation.  Add a brief note to clarify the differences.  This will
> > become even more important once we introduce a new
> > hashmap_partial_clear() function which will add the question of whether
> > the table itself has been freed.
>
> This is a definite improvement, and I don't see any inaccuracies in the
> descriptions. I do think some re-naming would help in the long run,
> though. E.g.:
>
>   - hashmap_clear() - remove all entries and de-allocate any
>     hashmap-specific data, but be ready for reuse
>
>   - hashmap_clear_and_free() - ditto, but free the entries themselves
>
>   - hashmap_partial_clear() - remove all entries but don't deallocate
>     table
>
>   - hashmap_partial_clear_and_free() - ditto, but free the entries
>
> So always call it "clear", but allow options in two dimensions (partial
> or not, free entries or not).
>
> Those could be parameters to a single function, but I think it gets a
> little ugly because "and_free" requires passing in the type of the
> entries in order to find the pointers.
>
> The "not" cases are implied in the names, but hashmap_clear_full() would
> be OK with me, too.
>
> But I think in the current scheme that "free" is somewhat overloaded,
> and if we end with a "clear" and a "free" that seems confusing to me.

Hmm...there are quite a few calls to hashmap_free() and
hashmap_free_entries() throughout the codebase.  I'm wondering if I
should make switching these over to your new naming suggestions a
separate follow-on series from this one, so that if there are any
conflicts with other series it doesn't need to hold these first 10
patches up.

If I do that, I could also add a patch to convert several callers of
hashmap_init() to use the new HASHMAP_INIT() macro, and another patch
to convert shortlog to using my strset instead of its own.
Jeff King Nov. 3, 2020, 4:26 p.m. UTC | #3
On Fri, Oct 30, 2020 at 12:55:51PM -0700, Elijah Newren wrote:

> > But I think in the current scheme that "free" is somewhat overloaded,
> > and if we end with a "clear" and a "free" that seems confusing to me.
> 
> Hmm...there are quite a few calls to hashmap_free() and
> hashmap_free_entries() throughout the codebase.  I'm wondering if I
> should make switching these over to your new naming suggestions a
> separate follow-on series from this one, so that if there are any
> conflicts with other series it doesn't need to hold these first 10
> patches up.

Yeah, it will definitely need a lot of mechanical fix-up. Those kinds of
conflicts aren't usually a big deal. Junio will have to resolve them,
but if the resolution is easy and mechanical, then it's not likely to
hold up either topic.

> If I do that, I could also add a patch to convert several callers of
> hashmap_init() to use the new HASHMAP_INIT() macro, and another patch
> to convert shortlog to using my strset instead of its own.

Yeah, both would be nice. I'm happy if it comes as part of the series,
or separately on top.

-Peff
Elijah Newren Nov. 3, 2020, 4:48 p.m. UTC | #4
On Tue, Nov 3, 2020 at 8:26 AM Jeff King <peff@peff.net> wrote:
>
> On Fri, Oct 30, 2020 at 12:55:51PM -0700, Elijah Newren wrote:
>
> > > But I think in the current scheme that "free" is somewhat overloaded,
> > > and if we end with a "clear" and a "free" that seems confusing to me.
> >
> > Hmm...there are quite a few calls to hashmap_free() and
> > hashmap_free_entries() throughout the codebase.  I'm wondering if I
> > should make switching these over to your new naming suggestions a
> > separate follow-on series from this one, so that if there are any
> > conflicts with other series it doesn't need to hold these first 10
> > patches up.
>
> Yeah, it will definitely need a lot of mechanical fix-up. Those kinds of
> conflicts aren't usually a big deal. Junio will have to resolve them,
> but if the resolution is easy and mechanical, then it's not likely to
> hold up either topic.
>
> > If I do that, I could also add a patch to convert several callers of
> > hashmap_init() to use the new HASHMAP_INIT() macro, and another patch
> > to convert shortlog to using my strset instead of its own.
>
> Yeah, both would be nice. I'm happy if it comes as part of the series,
> or separately on top.

After sending the email, I ended up deciding to convert the callers
just to sanity check the HASHMAP_INIT macro and discovered that the
code will BUG() if you don't also include .do_count_items = 1.  So, I
just decided to include that in the v3 of the series after all.
diff mbox series

Patch

diff --git a/hashmap.h b/hashmap.h
index b011b394fe..2994dc7a9c 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -236,13 +236,40 @@  void hashmap_init(struct hashmap *map,
 void hashmap_free_(struct hashmap *map, ssize_t offset);
 
 /*
- * Frees a hashmap structure and allocated memory, leaves entries undisturbed
+ * Frees a hashmap structure and allocated memory for the table, but does not
+ * free the entries nor anything they point to.
+ *
+ * Usage note:
+ *
+ * Many callers will need to iterate over all entries and free the data each
+ * entry points to; in such a case, they can free the entry itself while at it.
+ * Thus, you might see:
+ *
+ *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
+ *      free(e->somefield);
+ *      free(e);
+ *    }
+ *    hashmap_free(map);
+ *
+ * instead of
+ *
+ *    hashmap_for_each_entry(map, hashmap_iter, e, hashmap_entry_name) {
+ *      free(e->somefield);
+ *    }
+ *    hashmap_free_entries(map, struct my_entry_struct, hashmap_entry_name);
+ *
+ * to avoid the implicit extra loop over the entries.  However, if there are
+ * no special fields in your entry that need to be freed beyond the entry
+ * itself, it is probably simpler to avoid the explicit loop and just call
+ * hashmap_free_entries().
  */
 #define hashmap_free(map) hashmap_free_(map, -1)
 
 /*
  * Frees @map and all entries.  @type is the struct type of the entry
- * where @member is the hashmap_entry struct used to associate with @map
+ * where @member is the hashmap_entry struct used to associate with @map.
+ *
+ * See usage note above hashmap_free().
  */
 #define hashmap_free_entries(map, type, member) \
 	hashmap_free_(map, offsetof(type, member));