mbox series

[v5,00/15] Add struct strmap and associated utility functions

Message ID pull.835.v5.git.git.1604622298.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Add struct strmap and associated utility functions | expand

Message

John Cai via GitGitGadget Nov. 6, 2020, 12:24 a.m. UTC
Here I introduce new strmap, strintmap, and strset types. This strmap type
was based on Peff's proposal from a couple years ago[1], but has additions
that I made as I used it, and a number of additions/changes suggested by
Peff in his reviews (and Junio in his). I also start the series off with
some changes to hashmap, based on Peff's feedback on v1 & v2.

NOTE: While en/merge-ort-impl depends on this series, there are no changes
in v5 that affect it so en/merge-ort-impl does not need a reroll.

Changes since v4:

 * Make strset_add() return a boolean -- 1 if it added the value to the set,
   0 if the value was already in the set.
 * Add a preparatory patch to the above which adds a create_entry() helper
   function so that strset_add() can bypass strmap_put().
 * Add a patch which updates shortlog to use the new strset API.

[1] 
https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/

Elijah Newren (15):
  hashmap: add usage documentation explaining hashmap_free[_entries]()
  hashmap: adjust spacing to fix argument alignment
  hashmap: allow re-use after hashmap_free()
  hashmap: introduce a new hashmap_partial_clear()
  hashmap: provide deallocation function names
  strmap: new utility functions
  strmap: add more utility functions
  strmap: enable faster clearing and reusing of strmaps
  strmap: add functions facilitating use as a string->int map
  strmap: split create_entry() out of strmap_put()
  strmap: add a strset sub-type
  strmap: enable allocations to come from a mem_pool
  strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
  Use new HASHMAP_INIT macro to simplify hashmap initialization
  shortlog: use strset from strmap.h

 Makefile                |   1 +
 add-interactive.c       |   2 +-
 attr.c                  |  26 ++--
 blame.c                 |   2 +-
 bloom.c                 |   5 +-
 builtin/difftool.c      |   9 +-
 builtin/fetch.c         |   6 +-
 builtin/shortlog.c      |  61 +--------
 config.c                |   2 +-
 diff.c                  |   4 +-
 diffcore-rename.c       |   2 +-
 dir.c                   |   8 +-
 hashmap.c               |  74 +++++++----
 hashmap.h               |  91 +++++++++++---
 merge-recursive.c       |   6 +-
 name-hash.c             |   4 +-
 object.c                |   2 +-
 oidmap.c                |   2 +-
 patch-ids.c             |   2 +-
 range-diff.c            |   6 +-
 ref-filter.c            |   2 +-
 revision.c              |  11 +-
 sequencer.c             |   4 +-
 strmap.c                | 178 ++++++++++++++++++++++++++
 strmap.h                | 268 ++++++++++++++++++++++++++++++++++++++++
 submodule-config.c      |   4 +-
 t/helper/test-hashmap.c |   9 +-
 27 files changed, 621 insertions(+), 170 deletions(-)
 create mode 100644 strmap.c
 create mode 100644 strmap.h


base-commit: d4a392452e292ff924e79ec8458611c0f679d6d4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-835%2Fnewren%2Fstrmap-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-835/newren/strmap-v5
Pull-Request: https://github.com/git/git/pull/835

Range-diff vs v4:

  1:  af6b6fcb46 =  1:  af6b6fcb46 hashmap: add usage documentation explaining hashmap_free[_entries]()
  2:  591161fd78 =  2:  591161fd78 hashmap: adjust spacing to fix argument alignment
  3:  f2718d036d =  3:  f2718d036d hashmap: allow re-use after hashmap_free()
  4:  61f1da3c51 =  4:  61f1da3c51 hashmap: introduce a new hashmap_partial_clear()
  5:  861e8d65ae =  5:  861e8d65ae hashmap: provide deallocation function names
  6:  448d3b219f =  6:  448d3b219f strmap: new utility functions
  7:  5e8004c728 =  7:  5e8004c728 strmap: add more utility functions
  8:  fd96e9fc8d =  8:  fd96e9fc8d strmap: enable faster clearing and reusing of strmaps
  9:  f499934f54 =  9:  f499934f54 strmap: add functions facilitating use as a string->int map
  -:  ---------- > 10:  3bcceb8cdb strmap: split create_entry() out of strmap_put()
 10:  ee1ec55f1b ! 11:  e128a71fec strmap: add a strset sub-type
     @@ Commit message
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     + ## strmap.c ##
     +@@ strmap.c: void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
     + 	else
     + 		strintmap_set(map, str, map->default_value + amt);
     + }
     ++
     ++int strset_add(struct strset *set, const char *str)
     ++{
     ++	/*
     ++	 * Cannot use strmap_put() because it'll return NULL in both cases:
     ++	 *   - cannot find str: NULL means "not found"
     ++	 *   - does find str: NULL is the value associated with str
     ++	 */
     ++	struct strmap_entry *entry = find_strmap_entry(&set->map, str);
     ++
     ++	if (entry)
     ++		return 0;
     ++
     ++	entry = create_entry(&set->map, str, NULL);
     ++	hashmap_add(&set->map.map, &entry->ent);
     ++	return 1;
     ++}
     +
       ## strmap.h ##
      @@ strmap.h: int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
       			.map = STRMAP_INIT,   \
     @@ strmap.h: static inline void strintmap_set(struct strintmap *map, const char *st
      +	return strmap_get_size(&set->map);
      +}
      +
     -+static inline void strset_add(struct strset *set, const char *str)
     -+{
     -+	strmap_put(&set->map, str, NULL);
     -+}
     ++/* Returns 1 if str is added to the set; returns 0 if str was already in set */
     ++int strset_add(struct strset *set, const char *str);
      +
       #endif /* STRMAP_H */
 11:  73a57045c3 ! 12:  34f542d9dd strmap: enable allocations to come from a mem_pool
     @@ strmap.c: static void strmap_free_entries_(struct strmap *map, int free_values)
       	}
       }
       
     -@@ strmap.c: void *strmap_put(struct strmap *map, const char *str, void *data)
     - 	} else {
     - 		const char *key = str;
     - 
     --		entry = xmalloc(sizeof(*entry));
     -+		entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
     -+				  : xmalloc(sizeof(*entry));
     - 		hashmap_entry_init(&entry->ent, strhash(str));
     - 
     - 		if (map->strdup_strings)
     --			key = xstrdup(str);
     -+			key = map->pool ? mem_pool_strdup(map->pool, str)
     -+					: xstrdup(str);
     - 		entry->key = key;
     - 		entry->value = data;
     - 		hashmap_add(&map->map, &entry->ent);
     +@@ strmap.c: static struct strmap_entry *create_entry(struct strmap *map,
     + 	struct strmap_entry *entry;
     + 	const char *key = str;
     + 
     +-	entry = xmalloc(sizeof(*entry));
     ++	entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
     ++			  : xmalloc(sizeof(*entry));
     + 	hashmap_entry_init(&entry->ent, strhash(str));
     + 
     + 	if (map->strdup_strings)
     +-		key = xstrdup(str);
     ++		key = map->pool ? mem_pool_strdup(map->pool, str)
     ++				: xstrdup(str);
     + 	entry->key = key;
     + 	entry->value = data;
     + 	return entry;
      @@ strmap.c: void strmap_remove(struct strmap *map, const char *str, int free_value)
       		return;
       	if (free_value)
 12:  0352260de4 ! 13:  39ec2fa411 strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
     @@ strmap.c: static void strmap_free_entries_(struct strmap *map, int free_values)
       	}
       }
       
     -@@ strmap.c: void strmap_partial_clear(struct strmap *map, int free_values)
     - void *strmap_put(struct strmap *map, const char *str, void *data)
     +@@ strmap.c: static struct strmap_entry *create_entry(struct strmap *map,
     + 					 void *data)
       {
     - 	struct strmap_entry *entry = find_strmap_entry(map, str);
     --	void *old = NULL;
     + 	struct strmap_entry *entry;
     +-	const char *key = str;
       
     - 	if (entry) {
     --		old = entry->value;
     -+		void *old = entry->value;
     - 		entry->value = data;
     --	} else {
     --		const char *key = str;
     --
     --		entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
     --				  : xmalloc(sizeof(*entry));
     --		hashmap_entry_init(&entry->ent, strhash(str));
     -+		return old;
     -+	}
     - 
     --		if (map->strdup_strings)
     --			key = map->pool ? mem_pool_strdup(map->pool, str)
     --					: xstrdup(str);
     --		entry->key = key;
     --		entry->value = data;
     --		hashmap_add(&map->map, &entry->ent);
     +-	entry = map->pool ? mem_pool_alloc(map->pool, sizeof(*entry))
     +-			  : xmalloc(sizeof(*entry));
      +	if (map->strdup_strings) {
      +		if (!map->pool) {
      +			FLEXPTR_ALLOC_STR(entry, key, str);
     @@ strmap.c: void strmap_partial_clear(struct strmap *map, int free_values)
      +		entry = xmalloc(sizeof(*entry));
      +	} else {
      +		entry = mem_pool_alloc(map->pool, sizeof(*entry));
     - 	}
     --	return old;
     -+	hashmap_entry_init(&entry->ent, strhash(str));
     ++	}
     + 	hashmap_entry_init(&entry->ent, strhash(str));
     +-
     +-	if (map->strdup_strings)
     +-		key = map->pool ? mem_pool_strdup(map->pool, str)
     +-				: xstrdup(str);
     +-	entry->key = key;
      +	if (!map->strdup_strings)
      +		entry->key = str;
     -+	entry->value = data;
     -+	hashmap_add(&map->map, &entry->ent);
     -+	return NULL;
     + 	entry->value = data;
     + 	return entry;
       }
     - 
     - struct strmap_entry *strmap_get_entry(struct strmap *map, const char *str)
      @@ strmap.c: void strmap_remove(struct strmap *map, const char *str, int free_value)
       		return;
       	if (free_value)
 13:  617926540b = 14:  d3713d88f2 Use new HASHMAP_INIT macro to simplify hashmap initialization
  -:  ---------- > 15:  24e5ce60f5 shortlog: use strset from strmap.h

Comments

Junio C Hamano Nov. 6, 2020, 2 a.m. UTC | #1
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Changes since v4:
> ...
>  * Add a patch which updates shortlog to use the new strset API.

This makes my life so much simpler ;-)

Would the implementation be very different from Peff's that you can
take the authorship?  Thanks.
Elijah Newren Nov. 6, 2020, 2:42 a.m. UTC | #2
On Thu, Nov 5, 2020 at 6:00 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Changes since v4:
> > ...
> >  * Add a patch which updates shortlog to use the new strset API.
>
> This makes my life so much simpler ;-)
>
> Would the implementation be very different from Peff's that you can
> take the authorship?  Thanks.

Yes; I didn't use his patch, I simply implemented what was needed from
scratch.  I'm not attached to being author of this though; the changes
were trivial.  Feel free to change as you see fit.


If more detail is needed...

There's only two things in my patch: (1) deleting a bunch of code, (2)
search and replace strset_check_and_add() with !strset_add().

His patch has three things: (1) deleting a bunch of code, (2)
introducing strset_dup() [which may have been a copy of my
implementation of strset_check_and_add() from an earlier round of the
series; the code is identical to my implementation, but it's only a
few lines so he might have just reimplemented it identically], (3)
search and replace strset_check_and_add() with strset_dup().

If I were to modify his patch into mine (which I didn't do), it'd
require two things: deleting the strdup() definition and still doing a
search and replace.  In other words, it'd be approximately equivalent
work to just doing the patch from scratch.

Further, I wrote a patch that was nearly the same as my current
submission a few days ago, but it used my old strset_check_and_add().
It triggered some weird windows bug that I think was an infrastructure
flake, but I was worried at the time that it'd require familiarity
with shortlog and its tests to address.  Since I didn't think my
series really depended on that change (shortlog could change to take
advantage of the new strset later), I just dropped it.  Then after
further reviews, the series changed a bit more, and Peff at the end
added a patch to reintroduce strset_check_and_add() with a different
name and use it, then you suggested to modify strset_add() so it can
just be used directly.

So, at the end, taking my existing patch that pre-dated his submission
and tweaking it was the easiest route for me.  I didn't actually look
at his latest patch until after you asked if it was okay for me to
take the authorship.  I see it as two similar from-scratch
implementations that were nearly trivial in either event.
Jeff King Nov. 6, 2020, 2:48 a.m. UTC | #3
On Thu, Nov 05, 2020 at 06:42:38PM -0800, Elijah Newren wrote:

> > > Changes since v4:
> > > ...
> > >  * Add a patch which updates shortlog to use the new strset API.
> >
> > This makes my life so much simpler ;-)
> >
> > Would the implementation be very different from Peff's that you can
> > take the authorship?  Thanks.
> 
> Yes; I didn't use his patch, I simply implemented what was needed from
> scratch.  I'm not attached to being author of this though; the changes
> were trivial.  Feel free to change as you see fit.

Yeah, I am fine either way with the authorship here. The patch is
trivial, and I was pretty sure you had written the same or similar
already. My main point in posting it was to push it over the finish line
so we didn't forget. ;)

-Peff
Junio C Hamano Nov. 6, 2020, 5:32 p.m. UTC | #4
Jeff King <peff@peff.net> writes:

> On Thu, Nov 05, 2020 at 06:42:38PM -0800, Elijah Newren wrote:
>
>> > > Changes since v4:
>> > > ...
>> > >  * Add a patch which updates shortlog to use the new strset API.
>> >
>> > This makes my life so much simpler ;-)
>> >
>> > Would the implementation be very different from Peff's that you can
>> > take the authorship?  Thanks.
>> 
>> Yes; I didn't use his patch, I simply implemented what was needed from
>> scratch.  I'm not attached to being author of this though; the changes
>> were trivial.  Feel free to change as you see fit.
>
> Yeah, I am fine either way with the authorship here. The patch is
> trivial, and I was pretty sure you had written the same or similar
> already. My main point in posting it was to push it over the finish line
> so we didn't forget. ;)

Yes.  I just was double-checking in case Elijah forgot, as I
couldn't tell if it was deliberate.  The way you two agree on
is the best for me, too.

Thanks.