diff mbox series

[v2,08/10] strmap: add functions facilitating use as a string->int map

Message ID cc8d702f98f94fd9202d227d19be43a912467e84.1602549650.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Add struct strmap and associated utility functions | expand

Commit Message

Elijah Newren Oct. 13, 2020, 12:40 a.m. UTC
From: Elijah Newren <newren@gmail.com>

Although strmap could be used as a string->int map, one either had to
allocate an int for every entry and then deallocate later, or one had to
do a bunch of casting between (void*) and (intptr_t).

Add some special functions that do the casting.  Also, rename put->set
for such wrapper functions since 'put' implied there may be some
deallocation needed if the string was already found in the map, which
isn't the case when we're storing an int value directly in the void*
slot instead of using the void* slot as a pointer to data.

A note on the name: if anyone has a better name suggestion than
strintmap, I'm happy to take it.  It seems slightly unwieldy, but I have
not been able to come up with a better name.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 strmap.c | 11 ++++++++
 strmap.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)

Comments

Jeff King Oct. 30, 2020, 2:39 p.m. UTC | #1
On Tue, Oct 13, 2020 at 12:40:48AM +0000, Elijah Newren via GitGitGadget wrote:

> Although strmap could be used as a string->int map, one either had to
> allocate an int for every entry and then deallocate later, or one had to
> do a bunch of casting between (void*) and (intptr_t).
> 
> Add some special functions that do the casting.  Also, rename put->set
> for such wrapper functions since 'put' implied there may be some
> deallocation needed if the string was already found in the map, which
> isn't the case when we're storing an int value directly in the void*
> slot instead of using the void* slot as a pointer to data.

I think this is worth doing. That kind of casting is an implementation
detail, and it's nice for callers not to have to see it.

You might want to mention that this _could_ be done as just accessors to
strmap, but using a separate struct provides type safety against
misusing pointers as integers or vice versa.

> A note on the name: if anyone has a better name suggestion than
> strintmap, I'm happy to take it.  It seems slightly unwieldy, but I have
> not been able to come up with a better name.

I still don't have a better suggestion on the name. Another convention
could be to name map types as "map_from_to". So "struct map_str_int".
But it's pretty ugly, and strmap would become "map_str_ptr" or
something. As ugly as "strintmap" is, I like it better.

> +void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
> +{
> +	struct strmap_entry *entry = find_strmap_entry(&map->map, str);
> +	if (entry) {
> +		intptr_t *whence = (intptr_t*)&entry->value;
> +		*whence += amt;
> +	}
> +	else
> +		strintmap_set(map, str, amt);
> +}

Incrementing a missing entry auto-vivifies it at 0.  That makes perfect
sense, but might be worth noting above the function in the header file.

Though maybe it's a little weird since strintmap_get() takes a default
value. Why don't we use that here? I'd have to see how its used, but
would it make sense to set a default value when initializing the map,
rather than providing it on each call?

> +/*
> + * strintmap:
> + *    A map of string -> int, typecasting the void* of strmap to an int.

Are the size and signedness of an int flexible enough for all uses?

I doubt the standard makes any promises about the relationship between
intptr_t and int, but I'd be surprised if any modern platform has an
intptr_t that isn't at least as big as an int (on most 32-bit platforms
they'll be the same, and on 64-bit ones intptr_t is strictly bigger).

Would any callers care about using the full 32-bits, though? I.e., would
they prefer casting through uintptr_t to an "unsigned int"?

-Peff
Elijah Newren Oct. 30, 2020, 5:28 p.m. UTC | #2
On Fri, Oct 30, 2020 at 7:39 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Oct 13, 2020 at 12:40:48AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > Although strmap could be used as a string->int map, one either had to
> > allocate an int for every entry and then deallocate later, or one had to
> > do a bunch of casting between (void*) and (intptr_t).
> >
> > Add some special functions that do the casting.  Also, rename put->set
> > for such wrapper functions since 'put' implied there may be some
> > deallocation needed if the string was already found in the map, which
> > isn't the case when we're storing an int value directly in the void*
> > slot instead of using the void* slot as a pointer to data.
>
> I think this is worth doing. That kind of casting is an implementation
> detail, and it's nice for callers not to have to see it.
>
> You might want to mention that this _could_ be done as just accessors to
> strmap, but using a separate struct provides type safety against
> misusing pointers as integers or vice versa.

If I just did it as accessors, it makes it harder for myself and
others to remember what my huge piles of strmaps in merge-ort do; I
found that it became easier to follow the code and remember what
things were doing when some were marked as strmap, some as strintmap,
and some as strset.

> > A note on the name: if anyone has a better name suggestion than
> > strintmap, I'm happy to take it.  It seems slightly unwieldy, but I have
> > not been able to come up with a better name.
>
> I still don't have a better suggestion on the name. Another convention
> could be to name map types as "map_from_to". So "struct map_str_int".
> But it's pretty ugly, and strmap would become "map_str_ptr" or
> something. As ugly as "strintmap" is, I like it better.
>
> > +void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
> > +{
> > +     struct strmap_entry *entry = find_strmap_entry(&map->map, str);
> > +     if (entry) {
> > +             intptr_t *whence = (intptr_t*)&entry->value;
> > +             *whence += amt;
> > +     }
> > +     else
> > +             strintmap_set(map, str, amt);
> > +}
>
> Incrementing a missing entry auto-vivifies it at 0.  That makes perfect
> sense, but might be worth noting above the function in the header file.
>
> Though maybe it's a little weird since strintmap_get() takes a default
> value. Why don't we use that here? I'd have to see how its used, but
> would it make sense to set a default value when initializing the map,
> rather than providing it on each call?

That probably makes sense.  It turns out there is one strintmap for
which I call strintmap_get() in two different places with different
default values, but I think I can fix that up (one of them really
needed -1 as the default, while the other callsite just needed the
default to not accidentally match a specific enum value and 0 was
convenient).

>
> > +/*
> > + * strintmap:
> > + *    A map of string -> int, typecasting the void* of strmap to an int.
>
> Are the size and signedness of an int flexible enough for all uses?

If some users want signed values and others want unsigned, I'm not
sure how we can satisfy both.  Maybe make a struintmap?

Perhaps that could be added later if uses come up for it?  Some of my
uses need int, the rest of them wouldn't care about int vs unsigned.

> I doubt the standard makes any promises about the relationship between
> intptr_t and int, but I'd be surprised if any modern platform has an
> intptr_t that isn't at least as big as an int (on most 32-bit platforms
> they'll be the same, and on 64-bit ones intptr_t is strictly bigger).
>
> Would any callers care about using the full 32-bits, though? I.e., would
> they prefer casting through uintptr_t to an "unsigned int"?

I don't care about the full 32 bits (I'll probably use less than 16),
but I absolutely wanted it signed for my uses.  I think it makes sense
to be signed when using it for an index within an array (-1 for "not
found" makes sense; using arbitrary large numbers seems really ugly
(and perhaps buggy) to me).  It also makes sense to me to use -1 as an
invalid enum value, though I guess I could technically specify an
additional "INVALID_VALUE" within the enum and use it as the default.

If someone does care about the full range of bits up to 64 on relevant
platforms, I guess I should make it strintptr_t_map.  But besides the
egregiously ugly name, one advantage of int over intptr_t (or unsigned
over uintptr_t) is that you can use it in a printf easily:
   printf("Size: %d\n", strintmap_get(&foo, 0));
whereas if it strintmap_get() returns an intptr_t, then it's a royal
mess to attempt to portably use it without adding additional manual
casts.  Maybe I was just missing something obvious, but I couldn't
figure out the %d, %ld, %lld, PRIdMAX, etc. choices and get the
statement to compile on all platforms, so I'd always just cast to int
or unsigned at the time of calling printf.
Jeff King Nov. 3, 2020, 4:20 p.m. UTC | #3
On Fri, Oct 30, 2020 at 10:28:51AM -0700, Elijah Newren wrote:

> > You might want to mention that this _could_ be done as just accessors to
> > strmap, but using a separate struct provides type safety against
> > misusing pointers as integers or vice versa.
> 
> If I just did it as accessors, it makes it harder for myself and
> others to remember what my huge piles of strmaps in merge-ort do; I
> found that it became easier to follow the code and remember what
> things were doing when some were marked as strmap, some as strintmap,
> and some as strset.

Oh, I'm definitely on board with that argument. I was just suggesting
you might want to put it in the commit message for posterity.

> > > +/*
> > > + * strintmap:
> > > + *    A map of string -> int, typecasting the void* of strmap to an int.
> >
> > Are the size and signedness of an int flexible enough for all uses?
> 
> If some users want signed values and others want unsigned, I'm not
> sure how we can satisfy both.  Maybe make a struintmap?

Right, that was sort of my question: do your users actually want it
signed or not. Sounds like they do want it signed, and don't mind the
loss of range.

> Perhaps that could be added later if uses come up for it?  Some of my
> uses need int, the rest of them wouldn't care about int vs unsigned.

Yeah, if you don't have any callers which care, I'd definitely punt on
it for now.

> If someone does care about the full range of bits up to 64 on relevant
> platforms, I guess I should make it strintptr_t_map.

Yeah, that's what I was wondering. I suspect the use case for that is
pretty narrow, though. If you really care about having a 64-bit value
for some data, then you probably want it _everywhere_, not just on
64-bit platforms. I guess the exception would be if you're mapping into
size_t's or something.

I think my question was as much "did you think about range issues for
your intended users" as "should we provide more range in this map type".
And it sounds like you have thought about that, so I'm happy proceeding.

> But besides the
> egregiously ugly name, one advantage of int over intptr_t (or unsigned
> over uintptr_t) is that you can use it in a printf easily:
>    printf("Size: %d\n", strintmap_get(&foo, 0));
> whereas if it strintmap_get() returns an intptr_t, then it's a royal
> mess to attempt to portably use it without adding additional manual
> casts.  Maybe I was just missing something obvious, but I couldn't
> figure out the %d, %ld, %lld, PRIdMAX, etc. choices and get the
> statement to compile on all platforms, so I'd always just cast to int
> or unsigned at the time of calling printf.

The right way is:

  printf("Size: %"PRIdMAX", (intmax_t) your_intptr_t);

which will always do the right thing no matter the size (at the minor
cost of passing a larger-than-necessary parameter, but if you're
micro-optimizing then calling printf at all is probably already a
problem).

But yeah, in general using a real "int" is much more convenient and if
there's no reason to avoid it for range problems, I think it's
preferable.

-Peff
Elijah Newren Nov. 3, 2020, 4:46 p.m. UTC | #4
On Tue, Nov 3, 2020 at 8:20 AM Jeff King <peff@peff.net> wrote:
>
> On Fri, Oct 30, 2020 at 10:28:51AM -0700, Elijah Newren wrote:
>
> > > You might want to mention that this _could_ be done as just accessors to
> > > strmap, but using a separate struct provides type safety against
> > > misusing pointers as integers or vice versa.
> >
> > If I just did it as accessors, it makes it harder for myself and
> > others to remember what my huge piles of strmaps in merge-ort do; I
> > found that it became easier to follow the code and remember what
> > things were doing when some were marked as strmap, some as strintmap,
> > and some as strset.
>
> Oh, I'm definitely on board with that argument. I was just suggesting
> you might want to put it in the commit message for posterity.
>
> > > > +/*
> > > > + * strintmap:
> > > > + *    A map of string -> int, typecasting the void* of strmap to an int.
> > >
> > > Are the size and signedness of an int flexible enough for all uses?
> >
> > If some users want signed values and others want unsigned, I'm not
> > sure how we can satisfy both.  Maybe make a struintmap?
>
> Right, that was sort of my question: do your users actually want it
> signed or not. Sounds like they do want it signed, and don't mind the
> loss of range.
>
> > Perhaps that could be added later if uses come up for it?  Some of my
> > uses need int, the rest of them wouldn't care about int vs unsigned.
>
> Yeah, if you don't have any callers which care, I'd definitely punt on
> it for now.
>
> > If someone does care about the full range of bits up to 64 on relevant
> > platforms, I guess I should make it strintptr_t_map.
>
> Yeah, that's what I was wondering. I suspect the use case for that is
> pretty narrow, though. If you really care about having a 64-bit value
> for some data, then you probably want it _everywhere_, not just on
> 64-bit platforms. I guess the exception would be if you're mapping into
> size_t's or something.
>
> I think my question was as much "did you think about range issues for
> your intended users" as "should we provide more range in this map type".
> And it sounds like you have thought about that, so I'm happy proceeding.
>
> > But besides the
> > egregiously ugly name, one advantage of int over intptr_t (or unsigned
> > over uintptr_t) is that you can use it in a printf easily:
> >    printf("Size: %d\n", strintmap_get(&foo, 0));
> > whereas if it strintmap_get() returns an intptr_t, then it's a royal
> > mess to attempt to portably use it without adding additional manual
> > casts.  Maybe I was just missing something obvious, but I couldn't
> > figure out the %d, %ld, %lld, PRIdMAX, etc. choices and get the
> > statement to compile on all platforms, so I'd always just cast to int
> > or unsigned at the time of calling printf.
>
> The right way is:
>
>   printf("Size: %"PRIdMAX", (intmax_t) your_intptr_t);

Ah, intmax_t; that's what I was missing.

> which will always do the right thing no matter the size (at the minor
> cost of passing a larger-than-necessary parameter, but if you're
> micro-optimizing then calling printf at all is probably already a
> problem).
>
> But yeah, in general using a real "int" is much more convenient and if
> there's no reason to avoid it for range problems, I think it's
> preferable.

Yep, I like the simplicity of "int", the signedness of "int" and it
has far more than enough range on all platforms (most my strintmaps
actually map to enum values, but my largest int usage is for counting
up to at most how many files are involved in rename detection.  Even
microsoft repos only have a number of files present in the repository
that registers in the low millions, and I'm only dealing with the
subset of those files involved in rename detection, which should be
much smaller).
diff mbox series

Patch

diff --git a/strmap.c b/strmap.c
index 47cbf11ec7..d5003a79e3 100644
--- a/strmap.c
+++ b/strmap.c
@@ -126,3 +126,14 @@  void strmap_remove(struct strmap *map, const char *str, int free_util)
 		free((char*)ret->key);
 	free(ret);
 }
+
+void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
+{
+	struct strmap_entry *entry = find_strmap_entry(&map->map, str);
+	if (entry) {
+		intptr_t *whence = (intptr_t*)&entry->value;
+		*whence += amt;
+	}
+	else
+		strintmap_set(map, str, amt);
+}
diff --git a/strmap.h b/strmap.h
index 5bb7650d65..fe15e74b78 100644
--- a/strmap.h
+++ b/strmap.h
@@ -98,4 +98,84 @@  static inline unsigned int strmap_get_size(struct strmap *map)
 		var = hashmap_iter_next_entry_offset(iter, \
 						     OFFSETOF_VAR(var, ent)))
 
+
+/*
+ * strintmap:
+ *    A map of string -> int, typecasting the void* of strmap to an int.
+ *
+ * Primary differences:
+ *    1) Since the void* value is just an int in disguise, there is no value
+ *       to free.  (Thus one fewer argument to strintmap_clear)
+ *    2) strintmap_get() returns an int; it also requires an extra parameter to
+ *       be specified so it knows what value to return if the underlying strmap
+ *       has not key matching the given string.
+ *    3) No strmap_put() equivalent; strintmap_set() and strintmap_incr()
+ *       instead.
+ */
+
+struct strintmap {
+	struct strmap map;
+};
+
+#define strintmap_for_each_entry(mystrmap, iter, var)	\
+	strmap_for_each_entry(&(mystrmap)->map, iter, var)
+
+static inline void strintmap_init(struct strintmap *map)
+{
+	strmap_init(&map->map);
+}
+
+static inline void strintmap_ocd_init(struct strintmap *map,
+				      int strdup_strings)
+{
+	strmap_ocd_init(&map->map, strdup_strings);
+}
+
+static inline void strintmap_clear(struct strintmap *map)
+{
+	strmap_clear(&map->map, 0);
+}
+
+static inline void strintmap_partial_clear(struct strintmap *map)
+{
+	strmap_partial_clear(&map->map, 0);
+}
+
+static inline int strintmap_contains(struct strintmap *map, const char *str)
+{
+	return strmap_contains(&map->map, str);
+}
+
+static inline void strintmap_remove(struct strintmap *map, const char *str)
+{
+	return strmap_remove(&map->map, str, 0);
+}
+
+static inline int strintmap_empty(struct strintmap *map)
+{
+	return strmap_empty(&map->map);
+}
+
+static inline unsigned int strintmap_get_size(struct strintmap *map)
+{
+	return strmap_get_size(&map->map);
+}
+
+static inline int strintmap_get(struct strintmap *map, const char *str,
+				int default_value)
+{
+	struct strmap_entry *result = strmap_get_entry(&map->map, str);
+	if (!result)
+		return default_value;
+	return (intptr_t)result->value;
+}
+
+static inline void strintmap_set(struct strintmap *map, const char *str,
+				 intptr_t v)
+{
+	strmap_put(&map->map, str, (void *)v);
+}
+
+void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt);
+
 #endif /* STRMAP_H */