Message ID | cc8d702f98f94fd9202d227d19be43a912467e84.1602549650.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add struct strmap and associated utility functions | expand |
On Tue, Oct 13, 2020 at 12:40:48AM +0000, Elijah Newren via GitGitGadget wrote: > Although strmap could be used as a string->int map, one either had to > allocate an int for every entry and then deallocate later, or one had to > do a bunch of casting between (void*) and (intptr_t). > > Add some special functions that do the casting. Also, rename put->set > for such wrapper functions since 'put' implied there may be some > deallocation needed if the string was already found in the map, which > isn't the case when we're storing an int value directly in the void* > slot instead of using the void* slot as a pointer to data. I think this is worth doing. That kind of casting is an implementation detail, and it's nice for callers not to have to see it. You might want to mention that this _could_ be done as just accessors to strmap, but using a separate struct provides type safety against misusing pointers as integers or vice versa. > A note on the name: if anyone has a better name suggestion than > strintmap, I'm happy to take it. It seems slightly unwieldy, but I have > not been able to come up with a better name. I still don't have a better suggestion on the name. Another convention could be to name map types as "map_from_to". So "struct map_str_int". But it's pretty ugly, and strmap would become "map_str_ptr" or something. As ugly as "strintmap" is, I like it better. > +void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt) > +{ > + struct strmap_entry *entry = find_strmap_entry(&map->map, str); > + if (entry) { > + intptr_t *whence = (intptr_t*)&entry->value; > + *whence += amt; > + } > + else > + strintmap_set(map, str, amt); > +} Incrementing a missing entry auto-vivifies it at 0. That makes perfect sense, but might be worth noting above the function in the header file. Though maybe it's a little weird since strintmap_get() takes a default value. Why don't we use that here? I'd have to see how its used, but would it make sense to set a default value when initializing the map, rather than providing it on each call? > +/* > + * strintmap: > + * A map of string -> int, typecasting the void* of strmap to an int. Are the size and signedness of an int flexible enough for all uses? I doubt the standard makes any promises about the relationship between intptr_t and int, but I'd be surprised if any modern platform has an intptr_t that isn't at least as big as an int (on most 32-bit platforms they'll be the same, and on 64-bit ones intptr_t is strictly bigger). Would any callers care about using the full 32-bits, though? I.e., would they prefer casting through uintptr_t to an "unsigned int"? -Peff
On Fri, Oct 30, 2020 at 7:39 AM Jeff King <peff@peff.net> wrote: > > On Tue, Oct 13, 2020 at 12:40:48AM +0000, Elijah Newren via GitGitGadget wrote: > > > Although strmap could be used as a string->int map, one either had to > > allocate an int for every entry and then deallocate later, or one had to > > do a bunch of casting between (void*) and (intptr_t). > > > > Add some special functions that do the casting. Also, rename put->set > > for such wrapper functions since 'put' implied there may be some > > deallocation needed if the string was already found in the map, which > > isn't the case when we're storing an int value directly in the void* > > slot instead of using the void* slot as a pointer to data. > > I think this is worth doing. That kind of casting is an implementation > detail, and it's nice for callers not to have to see it. > > You might want to mention that this _could_ be done as just accessors to > strmap, but using a separate struct provides type safety against > misusing pointers as integers or vice versa. If I just did it as accessors, it makes it harder for myself and others to remember what my huge piles of strmaps in merge-ort do; I found that it became easier to follow the code and remember what things were doing when some were marked as strmap, some as strintmap, and some as strset. > > A note on the name: if anyone has a better name suggestion than > > strintmap, I'm happy to take it. It seems slightly unwieldy, but I have > > not been able to come up with a better name. > > I still don't have a better suggestion on the name. Another convention > could be to name map types as "map_from_to". So "struct map_str_int". > But it's pretty ugly, and strmap would become "map_str_ptr" or > something. As ugly as "strintmap" is, I like it better. > > > +void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt) > > +{ > > + struct strmap_entry *entry = find_strmap_entry(&map->map, str); > > + if (entry) { > > + intptr_t *whence = (intptr_t*)&entry->value; > > + *whence += amt; > > + } > > + else > > + strintmap_set(map, str, amt); > > +} > > Incrementing a missing entry auto-vivifies it at 0. That makes perfect > sense, but might be worth noting above the function in the header file. > > Though maybe it's a little weird since strintmap_get() takes a default > value. Why don't we use that here? I'd have to see how its used, but > would it make sense to set a default value when initializing the map, > rather than providing it on each call? That probably makes sense. It turns out there is one strintmap for which I call strintmap_get() in two different places with different default values, but I think I can fix that up (one of them really needed -1 as the default, while the other callsite just needed the default to not accidentally match a specific enum value and 0 was convenient). > > > +/* > > + * strintmap: > > + * A map of string -> int, typecasting the void* of strmap to an int. > > Are the size and signedness of an int flexible enough for all uses? If some users want signed values and others want unsigned, I'm not sure how we can satisfy both. Maybe make a struintmap? Perhaps that could be added later if uses come up for it? Some of my uses need int, the rest of them wouldn't care about int vs unsigned. > I doubt the standard makes any promises about the relationship between > intptr_t and int, but I'd be surprised if any modern platform has an > intptr_t that isn't at least as big as an int (on most 32-bit platforms > they'll be the same, and on 64-bit ones intptr_t is strictly bigger). > > Would any callers care about using the full 32-bits, though? I.e., would > they prefer casting through uintptr_t to an "unsigned int"? I don't care about the full 32 bits (I'll probably use less than 16), but I absolutely wanted it signed for my uses. I think it makes sense to be signed when using it for an index within an array (-1 for "not found" makes sense; using arbitrary large numbers seems really ugly (and perhaps buggy) to me). It also makes sense to me to use -1 as an invalid enum value, though I guess I could technically specify an additional "INVALID_VALUE" within the enum and use it as the default. If someone does care about the full range of bits up to 64 on relevant platforms, I guess I should make it strintptr_t_map. But besides the egregiously ugly name, one advantage of int over intptr_t (or unsigned over uintptr_t) is that you can use it in a printf easily: printf("Size: %d\n", strintmap_get(&foo, 0)); whereas if it strintmap_get() returns an intptr_t, then it's a royal mess to attempt to portably use it without adding additional manual casts. Maybe I was just missing something obvious, but I couldn't figure out the %d, %ld, %lld, PRIdMAX, etc. choices and get the statement to compile on all platforms, so I'd always just cast to int or unsigned at the time of calling printf.
On Fri, Oct 30, 2020 at 10:28:51AM -0700, Elijah Newren wrote: > > You might want to mention that this _could_ be done as just accessors to > > strmap, but using a separate struct provides type safety against > > misusing pointers as integers or vice versa. > > If I just did it as accessors, it makes it harder for myself and > others to remember what my huge piles of strmaps in merge-ort do; I > found that it became easier to follow the code and remember what > things were doing when some were marked as strmap, some as strintmap, > and some as strset. Oh, I'm definitely on board with that argument. I was just suggesting you might want to put it in the commit message for posterity. > > > +/* > > > + * strintmap: > > > + * A map of string -> int, typecasting the void* of strmap to an int. > > > > Are the size and signedness of an int flexible enough for all uses? > > If some users want signed values and others want unsigned, I'm not > sure how we can satisfy both. Maybe make a struintmap? Right, that was sort of my question: do your users actually want it signed or not. Sounds like they do want it signed, and don't mind the loss of range. > Perhaps that could be added later if uses come up for it? Some of my > uses need int, the rest of them wouldn't care about int vs unsigned. Yeah, if you don't have any callers which care, I'd definitely punt on it for now. > If someone does care about the full range of bits up to 64 on relevant > platforms, I guess I should make it strintptr_t_map. Yeah, that's what I was wondering. I suspect the use case for that is pretty narrow, though. If you really care about having a 64-bit value for some data, then you probably want it _everywhere_, not just on 64-bit platforms. I guess the exception would be if you're mapping into size_t's or something. I think my question was as much "did you think about range issues for your intended users" as "should we provide more range in this map type". And it sounds like you have thought about that, so I'm happy proceeding. > But besides the > egregiously ugly name, one advantage of int over intptr_t (or unsigned > over uintptr_t) is that you can use it in a printf easily: > printf("Size: %d\n", strintmap_get(&foo, 0)); > whereas if it strintmap_get() returns an intptr_t, then it's a royal > mess to attempt to portably use it without adding additional manual > casts. Maybe I was just missing something obvious, but I couldn't > figure out the %d, %ld, %lld, PRIdMAX, etc. choices and get the > statement to compile on all platforms, so I'd always just cast to int > or unsigned at the time of calling printf. The right way is: printf("Size: %"PRIdMAX", (intmax_t) your_intptr_t); which will always do the right thing no matter the size (at the minor cost of passing a larger-than-necessary parameter, but if you're micro-optimizing then calling printf at all is probably already a problem). But yeah, in general using a real "int" is much more convenient and if there's no reason to avoid it for range problems, I think it's preferable. -Peff
On Tue, Nov 3, 2020 at 8:20 AM Jeff King <peff@peff.net> wrote: > > On Fri, Oct 30, 2020 at 10:28:51AM -0700, Elijah Newren wrote: > > > > You might want to mention that this _could_ be done as just accessors to > > > strmap, but using a separate struct provides type safety against > > > misusing pointers as integers or vice versa. > > > > If I just did it as accessors, it makes it harder for myself and > > others to remember what my huge piles of strmaps in merge-ort do; I > > found that it became easier to follow the code and remember what > > things were doing when some were marked as strmap, some as strintmap, > > and some as strset. > > Oh, I'm definitely on board with that argument. I was just suggesting > you might want to put it in the commit message for posterity. > > > > > +/* > > > > + * strintmap: > > > > + * A map of string -> int, typecasting the void* of strmap to an int. > > > > > > Are the size and signedness of an int flexible enough for all uses? > > > > If some users want signed values and others want unsigned, I'm not > > sure how we can satisfy both. Maybe make a struintmap? > > Right, that was sort of my question: do your users actually want it > signed or not. Sounds like they do want it signed, and don't mind the > loss of range. > > > Perhaps that could be added later if uses come up for it? Some of my > > uses need int, the rest of them wouldn't care about int vs unsigned. > > Yeah, if you don't have any callers which care, I'd definitely punt on > it for now. > > > If someone does care about the full range of bits up to 64 on relevant > > platforms, I guess I should make it strintptr_t_map. > > Yeah, that's what I was wondering. I suspect the use case for that is > pretty narrow, though. If you really care about having a 64-bit value > for some data, then you probably want it _everywhere_, not just on > 64-bit platforms. I guess the exception would be if you're mapping into > size_t's or something. > > I think my question was as much "did you think about range issues for > your intended users" as "should we provide more range in this map type". > And it sounds like you have thought about that, so I'm happy proceeding. > > > But besides the > > egregiously ugly name, one advantage of int over intptr_t (or unsigned > > over uintptr_t) is that you can use it in a printf easily: > > printf("Size: %d\n", strintmap_get(&foo, 0)); > > whereas if it strintmap_get() returns an intptr_t, then it's a royal > > mess to attempt to portably use it without adding additional manual > > casts. Maybe I was just missing something obvious, but I couldn't > > figure out the %d, %ld, %lld, PRIdMAX, etc. choices and get the > > statement to compile on all platforms, so I'd always just cast to int > > or unsigned at the time of calling printf. > > The right way is: > > printf("Size: %"PRIdMAX", (intmax_t) your_intptr_t); Ah, intmax_t; that's what I was missing. > which will always do the right thing no matter the size (at the minor > cost of passing a larger-than-necessary parameter, but if you're > micro-optimizing then calling printf at all is probably already a > problem). > > But yeah, in general using a real "int" is much more convenient and if > there's no reason to avoid it for range problems, I think it's > preferable. Yep, I like the simplicity of "int", the signedness of "int" and it has far more than enough range on all platforms (most my strintmaps actually map to enum values, but my largest int usage is for counting up to at most how many files are involved in rename detection. Even microsoft repos only have a number of files present in the repository that registers in the low millions, and I'm only dealing with the subset of those files involved in rename detection, which should be much smaller).
diff --git a/strmap.c b/strmap.c index 47cbf11ec7..d5003a79e3 100644 --- a/strmap.c +++ b/strmap.c @@ -126,3 +126,14 @@ void strmap_remove(struct strmap *map, const char *str, int free_util) free((char*)ret->key); free(ret); } + +void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt) +{ + struct strmap_entry *entry = find_strmap_entry(&map->map, str); + if (entry) { + intptr_t *whence = (intptr_t*)&entry->value; + *whence += amt; + } + else + strintmap_set(map, str, amt); +} diff --git a/strmap.h b/strmap.h index 5bb7650d65..fe15e74b78 100644 --- a/strmap.h +++ b/strmap.h @@ -98,4 +98,84 @@ static inline unsigned int strmap_get_size(struct strmap *map) var = hashmap_iter_next_entry_offset(iter, \ OFFSETOF_VAR(var, ent))) + +/* + * strintmap: + * A map of string -> int, typecasting the void* of strmap to an int. + * + * Primary differences: + * 1) Since the void* value is just an int in disguise, there is no value + * to free. (Thus one fewer argument to strintmap_clear) + * 2) strintmap_get() returns an int; it also requires an extra parameter to + * be specified so it knows what value to return if the underlying strmap + * has not key matching the given string. + * 3) No strmap_put() equivalent; strintmap_set() and strintmap_incr() + * instead. + */ + +struct strintmap { + struct strmap map; +}; + +#define strintmap_for_each_entry(mystrmap, iter, var) \ + strmap_for_each_entry(&(mystrmap)->map, iter, var) + +static inline void strintmap_init(struct strintmap *map) +{ + strmap_init(&map->map); +} + +static inline void strintmap_ocd_init(struct strintmap *map, + int strdup_strings) +{ + strmap_ocd_init(&map->map, strdup_strings); +} + +static inline void strintmap_clear(struct strintmap *map) +{ + strmap_clear(&map->map, 0); +} + +static inline void strintmap_partial_clear(struct strintmap *map) +{ + strmap_partial_clear(&map->map, 0); +} + +static inline int strintmap_contains(struct strintmap *map, const char *str) +{ + return strmap_contains(&map->map, str); +} + +static inline void strintmap_remove(struct strintmap *map, const char *str) +{ + return strmap_remove(&map->map, str, 0); +} + +static inline int strintmap_empty(struct strintmap *map) +{ + return strmap_empty(&map->map); +} + +static inline unsigned int strintmap_get_size(struct strintmap *map) +{ + return strmap_get_size(&map->map); +} + +static inline int strintmap_get(struct strintmap *map, const char *str, + int default_value) +{ + struct strmap_entry *result = strmap_get_entry(&map->map, str); + if (!result) + return default_value; + return (intptr_t)result->value; +} + +static inline void strintmap_set(struct strintmap *map, const char *str, + intptr_t v) +{ + strmap_put(&map->map, str, (void *)v); +} + +void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt); + #endif /* STRMAP_H */