[15/17] khash: rename oid helper functions
diff mbox series

Message ID 20190620074141.GO3713@sigill.intra.peff.net
State New
Headers show
Series
  • drop non-object_id hashing
Related show

Commit Message

Jeff King June 20, 2019, 7:41 a.m. UTC
For use in object_id hash tables, we have oid_hash() and oid_equal().
But these are confusingly similar to the existing oideq() and the
oidhash() we plan to add to replace sha1hash().

The big difference from those functions is that rather than accepting a
const pointer to the "struct object_id", we take the arguments by value
(which is a khash internal convention). So let's make that obvious by
calling them oidhash_by_value() and oideq_by_value().

Those names are fairly horrendous to type, but we rarely need to do so;
they are passed to the khash implementation macro and then only used
internally. Callers get to use the nice kh_put_oid_map(), etc.

Signed-off-by: Jeff King <peff@peff.net>
---
 khash.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Junio C Hamano June 20, 2019, 5:44 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> For use in object_id hash tables, we have oid_hash() and oid_equal().
> But these are confusingly similar to the existing oideq() and the
> oidhash() we plan to add to replace sha1hash().
>
> The big difference from those functions is that rather than accepting a
> const pointer to the "struct object_id", we take the arguments by value
> (which is a khash internal convention). So let's make that obvious by
> calling them oidhash_by_value() and oideq_by_value().
>
> Those names are fairly horrendous to type, but we rarely need to do so;
> they are passed to the khash implementation macro and then only used
> internally. Callers get to use the nice kh_put_oid_map(), etc.

The pass-by-value interface feels a bit unforunate but hopefully
"static inline" would help us avoid actually copying the struct left
and right as we make calls to them X-<.


> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  khash.h | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/khash.h b/khash.h
> index cb2cd3d7e4..f911d2b005 100644
> --- a/khash.h
> +++ b/khash.h
> @@ -324,20 +324,20 @@ static const double __ac_HASH_UPPER = 0.77;
>  		code;												\
>  	} }
>  
> -static inline unsigned int oid_hash(struct object_id oid)
> +static inline unsigned int oidhash_by_value(struct object_id oid)
>  {
>  	return sha1hash(oid.hash);
>  }
>  
> -static inline int oid_equal(struct object_id a, struct object_id b)
> +static inline int oideq_by_value(struct object_id a, struct object_id b)
>  {
>  	return oideq(&a, &b);
>  }
>  
> -KHASH_INIT(oid_set, struct object_id, int, 0, oid_hash, oid_equal)
> +KHASH_INIT(oid_set, struct object_id, int, 0, oidhash_by_value, oideq_by_value)
>  
> -KHASH_INIT(oid_map, struct object_id, void *, 1, oid_hash, oid_equal)
> +KHASH_INIT(oid_map, struct object_id, void *, 1, oidhash_by_value, oideq_by_value)
>  
> -KHASH_INIT(oid_pos, struct object_id, int, 1, oid_hash, oid_equal)
> +KHASH_INIT(oid_pos, struct object_id, int, 1, oidhash_by_value, oideq_by_value)
>  
>  #endif /* __AC_KHASH_H */
Jeff King June 20, 2019, 6:27 p.m. UTC | #2
On Thu, Jun 20, 2019 at 10:44:17AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > For use in object_id hash tables, we have oid_hash() and oid_equal().
> > But these are confusingly similar to the existing oideq() and the
> > oidhash() we plan to add to replace sha1hash().
> >
> > The big difference from those functions is that rather than accepting a
> > const pointer to the "struct object_id", we take the arguments by value
> > (which is a khash internal convention). So let's make that obvious by
> > calling them oidhash_by_value() and oideq_by_value().
> >
> > Those names are fairly horrendous to type, but we rarely need to do so;
> > they are passed to the khash implementation macro and then only used
> > internally. Callers get to use the nice kh_put_oid_map(), etc.
> 
> The pass-by-value interface feels a bit unforunate but hopefully
> "static inline" would help us avoid actually copying the struct left
> and right as we make calls to them X-<.

Yeah, exactly. I think it should end up quite fast, though if anybody
(René?) wants to try tweaking khash and timing it, be my guest.

I do think if it took the more usual pass-by-const-pointer we'd probably
still end up with the exact same code from an optimizing compiler, since
all of the references and dereferences would cancel out once it was
inlined.

-Peff
René Scharfe June 23, 2019, 4 p.m. UTC | #3
Am 20.06.19 um 20:27 schrieb Jeff King:
> On Thu, Jun 20, 2019 at 10:44:17AM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>>
>>> For use in object_id hash tables, we have oid_hash() and oid_equal().
>>> But these are confusingly similar to the existing oideq() and the
>>> oidhash() we plan to add to replace sha1hash().
>>>
>>> The big difference from those functions is that rather than accepting a
>>> const pointer to the "struct object_id", we take the arguments by value
>>> (which is a khash internal convention). So let's make that obvious by
>>> calling them oidhash_by_value() and oideq_by_value().
>>>
>>> Those names are fairly horrendous to type, but we rarely need to do so;
>>> they are passed to the khash implementation macro and then only used
>>> internally. Callers get to use the nice kh_put_oid_map(), etc.
>>
>> The pass-by-value interface feels a bit unforunate but hopefully
>> "static inline" would help us avoid actually copying the struct left
>> and right as we make calls to them X-<.
>
> Yeah, exactly. I think it should end up quite fast, though if anybody
> (René?) wants to try tweaking khash and timing it, be my guest.
>
> I do think if it took the more usual pass-by-const-pointer we'd probably
> still end up with the exact same code from an optimizing compiler, since
> all of the references and dereferences would cancel out once it was
> inlined.

I suspect it depends on the compiler and the exact details.  Here's a
simple experiment: https://godbolt.org/z/kuv4NE.  It has a comparison
function for call by value and one for call by reference as well as a
wrapper for each with the opposite interface.

An enlightened compiler would emit the same code for functions with the
same interface, but none of the current ones do in all cases.  Clang
and MSVC do emit the same code for the two call by value functions, so
there's hope.  And moving to call by reference might make matters worse.
GCC adds some 128-bit moves to both wrappers for some reason.

Perhaps it doesn't matter much anyway e.g. due to caching, especially
for the branch-free variants.  A definite answer would require
measurements.  Your cleanup would make the necessary surgery on khash.h
a bit easier by reducing the number of functions and definitions.

René
Jeff King June 23, 2019, 10:46 p.m. UTC | #4
On Sun, Jun 23, 2019 at 06:00:50PM +0200, René Scharfe wrote:

> > I do think if it took the more usual pass-by-const-pointer we'd probably
> > still end up with the exact same code from an optimizing compiler, since
> > all of the references and dereferences would cancel out once it was
> > inlined.
> 
> I suspect it depends on the compiler and the exact details.  Here's a
> simple experiment: https://godbolt.org/z/kuv4NE.  It has a comparison
> function for call by value and one for call by reference as well as a
> wrapper for each with the opposite interface.
> 
> An enlightened compiler would emit the same code for functions with the
> same interface, but none of the current ones do in all cases.  Clang
> and MSVC do emit the same code for the two call by value functions, so
> there's hope.  And moving to call by reference might make matters worse.
> GCC adds some 128-bit moves to both wrappers for some reason.

Hmm. I'm unsure whether your simplified setup is really showing
something interesting or whether we'd need to have true "static inline"
functions that are actually _used_ in a hash table to see if there are
any significant differences.

But...

> Perhaps it doesn't matter much anyway e.g. due to caching, especially
> for the branch-free variants.  A definite answer would require
> measurements.  Your cleanup would make the necessary surgery on khash.h
> a bit easier by reducing the number of functions and definitions.

Yeah, my gut feeling is it probably doesn't matter much to the overall
operation even if the inner code is slightly different (though I'm happy
to be overridden by real data). And it's definitely something we can
punt on for later (or never if nobody feels like dealing with it).

-Peff

Patch
diff mbox series

diff --git a/khash.h b/khash.h
index cb2cd3d7e4..f911d2b005 100644
--- a/khash.h
+++ b/khash.h
@@ -324,20 +324,20 @@  static const double __ac_HASH_UPPER = 0.77;
 		code;												\
 	} }
 
-static inline unsigned int oid_hash(struct object_id oid)
+static inline unsigned int oidhash_by_value(struct object_id oid)
 {
 	return sha1hash(oid.hash);
 }
 
-static inline int oid_equal(struct object_id a, struct object_id b)
+static inline int oideq_by_value(struct object_id a, struct object_id b)
 {
 	return oideq(&a, &b);
 }
 
-KHASH_INIT(oid_set, struct object_id, int, 0, oid_hash, oid_equal)
+KHASH_INIT(oid_set, struct object_id, int, 0, oidhash_by_value, oideq_by_value)
 
-KHASH_INIT(oid_map, struct object_id, void *, 1, oid_hash, oid_equal)
+KHASH_INIT(oid_map, struct object_id, void *, 1, oidhash_by_value, oideq_by_value)
 
-KHASH_INIT(oid_pos, struct object_id, int, 1, oid_hash, oid_equal)
+KHASH_INIT(oid_pos, struct object_id, int, 1, oidhash_by_value, oideq_by_value)
 
 #endif /* __AC_KHASH_H */