Message ID | 20190620074141.GO3713@sigill.intra.peff.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drop non-object_id hashing | expand |
Jeff King <peff@peff.net> writes: > For use in object_id hash tables, we have oid_hash() and oid_equal(). > But these are confusingly similar to the existing oideq() and the > oidhash() we plan to add to replace sha1hash(). > > The big difference from those functions is that rather than accepting a > const pointer to the "struct object_id", we take the arguments by value > (which is a khash internal convention). So let's make that obvious by > calling them oidhash_by_value() and oideq_by_value(). > > Those names are fairly horrendous to type, but we rarely need to do so; > they are passed to the khash implementation macro and then only used > internally. Callers get to use the nice kh_put_oid_map(), etc. The pass-by-value interface feels a bit unforunate but hopefully "static inline" would help us avoid actually copying the struct left and right as we make calls to them X-<. > Signed-off-by: Jeff King <peff@peff.net> > --- > khash.h | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/khash.h b/khash.h > index cb2cd3d7e4..f911d2b005 100644 > --- a/khash.h > +++ b/khash.h > @@ -324,20 +324,20 @@ static const double __ac_HASH_UPPER = 0.77; > code; \ > } } > > -static inline unsigned int oid_hash(struct object_id oid) > +static inline unsigned int oidhash_by_value(struct object_id oid) > { > return sha1hash(oid.hash); > } > > -static inline int oid_equal(struct object_id a, struct object_id b) > +static inline int oideq_by_value(struct object_id a, struct object_id b) > { > return oideq(&a, &b); > } > > -KHASH_INIT(oid_set, struct object_id, int, 0, oid_hash, oid_equal) > +KHASH_INIT(oid_set, struct object_id, int, 0, oidhash_by_value, oideq_by_value) > > -KHASH_INIT(oid_map, struct object_id, void *, 1, oid_hash, oid_equal) > +KHASH_INIT(oid_map, struct object_id, void *, 1, oidhash_by_value, oideq_by_value) > > -KHASH_INIT(oid_pos, struct object_id, int, 1, oid_hash, oid_equal) > +KHASH_INIT(oid_pos, struct object_id, int, 1, oidhash_by_value, oideq_by_value) > > #endif /* __AC_KHASH_H */
On Thu, Jun 20, 2019 at 10:44:17AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > For use in object_id hash tables, we have oid_hash() and oid_equal(). > > But these are confusingly similar to the existing oideq() and the > > oidhash() we plan to add to replace sha1hash(). > > > > The big difference from those functions is that rather than accepting a > > const pointer to the "struct object_id", we take the arguments by value > > (which is a khash internal convention). So let's make that obvious by > > calling them oidhash_by_value() and oideq_by_value(). > > > > Those names are fairly horrendous to type, but we rarely need to do so; > > they are passed to the khash implementation macro and then only used > > internally. Callers get to use the nice kh_put_oid_map(), etc. > > The pass-by-value interface feels a bit unforunate but hopefully > "static inline" would help us avoid actually copying the struct left > and right as we make calls to them X-<. Yeah, exactly. I think it should end up quite fast, though if anybody (René?) wants to try tweaking khash and timing it, be my guest. I do think if it took the more usual pass-by-const-pointer we'd probably still end up with the exact same code from an optimizing compiler, since all of the references and dereferences would cancel out once it was inlined. -Peff
Am 20.06.19 um 20:27 schrieb Jeff King: > On Thu, Jun 20, 2019 at 10:44:17AM -0700, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: >> >>> For use in object_id hash tables, we have oid_hash() and oid_equal(). >>> But these are confusingly similar to the existing oideq() and the >>> oidhash() we plan to add to replace sha1hash(). >>> >>> The big difference from those functions is that rather than accepting a >>> const pointer to the "struct object_id", we take the arguments by value >>> (which is a khash internal convention). So let's make that obvious by >>> calling them oidhash_by_value() and oideq_by_value(). >>> >>> Those names are fairly horrendous to type, but we rarely need to do so; >>> they are passed to the khash implementation macro and then only used >>> internally. Callers get to use the nice kh_put_oid_map(), etc. >> >> The pass-by-value interface feels a bit unforunate but hopefully >> "static inline" would help us avoid actually copying the struct left >> and right as we make calls to them X-<. > > Yeah, exactly. I think it should end up quite fast, though if anybody > (René?) wants to try tweaking khash and timing it, be my guest. > > I do think if it took the more usual pass-by-const-pointer we'd probably > still end up with the exact same code from an optimizing compiler, since > all of the references and dereferences would cancel out once it was > inlined. I suspect it depends on the compiler and the exact details. Here's a simple experiment: https://godbolt.org/z/kuv4NE. It has a comparison function for call by value and one for call by reference as well as a wrapper for each with the opposite interface. An enlightened compiler would emit the same code for functions with the same interface, but none of the current ones do in all cases. Clang and MSVC do emit the same code for the two call by value functions, so there's hope. And moving to call by reference might make matters worse. GCC adds some 128-bit moves to both wrappers for some reason. Perhaps it doesn't matter much anyway e.g. due to caching, especially for the branch-free variants. A definite answer would require measurements. Your cleanup would make the necessary surgery on khash.h a bit easier by reducing the number of functions and definitions. René
On Sun, Jun 23, 2019 at 06:00:50PM +0200, René Scharfe wrote: > > I do think if it took the more usual pass-by-const-pointer we'd probably > > still end up with the exact same code from an optimizing compiler, since > > all of the references and dereferences would cancel out once it was > > inlined. > > I suspect it depends on the compiler and the exact details. Here's a > simple experiment: https://godbolt.org/z/kuv4NE. It has a comparison > function for call by value and one for call by reference as well as a > wrapper for each with the opposite interface. > > An enlightened compiler would emit the same code for functions with the > same interface, but none of the current ones do in all cases. Clang > and MSVC do emit the same code for the two call by value functions, so > there's hope. And moving to call by reference might make matters worse. > GCC adds some 128-bit moves to both wrappers for some reason. Hmm. I'm unsure whether your simplified setup is really showing something interesting or whether we'd need to have true "static inline" functions that are actually _used_ in a hash table to see if there are any significant differences. But... > Perhaps it doesn't matter much anyway e.g. due to caching, especially > for the branch-free variants. A definite answer would require > measurements. Your cleanup would make the necessary surgery on khash.h > a bit easier by reducing the number of functions and definitions. Yeah, my gut feeling is it probably doesn't matter much to the overall operation even if the inner code is slightly different (though I'm happy to be overridden by real data). And it's definitely something we can punt on for later (or never if nobody feels like dealing with it). -Peff
diff --git a/khash.h b/khash.h index cb2cd3d7e4..f911d2b005 100644 --- a/khash.h +++ b/khash.h @@ -324,20 +324,20 @@ static const double __ac_HASH_UPPER = 0.77; code; \ } } -static inline unsigned int oid_hash(struct object_id oid) +static inline unsigned int oidhash_by_value(struct object_id oid) { return sha1hash(oid.hash); } -static inline int oid_equal(struct object_id a, struct object_id b) +static inline int oideq_by_value(struct object_id a, struct object_id b) { return oideq(&a, &b); } -KHASH_INIT(oid_set, struct object_id, int, 0, oid_hash, oid_equal) +KHASH_INIT(oid_set, struct object_id, int, 0, oidhash_by_value, oideq_by_value) -KHASH_INIT(oid_map, struct object_id, void *, 1, oid_hash, oid_equal) +KHASH_INIT(oid_map, struct object_id, void *, 1, oidhash_by_value, oideq_by_value) -KHASH_INIT(oid_pos, struct object_id, int, 1, oid_hash, oid_equal) +KHASH_INIT(oid_pos, struct object_id, int, 1, oidhash_by_value, oideq_by_value) #endif /* __AC_KHASH_H */
For use in object_id hash tables, we have oid_hash() and oid_equal(). But these are confusingly similar to the existing oideq() and the oidhash() we plan to add to replace sha1hash(). The big difference from those functions is that rather than accepting a const pointer to the "struct object_id", we take the arguments by value (which is a khash internal convention). So let's make that obvious by calling them oidhash_by_value() and oideq_by_value(). Those names are fairly horrendous to type, but we rarely need to do so; they are passed to the khash implementation macro and then only used internally. Callers get to use the nice kh_put_oid_map(), etc. Signed-off-by: Jeff King <peff@peff.net> --- khash.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)