Message ID: 20200403121508.GA638328@coredump.intra.peff.net
State:      New, archived
Series:     fast-import: replace custom hash with hashmap.c
On 03.04.20 at 14:15, Jeff King wrote:
> We use a custom hash in fast-import to store the set of objects we've imported so far. It has a fixed set of 2^16 buckets and chains any collisions with a linked list. As the number of objects grows larger than that, the load factor increases and we degrade to O(n) lookups and O(n^2) insertions.
>
> We can scale better by using our hashmap.c implementation, which will resize the bucket count as we grow. This does incur an extra memory cost of 8 bytes per object, as hashmap stores the integer hash value for each entry in its hashmap_entry struct (which we really don't care about here, because we're just reusing the embedded object hash). But I think the numbers below justify this (and our per-object memory cost is already much higher).
>
> I also looked at using khash, but it seemed to perform slightly worse than hashmap at all sizes, and worse even than the existing code for small sizes. It's also awkward to use here, because we want to look up a "struct object_entry" from a "struct object_id", and it doesn't handle mismatched keys as well. Making a mapping of object_id to object_entry would be more natural, but that would require pulling the embedded oid out of the object_entry or incurring an extra 32 bytes per object.
>
> In a synthetic test creating as many cheap, tiny objects as possible
>
>   perl -e '
>     my $bits = shift;
>     my $nr = 2**$bits;
>
>     for (my $i = 0; $i < $nr; $i++) {
>       print "blob\n";
>       print "data 4\n";
>       print pack("N", $i);
>     }
>   ' $bits | git fast-import
>
> I got these results:
>
>   nr_objects   master      khash       hashmap
>   2^20         0m4.317s    0m5.109s    0m3.890s
>   2^21         0m10.204s   0m9.702s    0m7.933s
>   2^22         0m27.159s   0m17.911s   0m16.751s
>   2^23         1m19.038s   0m35.080s   0m31.963s
>   2^24         4m18.766s   1m10.233s   1m6.793s

I get similar numbers.

> which points to hashmap as the winner. We didn't have any perf tests for fast-export or fast-import, so I added one as a more real-world case. It uses an export without blobs since that's significantly cheaper than a full one, but still is an interesting case people might use (e.g., for rewriting history). It will emphasize this change in some ways (as a percentage we spend more time making objects and less shuffling blob bytes around) and less in others (the total object count is lower).
>
> Here are the results for linux.git:
>
>   Test                        HEAD^               HEAD
>   ----------------------------------------------------------------------------
>   9300.1: export (no-blobs)   67.64(66.96+0.67)   67.81(67.06+0.75)   +0.3%
>   9300.2: import (no-blobs)   284.04(283.34+0.69) 198.09(196.01+0.92) -30.3%

My numbers look a bit different for this, not sure why:

  Test                        origin/master       HEAD
  ---------------------------------------------------------------------------
  9300.1: export (no-blobs)   69.36(66.44+1.56)   67.89(66.07+1.56)   -2.1%
  9300.2: import (no-blobs)   295.10(293.83+1.19) 283.83(282.91+0.91) -3.8%

They are still in favor of the patch, just not as strongly as yours.

> It only has ~5.2M commits and trees, so this is a larger effect than I expected (the 2^23 case above only improved by 50s or so, but here we gained almost 90s). This is probably due to actually performing more object lookups in a real import with trees and commits, as opposed to just dumping a bunch of blobs into a pack.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  fast-import.c                      | 62 +++++++++++++++++++-----------
>  t/perf/p9300-fast-import-export.sh | 23 +++++++++++
>  2 files changed, 62 insertions(+), 23 deletions(-)
>  create mode 100755 t/perf/p9300-fast-import-export.sh

I have to warn you up front: I'm not familiar with hashmap or fast-import, so I'll focus on stylistic topics -- bikeshedding. The actual change looks reasonable to me overall and the performance is convincing.

> diff --git a/fast-import.c b/fast-import.c
> index 202dda11a6..0ef6defc10 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -39,12 +39,28 @@
>  
>  struct object_entry {
>  	struct pack_idx_entry idx;
> -	struct object_entry *next;
> +	struct hashmap_entry ent;
>  	uint32_t type : TYPE_BITS,
>  		pack_id : PACK_ID_BITS,
>  		depth : DEPTH_BITS;
>  };
>  
> +static int object_entry_hashcmp(const void *map_data,
> +				const struct hashmap_entry *eptr,
> +				const struct hashmap_entry *entry_or_key,
> +				const void *keydata)
> +{
> +	const struct object_id *oid = keydata;
> +	const struct object_entry *e1, *e2;
> +
> +	e1 = container_of(eptr, const struct object_entry, ent);
> +	if (oid)
> +		return oidcmp(&e1->idx.oid, oid);
> +
> +	e2 = container_of(entry_or_key, const struct object_entry, ent);
> +	return oidcmp(&e1->idx.oid, &e2->idx.oid);

Other hashmap users express this more tersely, similar to this:

	const struct object_entry *e1, *e2;

	e1 = container_of(eptr, const struct object_entry, ent);
	e2 = container_of(entry_or_key, const struct object_entry, ent);
	return oidcmp(&e1->idx.oid, keydata ? keydata : &e2->idx.oid);

> +}
> +
>  struct object_entry_pool {
>  	struct object_entry_pool *next_pool;
>  	struct object_entry *next_free;
> @@ -178,7 +194,7 @@ static off_t pack_size;
>  /* Table of objects we've written. */
>  static unsigned int object_entry_alloc = 5000;
>  static struct object_entry_pool *blocks;
> -static struct object_entry *object_table[1 << 16];
> +static struct hashmap object_table;
>  static struct mark_set *marks;
>  static const char *export_marks_file;
>  static const char *import_marks_file;
> @@ -455,44 +471,42 @@ static struct object_entry *new_object(struct object_id *oid)
>  
>  static struct object_entry *find_object(struct object_id *oid)
>  {
> -	unsigned int h = oid->hash[0] << 8 | oid->hash[1];
> -	struct object_entry *e;
> -	for (e = object_table[h]; e; e = e->next)
> -		if (oideq(oid, &e->idx.oid))
> -			return e;
> +	struct hashmap_entry lookup_entry, *e;
> +
> +	hashmap_entry_init(&lookup_entry, oidhash(oid));
> +	e = hashmap_get(&object_table, &lookup_entry, oid);
> +	if (e)
> +		return container_of(e, struct object_entry, ent);

(Worth repeating:) No casting trick, yay!

>  	return NULL;
>  }
>  
>  static struct object_entry *insert_object(struct object_id *oid)
>  {
> -	unsigned int h = oid->hash[0] << 8 | oid->hash[1];
> -	struct object_entry *e = object_table[h];
> +	struct object_entry *e;
> +	struct hashmap_entry lookup_entry, *hashent;
>  
> -	while (e) {
> -		if (oideq(oid, &e->idx.oid))
> -			return e;
> -		e = e->next;
> -	}
> +	hashmap_entry_init(&lookup_entry, oidhash(oid));
> +	hashent = hashmap_get(&object_table, &lookup_entry, oid);
> +	if (hashent)
> +		return container_of(hashent, struct object_entry, ent);

That duplicates find_object()...

>  
>  	e = new_object(oid);
> -	e->next = object_table[h];
>  	e->idx.offset = 0;
> -	object_table[h] = e;
> +	e->ent.hash = lookup_entry.hash;

... except for this part. Would it make sense to replace this line with a call to hashmap_entry_init()? Seems cleaner to me. It would look like this:

	struct object_entry *e = find_object(oid);

	if (e)
		return e;

	e = new_object(oid);
	e->idx.offset = 0;
	hashmap_entry_init(&e->ent, oidhash(oid));
	hashmap_add(&object_table, &e->ent);
	return e;

> +	hashmap_add(&object_table, &e->ent);
>  	return e;
>  }
>  
>  static void invalidate_pack_id(unsigned int id)
>  {
> -	unsigned int h;
>  	unsigned long lu;
>  	struct tag *t;
> +	struct hashmap_iter iter;
> +	struct object_entry *e;
>  
> -	for (h = 0; h < ARRAY_SIZE(object_table); h++) {
> -		struct object_entry *e;
> -
> -		for (e = object_table[h]; e; e = e->next)
> -			if (e->pack_id == id)
> -				e->pack_id = MAX_PACK_ID;
> +	hashmap_for_each_entry(&object_table, &iter, e, ent) {
> +		if (e->pack_id == id)
> +			e->pack_id = MAX_PACK_ID;
>  	}
>  
>  	for (lu = 0; lu < branch_table_sz; lu++) {
> @@ -3511,6 +3525,8 @@ int cmd_main(int argc, const char **argv)
>  	avail_tree_table = xcalloc(avail_tree_table_sz, sizeof(struct avail_tree_content*));
>  	marks = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));
>  
> +	hashmap_init(&object_table, object_entry_hashcmp, NULL, 0);
> +
>  	/*
>  	 * We don't parse most options until after we've seen the set of
>  	 * "feature" lines at the start of the stream (which allows the command
> diff --git a/t/perf/p9300-fast-import-export.sh b/t/perf/p9300-fast-import-export.sh
> new file mode 100755
> index 0000000000..c3c743d04a
> --- /dev/null
> +++ b/t/perf/p9300-fast-import-export.sh
> @@ -0,0 +1,23 @@
> +#!/bin/sh
> +
> +test_description='test fast-import and fast-export performance'
> +. ./perf-lib.sh
> +
> +test_perf_default_repo
> +
> +# Use --no-data here to produce a vastly smaller export file.
> +# This is much cheaper to work with but should still exercise
> +# fast-import pretty well (we'll still process all commits and
> +# trees, which account for 60% or more of objects in most repos).
> +#
> +# Use --reencode to avoid the default of aborting on non-utf8 commits,
> +# which lets this test run against a wider variety of sample repos.
> +test_perf 'export (no-blobs)' '
> +	git fast-export --reencode=yes --no-data HEAD >export
> +'
> +
> +test_perf 'import (no-blobs)' '
> +	git fast-import --force <export
> +'
> +
> +test_done
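For readers who, like René, do not have hashmap.c fresh in mind, here is a minimal sketch of the embedded-entry pattern the patch builds on. It uses only calls that appear in the quoted diff (hashmap_init(), hashmap_entry_init(), hashmap_get(), hashmap_add(), container_of(), oidhash(), oidcmp()); the "example_*" names and the payload field are made up for illustration, and this is a simplified sketch, not the actual fast-import code.

	/* A value type with the hashmap entry embedded in it. */
	struct example_entry {
		struct hashmap_entry ent;	/* what hashmap.c links into its buckets */
		struct object_id oid;		/* the real key */
		int payload;
	};

	/* Comparison callback: same shape as object_entry_hashcmp() above. */
	static int example_hashcmp(const void *map_data,
				   const struct hashmap_entry *eptr,
				   const struct hashmap_entry *entry_or_key,
				   const void *keydata)
	{
		const struct example_entry *a, *b;
		const struct object_id *oid = keydata;

		a = container_of(eptr, const struct example_entry, ent);
		if (oid)	/* lookup by bare object_id */
			return oidcmp(&a->oid, oid);
		b = container_of(entry_or_key, const struct example_entry, ent);
		return oidcmp(&a->oid, &b->oid);
	}

	static struct hashmap example_map;

	static void example_setup(void)
	{
		hashmap_init(&example_map, example_hashcmp, NULL, 0);
	}

	static struct example_entry *example_find(const struct object_id *oid)
	{
		struct hashmap_entry lookup, *found;

		hashmap_entry_init(&lookup, oidhash(oid));	/* stash the hash value */
		found = hashmap_get(&example_map, &lookup, oid);
		return found ? container_of(found, struct example_entry, ent) : NULL;
	}

	static void example_add(struct example_entry *e)
	{
		hashmap_entry_init(&e->ent, oidhash(&e->oid));
		hashmap_add(&example_map, &e->ent);
	}

hashmap_get() hands back the embedded hashmap_entry, and container_of() recovers the surrounding struct; that is what replaces the old "e->next" chaining and the casting trick René is glad to see gone.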
On Fri, Apr 03, 2020 at 08:53:30PM +0200, René Scharfe wrote:

> > Here are the results for linux.git:
> >
> >   Test                        HEAD^               HEAD
> >   ----------------------------------------------------------------------------
> >   9300.1: export (no-blobs)   67.64(66.96+0.67)   67.81(67.06+0.75)   +0.3%
> >   9300.2: import (no-blobs)   284.04(283.34+0.69) 198.09(196.01+0.92) -30.3%
>
> My numbers look a bit different for this, not sure why:
>
>   Test                        origin/master       HEAD
>   ---------------------------------------------------------------------------
>   9300.1: export (no-blobs)   69.36(66.44+1.56)   67.89(66.07+1.56)   -2.1%
>   9300.2: import (no-blobs)   295.10(293.83+1.19) 283.83(282.91+0.91) -3.8%
>
> They are still in favor of the patch, just not as strongly as yours.

Interesting. I re-ran mine just to double check, and got:

  Test                        HEAD^               HEAD
  ----------------------------------------------------------------------------
  9300.1: export (no-blobs)   63.11(62.31+0.79)   62.00(61.21+0.79)   -1.8%
  9300.2: import (no-blobs)   261.79(261.06+0.72) 188.55(187.87+0.66) -28.0%

(for some reason my whole machine is faster today; maybe I had closed slack).

This is on a fresh-ish clone of linux.git.

> > +	e1 = container_of(eptr, const struct object_entry, ent);
> > +	if (oid)
> > +		return oidcmp(&e1->idx.oid, oid);
> > +
> > +	e2 = container_of(entry_or_key, const struct object_entry, ent);
> > +	return oidcmp(&e1->idx.oid, &e2->idx.oid);
>
> Other hashmap users express this more tersely, similar to this:
>
> 	const struct object_entry *e1, *e2;
>
> 	e1 = container_of(eptr, const struct object_entry, ent);
> 	e2 = container_of(entry_or_key, const struct object_entry, ent);
> 	return oidcmp(&e1->idx.oid, keydata ? keydata : &e2->idx.oid);

I wasn't sure if we'd ever see entry_or_key as NULL, in which case the second container_of() would be undefined behavior. There's a container_of_or_null() for this, but it seemed simpler to just avoid the deref entirely if we didn't need it.

I think in practice we won't ever pass NULL, though. Even a hashmap_get() needs to pass in a hashmap_entry with the hash value in it. Though I think that's technically _also_ undefined behavior. If I have a struct where the entry is not at the beginning, like:

  struct foo {
	const char *bar;
	struct hashmap_entry ent;
  };

then:

  e2 = container_of(ptr_to_entry, struct foo, ent);

is going to form a pointer at 8 bytes before ptr_to_entry. If it really is a "struct hashmap_entry" on the stack, then it's pointing at garbage.

We don't dereference it, so it's likely fine in practice, but even computing such a pointer does violate the standard.

> > +	hashmap_entry_init(&lookup_entry, oidhash(oid));
> > +	hashent = hashmap_get(&object_table, &lookup_entry, oid);
> > +	if (hashent)
> > +		return container_of(hashent, struct object_entry, ent);
>
> That duplicates find_object()...

Yes. This is how most hashmap users do it. It's only a few lines, and sharing without extra work would mean passing the hash value around (otherwise we'd have to recompute it again), which is awkward.

Though in our case oidhash() is cheap enough that perhaps it's not worth worrying about?

> > 	e = new_object(oid);
> > -	e->next = object_table[h];
> > 	e->idx.offset = 0;
> > -	object_table[h] = e;
> > +	e->ent.hash = lookup_entry.hash;
>
> ... except for this part. Would it make sense to replace this line with a call to hashmap_entry_init()? Seems cleaner to me. It would look like this:
>
> 	struct object_entry *e = find_object(oid);
>
> 	if (e)
> 		return e;
>
> 	e = new_object(oid);
> 	e->idx.offset = 0;
> 	hashmap_entry_init(&e->ent, oidhash(oid));
> 	hashmap_add(&object_table, &e->ent);
> 	return e;

Right, that calls oidhash() again. If we're OK with that, I agree it's a bit shorter and not any more awkward. I do worry slightly that it sets a bad example, though (you wouldn't want to repeat a strhash(), for example).

-Peff
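The pointer concern Jeff raises here can be made concrete with a small standalone program. The definitions below are stand-ins written only for illustration (Git's real struct hashmap_entry and container_of() live in hashmap.h and differ in detail); the interesting part is just the address computation.

	#include <stddef.h>

	/* stand-in, not Git's definition */
	struct hashmap_entry {
		struct hashmap_entry *next;
		unsigned int hash;
	};

	/* the usual offsetof-based container_of */
	#define container_of(ptr, type, member) \
		((type *)((char *)(ptr) - offsetof(type, member)))

	/* the hypothetical struct from the mail: the entry is NOT at offset 0 */
	struct foo {
		const char *bar;
		struct hashmap_entry ent;
	};

	int main(void)
	{
		struct hashmap_entry key = { 0 };	/* bare entry on the stack */
		struct foo *f = container_of(&key, struct foo, ent);
		/*
		 * f now points offsetof(struct foo, ent) bytes before "key",
		 * into memory that is not inside any struct foo. Merely
		 * forming that pointer is the undefined behavior being
		 * discussed, even though it is never dereferenced.
		 */
		(void)f;
		return 0;
	}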
On 05.04.20 at 20:59, Jeff King wrote:
> On Fri, Apr 03, 2020 at 08:53:30PM +0200, René Scharfe wrote:
>
>>> Here are the results for linux.git:
>>>
>>>   Test                        HEAD^               HEAD
>>>   ----------------------------------------------------------------------------
>>>   9300.1: export (no-blobs)   67.64(66.96+0.67)   67.81(67.06+0.75)   +0.3%
>>>   9300.2: import (no-blobs)   284.04(283.34+0.69) 198.09(196.01+0.92) -30.3%
>>
>> My numbers look a bit different for this, not sure why:
>>
>>   Test                        origin/master       HEAD
>>   ---------------------------------------------------------------------------
>>   9300.1: export (no-blobs)   69.36(66.44+1.56)   67.89(66.07+1.56)   -2.1%
>>   9300.2: import (no-blobs)   295.10(293.83+1.19) 283.83(282.91+0.91) -3.8%
>>
>> They are still in favor of the patch, just not as strongly as yours.
>
> Interesting. I re-ran mine just to double check, and got:
>
>   Test                        HEAD^               HEAD
>   ----------------------------------------------------------------------------
>   9300.1: export (no-blobs)   63.11(62.31+0.79)   62.00(61.21+0.79)   -1.8%
>   9300.2: import (no-blobs)   261.79(261.06+0.72) 188.55(187.87+0.66) -28.0%
>
> (for some reason my whole machine is faster today; maybe I had closed slack).
>
> This is on a fresh-ish clone of linux.git.

A second run today reported:

  Test                        origin/master       HEAD
  ---------------------------------------------------------------------------
  9300.1: export (no-blobs)   61.58(59.93+1.40)   60.63(59.35+1.22)   -1.5%
  9300.2: import (no-blobs)   239.64(239.00+0.63) 246.02(245.18+0.82) +2.7%

git describe says v5.4-3890-g86fd3e9df543 in that repo.

Dunno. My PC has thermal issues and stressing it for half an hour straight may cause it to throttle?

With Git's own repo I just got this:

  Test                        origin/master      HEAD
  -----------------------------------------------------------------------
  9300.1: export (no-blobs)   2.58(2.45+0.05)    2.81(2.75+0.05)    +8.9%
  9300.2: import (no-blobs)   10.41(10.37+0.03)  10.75(10.72+0.02)  +3.3%

>>> +	e1 = container_of(eptr, const struct object_entry, ent);
>>> +	if (oid)
>>> +		return oidcmp(&e1->idx.oid, oid);
>>> +
>>> +	e2 = container_of(entry_or_key, const struct object_entry, ent);
>>> +	return oidcmp(&e1->idx.oid, &e2->idx.oid);
>>
>> Other hashmap users express this more tersely, similar to this:
>>
>> 	const struct object_entry *e1, *e2;
>>
>> 	e1 = container_of(eptr, const struct object_entry, ent);
>> 	e2 = container_of(entry_or_key, const struct object_entry, ent);
>> 	return oidcmp(&e1->idx.oid, keydata ? keydata : &e2->idx.oid);
>
> I wasn't sure if we'd ever see entry_or_key as NULL, in which case the second container_of() would be undefined behavior. There's a container_of_or_null() for this, but it seemed simpler to just avoid the deref entirely if we didn't need it.

OK. The other users might have the same issue then, though.

> I think in practice we won't ever pass NULL, though. Even a hashmap_get() needs to pass in a hashmap_entry with the hash value in it. Though I think that's technically _also_ undefined behavior. If I have a struct where the entry is not at the beginning, like:
>
>   struct foo {
> 	const char *bar;
> 	struct hashmap_entry ent;
>   };
>
> then:
>
>   e2 = container_of(ptr_to_entry, struct foo, ent);
>
> is going to form a pointer at 8 bytes before ptr_to_entry. If it really is a "struct hashmap_entry" on the stack, then it's pointing at garbage.
>
> We don't dereference it, so it's likely fine in practice, but even computing such a pointer does violate the standard.

Really? If ptr_to_entry was NULL then any pointer arithmetic on it would be bad. If it points to a bare hashmap_entry and we lied to container_of by telling it that it's embedded in some other struct then that would be bad as well. But if there actually is a surrounding struct of the specified type then the resulting pointer must surely be valid, no matter if the object it points to was initialized? Do you have the relevant chapter and verse handy?

>>> +	hashmap_entry_init(&lookup_entry, oidhash(oid));
>>> +	hashent = hashmap_get(&object_table, &lookup_entry, oid);
>>> +	if (hashent)
>>> +		return container_of(hashent, struct object_entry, ent);
>>
>> That duplicates find_object()...
>
> Yes. This is how most hashmap users do it. It's only a few lines, and sharing without extra work would mean passing the hash value around (otherwise we'd have to recompute it again), which is awkward.

I find touching hashmap_entry members directly more awkward. Compilers likely inline find_object() here, so the text size and performance should be about the same.

> Though in our case oidhash() is cheap enough that perhaps it's not worth worrying about?

Probably, but I didn't measure. My system is quite noisy.

>>> 	e = new_object(oid);
>>> -	e->next = object_table[h];
>>> 	e->idx.offset = 0;
>>> -	object_table[h] = e;
>>> +	e->ent.hash = lookup_entry.hash;
>>
>> ... except for this part. Would it make sense to replace this line with a call to hashmap_entry_init()? Seems cleaner to me. It would look like this:
>>
>> 	struct object_entry *e = find_object(oid);
>>
>> 	if (e)
>> 		return e;
>>
>> 	e = new_object(oid);
>> 	e->idx.offset = 0;
>> 	hashmap_entry_init(&e->ent, oidhash(oid));
>> 	hashmap_add(&object_table, &e->ent);
>> 	return e;
>
> Right, that calls oidhash() again. If we're OK with that, I agree it's a bit shorter and not any more awkward. I do worry slightly that it sets a bad example, though (you wouldn't want to repeat a strhash(), for example).

Stuffing the oidhash() result into a variable and using it twice with hashmap_entry_init() would work as well. This would make the reason for the duplicate find_object() code obvious, while keeping struct hashmap_entry opaque.

René
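Concretely, the variant René sketches in his last paragraph might look roughly like this in insert_object(). This is only a sketch of the suggestion; the local variable name and its unsigned int type (matching what oidhash() returns and hashmap_entry_init() takes in the patch) are my own assumptions:

	static struct object_entry *insert_object(struct object_id *oid)
	{
		unsigned int hash = oidhash(oid);	/* computed once, used twice */
		struct hashmap_entry lookup_entry, *hashent;
		struct object_entry *e;

		hashmap_entry_init(&lookup_entry, hash);
		hashent = hashmap_get(&object_table, &lookup_entry, oid);
		if (hashent)
			return container_of(hashent, struct object_entry, ent);

		e = new_object(oid);
		e->idx.offset = 0;
		hashmap_entry_init(&e->ent, hash);	/* no poking at ent.hash */
		hashmap_add(&object_table, &e->ent);
		return e;
	}

This keeps struct hashmap_entry opaque while paying for oidhash() only once, which is the trade-off the next message picks up.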
On Mon, Apr 06, 2020 at 08:07:45PM +0200, René Scharfe wrote:

> > Interesting. I re-ran mine just to double check, and got:
> [...]
> A second run today reported:
>
>   Test                        origin/master       HEAD
>   ---------------------------------------------------------------------------
>   9300.1: export (no-blobs)   61.58(59.93+1.40)   60.63(59.35+1.22)   -1.5%
>   9300.2: import (no-blobs)   239.64(239.00+0.63) 246.02(245.18+0.82) +2.7%
>
> git describe says v5.4-3890-g86fd3e9df543 in that repo.

I'm on v5.6-rc7-188-g1b649e0bcae7, but I can't imagine it makes a big difference.

> Dunno. My PC has thermal issues and stressing it for half an hour straight may cause it to throttle?

Yeah. I wonder what the variance is between the 3 runs (assuming you're using ./run and doing 3). I.e., is one run in the first set much faster than the others, and we pick it as best-of-3.

> With Git's own repo I just got this:
>
>   Test                        origin/master      HEAD
>   -----------------------------------------------------------------------
>   9300.1: export (no-blobs)   2.58(2.45+0.05)    2.81(2.75+0.05)    +8.9%
>   9300.2: import (no-blobs)   10.41(10.37+0.03)  10.75(10.72+0.02)  +3.3%

I get similar numbers here. But I wouldn't expect much. We only have ~160k commits+trees in git.git, so the average chain length is ~2.5.

> > I think in practice we won't ever pass NULL, though. Even a hashmap_get() needs to pass in a hashmap_entry with the hash value in it. Though I think that's technically _also_ undefined behavior. If I have a struct where the entry is not at the beginning, like:
> >
> >   struct foo {
> > 	const char *bar;
> > 	struct hashmap_entry ent;
> >   };
> >
> > then:
> >
> >   e2 = container_of(ptr_to_entry, struct foo, ent);
> >
> > is going to form a pointer at 8 bytes before ptr_to_entry. If it really is a "struct hashmap_entry" on the stack, then it's pointing at garbage.
> >
> > We don't dereference it, so it's likely fine in practice, but even computing such a pointer does violate the standard.
>
> Really? If ptr_to_entry was NULL then any pointer arithmetic on it would be bad. If it points to a bare hashmap_entry and we lied to container_of by telling it that it's embedded in some other struct then that would be bad as well. But if there actually is a surrounding struct of the specified type then the resulting pointer must surely be valid, no matter if the object it points to was initialized? Do you have the relevant chapter and verse handy?

Pointing to uninitialized bits is fine according to the standard (and I think that's what you're asking about for chapter and verse). But we really do lie to container_of(). See remote.c, for example. In make_remote(), we call hashmap_get() with a pointer to lookup_entry, which is a bare "struct hashmap_entry". That should end up in remotes_hash_cmp(), which unconditionally computes a pointer to a "struct remote".

Now the hashmap_entry there is at the beginning of the struct, so the offset is 0 and the two pointers are the same. So while the pointer's type is incorrect, we didn't compute an invalid pointer value. And traditionally the hashmap code required that it be at the front of the containing struct, but that was loosened recently (and container_of introduced) in 5efabc7ed9 (Merge branch 'ew/hashmap', 2019-10-15).

And grepping around, I don't see any cases where it's _not_ at the beginning. So perhaps this is a problem waiting to bite us.

> Stuffing the oidhash() result into a variable and using it twice with hashmap_entry_init() would work as well. This would make the reason for the duplicate find_object() code obvious, while keeping struct hashmap_entry opaque.

I'd prefer not to use a separate variable, as that requires giving it a type. Perhaps:

  hashmap_entry_init(&e->ent, lookup_entry.hash);

which is used elsewhere is OK? That still assumes there's a hash field, but I think hashmap has always been pretty up front about that (the real sin in the original is assuming that nothing else needs to be initialized).

-Peff
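Spelled out, Jeff's middle ground (reusing the hash already stored in lookup_entry, but going through hashmap_entry_init() instead of assigning ent.hash directly) would make the tail of insert_object() look roughly like this; a sketch, not a tested change:

	e = new_object(oid);
	e->idx.offset = 0;
	hashmap_entry_init(&e->ent, lookup_entry.hash);	/* reuse the hash from the failed lookup */
	hashmap_add(&object_table, &e->ent);
	return e;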
On Mon, Apr 06, 2020 at 02:35:42PM -0400, Jeff King wrote:

> Pointing to uninitialized bits is fine according to the standard (and I think that's what you're asking about for chapter and verse). But we really do lie to container_of(). See remote.c, for example. In make_remote(), we call hashmap_get() with a pointer to lookup_entry, which is a bare "struct hashmap_entry". That should end up in remotes_hash_cmp(), which unconditionally computes a pointer to a "struct remote".
>
> Now the hashmap_entry there is at the beginning of the struct, so the offset is 0 and the two pointers are the same. So while the pointer's type is incorrect, we didn't compute an invalid pointer value. And traditionally the hashmap code required that it be at the front of the containing struct, but that was loosened recently (and container_of introduced) in 5efabc7ed9 (Merge branch 'ew/hashmap', 2019-10-15).
>
> And grepping around, I don't see any cases where it's _not_ at the beginning. So perhaps this is a problem waiting to bite us.

I was curious whether ASan/UBSan might complain if we did something like this:

diff --git a/oidmap.h b/oidmap.h
index c66a83ab1d..1abfe966f2 100644
--- a/oidmap.h
+++ b/oidmap.h
@@ -12,10 +12,10 @@
  * internal_entry field.
  */
 struct oidmap_entry {
+	struct object_id oid;
+
 	/* For internal use only */
 	struct hashmap_entry internal_entry;
-
-	struct object_id oid;
 };

It does, but not because of the subtle UB issue. It's just that the rest of oidmap still assumes the hashmap-is-first logic, and we need:

diff --git a/oidmap.c b/oidmap.c
index 423aa014a3..4065ea4b79 100644
--- a/oidmap.c
+++ b/oidmap.c
@@ -35,7 +35,10 @@ void *oidmap_get(const struct oidmap *map, const struct object_id *key)
 	if (!map->map.cmpfn)
 		return NULL;
 
-	return hashmap_get_from_hash(&map->map, oidhash(key), key);
+	return container_of_or_null(
+		hashmap_get_from_hash(&map->map, oidhash(key), key),
+		struct oidmap_entry,
+		internal_entry);
 }
 
 void *oidmap_remove(struct oidmap *map, const struct object_id *key)

struct oidmap {

@@ -79,7 +79,10 @@ static inline void oidmap_iter_init(struct oidmap *map, struct oidmap_iter *iter)
 static inline void *oidmap_iter_next(struct oidmap_iter *iter)
 {
 	/* TODO: this API could be reworked to do compile-time type checks */
-	return (void *)hashmap_iter_next(&iter->h_iter);
+	return container_of_or_null(
+		hashmap_iter_next(&iter->h_iter),
+		struct oidmap_entry,
+		internal_entry);
 }
 
 static inline void *oidmap_iter_first(struct oidmap *map,

-Peff
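For readers who have not met container_of_or_null() before: it is the NULL-propagating variant of container_of() used in the hunks above. An illustrative definition (not necessarily Git's exact spelling, and note that ptr is evaluated twice here) would be:

	#define container_of_or_null(ptr, type, member) \
		((ptr) ? container_of((ptr), type, member) : (type *)NULL)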
On 06.04.20 at 20:35, Jeff King wrote:
> On Mon, Apr 06, 2020 at 08:07:45PM +0200, René Scharfe wrote:
>
>>> Interesting. I re-ran mine just to double check, and got:
>> [...]
>> A second run today reported:
>>
>>   Test                        origin/master       HEAD
>>   ---------------------------------------------------------------------------
>>   9300.1: export (no-blobs)   61.58(59.93+1.40)   60.63(59.35+1.22)   -1.5%
>>   9300.2: import (no-blobs)   239.64(239.00+0.63) 246.02(245.18+0.82) +2.7%
>>
>> git describe says v5.4-3890-g86fd3e9df543 in that repo.
>
> I'm on v5.6-rc7-188-g1b649e0bcae7, but I can't imagine it makes a big difference.
>
>> Dunno. My PC has thermal issues and stressing it for half an hour straight may cause it to throttle?
>
> Yeah. I wonder what the variance is between the 3 runs (assuming you're using ./run and doing 3). I.e., is one run in the first set much faster than the others, and we pick it as best-of-3.

I did use run with the default three laps. Ran p9300 directly with patch v2 applied instead now, got this:

  Test                        this tree
  -----------------------------------------------
  9300.1: export (no-blobs)   64.80(60.95+1.47)
  9300.2: import (no-blobs)   210.00(206.02+2.04)
  9300.1: export (no-blobs)   62.58(60.74+1.35)
  9300.2: import (no-blobs)   209.07(206.78+1.96)
  9300.1: export (no-blobs)   61.33(60.17+1.03)
  9300.2: import (no-blobs)   207.73(205.47+2.04)

> But we really do lie to container_of(). See remote.c, for example. In make_remote(), we call hashmap_get() with a pointer to lookup_entry, which is a bare "struct hashmap_entry". That should end up in remotes_hash_cmp(), which unconditionally computes a pointer to a "struct remote".

Ugh. Any optimization level would delay that to after the keydata check, though, I guess.

There's also function pointer casting going on (with patch_util_cmp() and sequence_entry_cmp()), which is undefined behavior IIUC.

> Now the hashmap_entry there is at the beginning of the struct, so the offset is 0 and the two pointers are the same. So while the pointer's type is incorrect, we didn't compute an invalid pointer value. And traditionally the hashmap code required that it be at the front of the containing struct, but that was loosened recently (and container_of introduced) in 5efabc7ed9 (Merge branch 'ew/hashmap', 2019-10-15).

Hmm, OK.

> And grepping around, I don't see any cases where it's _not_ at the beginning. So perhaps this is a problem waiting to bite us.

Possibly.

Speaking of grep, I ended up with this oneliner to show all the comparison functions used with hashmaps:

  git grep -W -f <(git grep -wh hashmap_init | cut -f2 -d, | grep -v -e ' .* ' -e NULL -e '^ *$' | sed 's/ //; s/.*)//; s/^/int /' | sort -u)

But I didn't learn much from looking at them, except that most have that unconditional container_of.

>> Stuffing the oidhash() result into a variable and using it twice with hashmap_entry_init() would work as well. This would make the reason for the duplicate find_object() code obvious, while keeping struct hashmap_entry opaque.
>
> I'd prefer not to use a separate variable, as that requires giving it a type.

Specifying the type that oidhash() returns and hashmap_entry_init() accepts is redundant and inconvenient, but that line should be easy to find and adapt when the function signatures are eventually changed. Which is perhaps never going to happen anyway?

> Perhaps:
>
>   hashmap_entry_init(&e->ent, lookup_entry.hash);
>
> which is used elsewhere is OK? That still assumes there's a hash field, but I think hashmap has always been pretty up front about that (the real sin in the original is assuming that nothing else needs to be initialized).

That conflicts with this statement from hashmap.h:

 * struct hashmap_entry is an opaque structure representing an entry in the
 * hash table.

Patch v2 looks good to me, though!

René
diff --git a/fast-import.c b/fast-import.c
index 202dda11a6..0ef6defc10 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -39,12 +39,28 @@
 
 struct object_entry {
 	struct pack_idx_entry idx;
-	struct object_entry *next;
+	struct hashmap_entry ent;
 	uint32_t type : TYPE_BITS,
 		pack_id : PACK_ID_BITS,
 		depth : DEPTH_BITS;
 };
 
+static int object_entry_hashcmp(const void *map_data,
+				const struct hashmap_entry *eptr,
+				const struct hashmap_entry *entry_or_key,
+				const void *keydata)
+{
+	const struct object_id *oid = keydata;
+	const struct object_entry *e1, *e2;
+
+	e1 = container_of(eptr, const struct object_entry, ent);
+	if (oid)
+		return oidcmp(&e1->idx.oid, oid);
+
+	e2 = container_of(entry_or_key, const struct object_entry, ent);
+	return oidcmp(&e1->idx.oid, &e2->idx.oid);
+}
+
 struct object_entry_pool {
 	struct object_entry_pool *next_pool;
 	struct object_entry *next_free;
@@ -178,7 +194,7 @@ static off_t pack_size;
 /* Table of objects we've written. */
 static unsigned int object_entry_alloc = 5000;
 static struct object_entry_pool *blocks;
-static struct object_entry *object_table[1 << 16];
+static struct hashmap object_table;
 static struct mark_set *marks;
 static const char *export_marks_file;
 static const char *import_marks_file;
@@ -455,44 +471,42 @@ static struct object_entry *new_object(struct object_id *oid)
 
 static struct object_entry *find_object(struct object_id *oid)
 {
-	unsigned int h = oid->hash[0] << 8 | oid->hash[1];
-	struct object_entry *e;
-	for (e = object_table[h]; e; e = e->next)
-		if (oideq(oid, &e->idx.oid))
-			return e;
+	struct hashmap_entry lookup_entry, *e;
+
+	hashmap_entry_init(&lookup_entry, oidhash(oid));
+	e = hashmap_get(&object_table, &lookup_entry, oid);
+	if (e)
+		return container_of(e, struct object_entry, ent);
 	return NULL;
 }
 
 static struct object_entry *insert_object(struct object_id *oid)
 {
-	unsigned int h = oid->hash[0] << 8 | oid->hash[1];
-	struct object_entry *e = object_table[h];
+	struct object_entry *e;
+	struct hashmap_entry lookup_entry, *hashent;
 
-	while (e) {
-		if (oideq(oid, &e->idx.oid))
-			return e;
-		e = e->next;
-	}
+	hashmap_entry_init(&lookup_entry, oidhash(oid));
+	hashent = hashmap_get(&object_table, &lookup_entry, oid);
+	if (hashent)
+		return container_of(hashent, struct object_entry, ent);
 
 	e = new_object(oid);
-	e->next = object_table[h];
 	e->idx.offset = 0;
-	object_table[h] = e;
+	e->ent.hash = lookup_entry.hash;
+	hashmap_add(&object_table, &e->ent);
 	return e;
 }
 
 static void invalidate_pack_id(unsigned int id)
 {
-	unsigned int h;
 	unsigned long lu;
 	struct tag *t;
+	struct hashmap_iter iter;
+	struct object_entry *e;
 
-	for (h = 0; h < ARRAY_SIZE(object_table); h++) {
-		struct object_entry *e;
-
-		for (e = object_table[h]; e; e = e->next)
-			if (e->pack_id == id)
-				e->pack_id = MAX_PACK_ID;
+	hashmap_for_each_entry(&object_table, &iter, e, ent) {
+		if (e->pack_id == id)
+			e->pack_id = MAX_PACK_ID;
 	}
 
 	for (lu = 0; lu < branch_table_sz; lu++) {
@@ -3511,6 +3525,8 @@ int cmd_main(int argc, const char **argv)
 	avail_tree_table = xcalloc(avail_tree_table_sz, sizeof(struct avail_tree_content*));
 	marks = mem_pool_calloc(&fi_mem_pool, 1, sizeof(struct mark_set));
 
+	hashmap_init(&object_table, object_entry_hashcmp, NULL, 0);
+
 	/*
 	 * We don't parse most options until after we've seen the set of
 	 * "feature" lines at the start of the stream (which allows the command
diff --git a/t/perf/p9300-fast-import-export.sh b/t/perf/p9300-fast-import-export.sh
new file mode 100755
index 0000000000..c3c743d04a
--- /dev/null
+++ b/t/perf/p9300-fast-import-export.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+test_description='test fast-import and fast-export performance'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+# Use --no-data here to produce a vastly smaller export file.
+# This is much cheaper to work with but should still exercise
+# fast-import pretty well (we'll still process all commits and
+# trees, which account for 60% or more of objects in most repos).
+#
+# Use --reencode to avoid the default of aborting on non-utf8 commits,
+# which lets this test run against a wider variety of sample repos.
+test_perf 'export (no-blobs)' '
+	git fast-export --reencode=yes --no-data HEAD >export
+'
+
+test_perf 'import (no-blobs)' '
+	git fast-import --force <export
+'
+
+test_done
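For completeness, one way to run the new perf script against a large history, assuming the standard t/perf harness (./run and the GIT_PERF_REPO variable come from the existing perf framework; the path to a linux.git clone is a placeholder):

	cd t/perf
	GIT_PERF_REPO=/path/to/linux.git ./run HEAD^ HEAD -- p9300-fast-import-export.sh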
We use a custom hash in fast-import to store the set of objects we've imported so far. It has a fixed set of 2^16 buckets and chains any collisions with a linked list. As the number of objects grows larger than that, the load factor increases and we degrade to O(n) lookups and O(n^2) insertions.

We can scale better by using our hashmap.c implementation, which will resize the bucket count as we grow. This does incur an extra memory cost of 8 bytes per object, as hashmap stores the integer hash value for each entry in its hashmap_entry struct (which we really don't care about here, because we're just reusing the embedded object hash). But I think the numbers below justify this (and our per-object memory cost is already much higher).

I also looked at using khash, but it seemed to perform slightly worse than hashmap at all sizes, and worse even than the existing code for small sizes. It's also awkward to use here, because we want to look up a "struct object_entry" from a "struct object_id", and it doesn't handle mismatched keys as well. Making a mapping of object_id to object_entry would be more natural, but that would require pulling the embedded oid out of the object_entry or incurring an extra 32 bytes per object.

In a synthetic test creating as many cheap, tiny objects as possible

  perl -e '
    my $bits = shift;
    my $nr = 2**$bits;

    for (my $i = 0; $i < $nr; $i++) {
      print "blob\n";
      print "data 4\n";
      print pack("N", $i);
    }
  ' $bits | git fast-import

I got these results:

  nr_objects   master      khash       hashmap
  2^20         0m4.317s    0m5.109s    0m3.890s
  2^21         0m10.204s   0m9.702s    0m7.933s
  2^22         0m27.159s   0m17.911s   0m16.751s
  2^23         1m19.038s   0m35.080s   0m31.963s
  2^24         4m18.766s   1m10.233s   1m6.793s

which points to hashmap as the winner. We didn't have any perf tests for fast-export or fast-import, so I added one as a more real-world case. It uses an export without blobs since that's significantly cheaper than a full one, but still is an interesting case people might use (e.g., for rewriting history). It will emphasize this change in some ways (as a percentage we spend more time making objects and less shuffling blob bytes around) and less in others (the total object count is lower).

Here are the results for linux.git:

  Test                        HEAD^               HEAD
  ----------------------------------------------------------------------------
  9300.1: export (no-blobs)   67.64(66.96+0.67)   67.81(67.06+0.75)   +0.3%
  9300.2: import (no-blobs)   284.04(283.34+0.69) 198.09(196.01+0.92) -30.3%

It only has ~5.2M commits and trees, so this is a larger effect than I expected (the 2^23 case above only improved by 50s or so, but here we gained almost 90s). This is probably due to actually performing more object lookups in a real import with trees and commits, as opposed to just dumping a bunch of blobs into a pack.

Signed-off-by: Jeff King <peff@peff.net>
---
 fast-import.c                      | 62 +++++++++++++++++++-----------
 t/perf/p9300-fast-import-export.sh | 23 +++++++++++
 2 files changed, 62 insertions(+), 23 deletions(-)
 create mode 100755 t/perf/p9300-fast-import-export.sh