Message ID | c857e12a032f197626cd6a5eb0eafc66afbb5bed.1630291682.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Accepted |
Commit | a9fd2f207d0318cd7a4771f13c9fb4759d280f68 |
Headers | show |
Series | pack-objects: simplify add_objects_in_unpacked_packs() | expand |
On Sun, Aug 29, 2021 at 10:48:54PM -0400, Taylor Blau wrote: > +static int add_object_in_unpacked_pack(const struct object_id *oid, > + struct packed_git *pack, > + uint32_t pos, > + void *_data) > { > [...] > + struct object *obj = lookup_unknown_object(the_repository, oid); > + if (obj->flags & OBJECT_ADDED) > + return 0; > + add_object_entry(oid, obj->type, "", 0); > + obj->flags |= OBJECT_ADDED; > + return 0; > } This is not new in your patch series, but while merging this with another topic I had, I noticed another opportunity for optimization here. We already know the pack/pos in which we found the object. But when we call add_object_entry(), it will do another lookup, via want_object_in_pack(). We could shortcut that by providing it with the extra information, the way add_object_entry_from_bitmap() does. The original code before your series could have done the same optimization, but it became much more obvious after your series, since -Wunused-parameters notices that we do not look at the "pack" or "pos" parameters at all. :) It may not be that exciting an optimization, though. Pack lookups aren't _that_ expensive, and the pack-mru code would mean we always find it in the first pack (since by definition we're iterating through the objects in whole packs, our locality is perfect). It would also probably involve some slight refactoring of add_object_entry() to avoid duplication (though possibly the result could reduce similar duplication with the bitmap variant). Hmm, actually looking further, we already have add_object_entry_from_pack() for --stdin-packs. So I offer it mainly as an observation, in case somebody wants to look into it further (both for the optimization and the possibility of simplifying the code). -Peff
On Thu, Sep 02, 2021 at 07:04:36PM -0400, Jeff King wrote: > On Sun, Aug 29, 2021 at 10:48:54PM -0400, Taylor Blau wrote: > > > +static int add_object_in_unpacked_pack(const struct object_id *oid, > > + struct packed_git *pack, > > + uint32_t pos, > > + void *_data) > > { > > [...] > > + struct object *obj = lookup_unknown_object(the_repository, oid); > > + if (obj->flags & OBJECT_ADDED) > > + return 0; > > + add_object_entry(oid, obj->type, "", 0); > > + obj->flags |= OBJECT_ADDED; > > + return 0; > > } > > This is not new in your patch series, but while merging this with > another topic I had, I noticed another opportunity for optimization > here. We already know the pack/pos in which we found the object. But > when we call add_object_entry(), it will do another lookup, via > want_object_in_pack(). > > [...] > > It would also probably involve some slight refactoring of > add_object_entry() to avoid duplication (though possibly the result > could reduce similar duplication with the bitmap variant). Hmm, actually > looking further, we already have add_object_entry_from_pack() for > --stdin-packs. I was trying to remember how I handled this for cruft packs (which want to take a slightly different path to add_object_entry() in order to record the object mtimes along the way). Indeed, in that case there is a special add_cruft_object_entry() which uses pack and offset directly. Having looked at all of these functions which are layered on top of add_object_entry() recently, I tend to agree that there is some room for clean-up there. I plan on sending the cruft pack series soon, after I untangle all of the `repack` + multi-pack bitmaps stuff, so perhaps that will be a good time to pursue this. Anyway, just to say that I probably agree that the pack mru cache is helping us enough to make this negligible, but that it is on my mind and we'll have a chance to look at it soon. Thanks, Taylor
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index df49f656b9..87ddbd5553 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3505,79 +3505,27 @@ static void show_edge(struct commit *commit) add_preferred_base(&commit->object.oid); } -struct in_pack_object { - off_t offset; - struct object *object; -}; - -struct in_pack { - unsigned int alloc; - unsigned int nr; - struct in_pack_object *array; -}; - -static void mark_in_pack_object(struct object *object, struct packed_git *p, struct in_pack *in_pack) +static int add_object_in_unpacked_pack(const struct object_id *oid, + struct packed_git *pack, + uint32_t pos, + void *_data) { - in_pack->array[in_pack->nr].offset = find_pack_entry_one(object->oid.hash, p); - in_pack->array[in_pack->nr].object = object; - in_pack->nr++; -} - -/* - * Compare the objects in the offset order, in order to emulate the - * "git rev-list --objects" output that produced the pack originally. - */ -static int ofscmp(const void *a_, const void *b_) -{ - struct in_pack_object *a = (struct in_pack_object *)a_; - struct in_pack_object *b = (struct in_pack_object *)b_; - - if (a->offset < b->offset) - return -1; - else if (a->offset > b->offset) - return 1; - else - return oidcmp(&a->object->oid, &b->object->oid); + struct object *obj = lookup_unknown_object(the_repository, oid); + if (obj->flags & OBJECT_ADDED) + return 0; + add_object_entry(oid, obj->type, "", 0); + obj->flags |= OBJECT_ADDED; + return 0; } static void add_objects_in_unpacked_packs(void) { - struct packed_git *p; - struct in_pack in_pack; - uint32_t i; - - memset(&in_pack, 0, sizeof(in_pack)); - - for (p = get_all_packs(the_repository); p; p = p->next) { - struct object_id oid; - struct object *o; - - if (!p->pack_local || p->pack_keep || p->pack_keep_in_core) - continue; - if (open_pack_index(p)) - die(_("cannot open pack index")); - - ALLOC_GROW(in_pack.array, - in_pack.nr + p->num_objects, - in_pack.alloc); - - for (i = 0; i < p->num_objects; i++) { - nth_packed_object_id(&oid, p, i); - o = lookup_unknown_object(the_repository, &oid); - if (!(o->flags & OBJECT_ADDED)) - mark_in_pack_object(o, p, &in_pack); - o->flags |= OBJECT_ADDED; - } - } - - if (in_pack.nr) { - QSORT(in_pack.array, in_pack.nr, ofscmp); - for (i = 0; i < in_pack.nr; i++) { - struct object *o = in_pack.array[i].object; - add_object_entry(&o->oid, o->type, "", 0); - } - } - free(in_pack.array); + if (for_each_packed_object(add_object_in_unpacked_pack, NULL, + FOR_EACH_OBJECT_PACK_ORDER | + FOR_EACH_OBJECT_LOCAL_ONLY | + FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS | + FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS)) + die(_("cannot open pack index")); } static int add_loose_object(const struct object_id *oid, const char *path,
This function is used to implement `pack-objects`'s `--keep-unreachable` option, but can be simplified in a couple of ways: - add_objects_in_unpacked_packs() iterates over all packs (and then all packed objects) itself, but could use for_each_packed_object() instead since the missing flags necessary were added in the previous commit - objects are added to an in_pack array which store (off_t, object) tuples, and then sorted in offset order when we could iterate objects in offset order. There is a slight behavior change here: before we would have added objects in sorted offset order among _all_ packs. Handing objects to create_object_entry() in pack order for each pack (instead of feeding objects from all packs simultaneously their offset relative to different packs) is much more reasonable, if different than how the code currently works. - objects in a single pack are iterated in index order and searched for in order to discover their offsets, which is much less efficient than using the on-disk reverse index Simplify the function by addressing each of the above and moving the core of the loop into a callback function that we then pass to for_each_packed_object() instead of open-coding the latter function ourselves. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- builtin/pack-objects.c | 84 ++++++++---------------------------------- 1 file changed, 16 insertions(+), 68 deletions(-)