diff mbox series

[2/3] builtin/pack-objects.c: simplify add_objects_in_unpacked_packs()

Message ID c857e12a032f197626cd6a5eb0eafc66afbb5bed.1630291682.git.me@ttaylorr.com (mailing list archive)
State Accepted
Commit a9fd2f207d0318cd7a4771f13c9fb4759d280f68
Headers show
Series pack-objects: simplify add_objects_in_unpacked_packs() | expand

Commit Message

Taylor Blau Aug. 30, 2021, 2:48 a.m. UTC
This function is used to implement `pack-objects`'s `--keep-unreachable`
option, but can be simplified in a couple of ways:

  - add_objects_in_unpacked_packs() iterates over all packs (and then
    all packed objects) itself, but could use for_each_packed_object()
    instead since the missing flags necessary were added in the previous
    commit

  - objects are added to an in_pack array which store (off_t, object)
    tuples, and then sorted in offset order when we could iterate
    objects in offset order.

    There is a slight behavior change here: before we would have added
    objects in sorted offset order among _all_ packs. Handing objects to
    create_object_entry() in pack order for each pack (instead of
    feeding objects from all packs simultaneously their offset relative
    to different packs) is much more reasonable, if different than how
    the code currently works.

  - objects in a single pack are iterated in index order and searched
    for in order to discover their offsets, which is much less efficient
    than using the on-disk reverse index

Simplify the function by addressing each of the above and moving the
core of the loop into a callback function that we then pass to
for_each_packed_object() instead of open-coding the latter function
ourselves.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 84 ++++++++----------------------------------
 1 file changed, 16 insertions(+), 68 deletions(-)

Comments

Jeff King Sept. 2, 2021, 11:04 p.m. UTC | #1
On Sun, Aug 29, 2021 at 10:48:54PM -0400, Taylor Blau wrote:

> +static int add_object_in_unpacked_pack(const struct object_id *oid,
> +				       struct packed_git *pack,
> +				       uint32_t pos,
> +				       void *_data)
>  {
> [...]
> +	struct object *obj = lookup_unknown_object(the_repository, oid);
> +	if (obj->flags & OBJECT_ADDED)
> +		return 0;
> +	add_object_entry(oid, obj->type, "", 0);
> +	obj->flags |= OBJECT_ADDED;
> +	return 0;
>  }

This is not new in your patch series, but while merging this with
another topic I had, I noticed another opportunity for optimization
here. We already know the pack/pos in which we found the object. But
when we call add_object_entry(), it will do another lookup, via
want_object_in_pack().

We could shortcut that by providing it with the extra information, the
way add_object_entry_from_bitmap() does. The original code before your
series could have done the same optimization, but it became much more
obvious after your series, since -Wunused-parameters notices that we do
not look at the "pack" or "pos" parameters at all. :)

It may not be that exciting an optimization, though. Pack lookups aren't
_that_ expensive, and the pack-mru code would mean we always find it in
the first pack (since by definition we're iterating through the objects
in whole packs, our locality is perfect).

It would also probably involve some slight refactoring of
add_object_entry() to avoid duplication (though possibly the result
could reduce similar duplication with the bitmap variant). Hmm, actually
looking further, we already have add_object_entry_from_pack() for
--stdin-packs.

So I offer it mainly as an observation, in case somebody wants to look
into it further (both for the optimization and the possibility of
simplifying the code).

-Peff
Taylor Blau Sept. 3, 2021, 2:33 a.m. UTC | #2
On Thu, Sep 02, 2021 at 07:04:36PM -0400, Jeff King wrote:
> On Sun, Aug 29, 2021 at 10:48:54PM -0400, Taylor Blau wrote:
>
> > +static int add_object_in_unpacked_pack(const struct object_id *oid,
> > +				       struct packed_git *pack,
> > +				       uint32_t pos,
> > +				       void *_data)
> >  {
> > [...]
> > +	struct object *obj = lookup_unknown_object(the_repository, oid);
> > +	if (obj->flags & OBJECT_ADDED)
> > +		return 0;
> > +	add_object_entry(oid, obj->type, "", 0);
> > +	obj->flags |= OBJECT_ADDED;
> > +	return 0;
> >  }
>
> This is not new in your patch series, but while merging this with
> another topic I had, I noticed another opportunity for optimization
> here. We already know the pack/pos in which we found the object. But
> when we call add_object_entry(), it will do another lookup, via
> want_object_in_pack().
>
> [...]
>
> It would also probably involve some slight refactoring of
> add_object_entry() to avoid duplication (though possibly the result
> could reduce similar duplication with the bitmap variant). Hmm, actually
> looking further, we already have add_object_entry_from_pack() for
> --stdin-packs.

I was trying to remember how I handled this for cruft packs (which want
to take a slightly different path to add_object_entry() in order to
record the object mtimes along the way). Indeed, in that case there is a
special add_cruft_object_entry() which uses pack and offset directly.

Having looked at all of these functions which are layered on top of
add_object_entry() recently, I tend to agree that there is some room
for clean-up there.

I plan on sending the cruft pack series soon, after I untangle all of
the `repack` + multi-pack bitmaps stuff, so perhaps that will be a good
time to pursue this.

Anyway, just to say that I probably agree that the pack mru cache is
helping us enough to make this negligible, but that it is on my mind and
we'll have a chance to look at it soon.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index df49f656b9..87ddbd5553 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3505,79 +3505,27 @@  static void show_edge(struct commit *commit)
 	add_preferred_base(&commit->object.oid);
 }
 
-struct in_pack_object {
-	off_t offset;
-	struct object *object;
-};
-
-struct in_pack {
-	unsigned int alloc;
-	unsigned int nr;
-	struct in_pack_object *array;
-};
-
-static void mark_in_pack_object(struct object *object, struct packed_git *p, struct in_pack *in_pack)
+static int add_object_in_unpacked_pack(const struct object_id *oid,
+				       struct packed_git *pack,
+				       uint32_t pos,
+				       void *_data)
 {
-	in_pack->array[in_pack->nr].offset = find_pack_entry_one(object->oid.hash, p);
-	in_pack->array[in_pack->nr].object = object;
-	in_pack->nr++;
-}
-
-/*
- * Compare the objects in the offset order, in order to emulate the
- * "git rev-list --objects" output that produced the pack originally.
- */
-static int ofscmp(const void *a_, const void *b_)
-{
-	struct in_pack_object *a = (struct in_pack_object *)a_;
-	struct in_pack_object *b = (struct in_pack_object *)b_;
-
-	if (a->offset < b->offset)
-		return -1;
-	else if (a->offset > b->offset)
-		return 1;
-	else
-		return oidcmp(&a->object->oid, &b->object->oid);
+	struct object *obj = lookup_unknown_object(the_repository, oid);
+	if (obj->flags & OBJECT_ADDED)
+		return 0;
+	add_object_entry(oid, obj->type, "", 0);
+	obj->flags |= OBJECT_ADDED;
+	return 0;
 }
 
 static void add_objects_in_unpacked_packs(void)
 {
-	struct packed_git *p;
-	struct in_pack in_pack;
-	uint32_t i;
-
-	memset(&in_pack, 0, sizeof(in_pack));
-
-	for (p = get_all_packs(the_repository); p; p = p->next) {
-		struct object_id oid;
-		struct object *o;
-
-		if (!p->pack_local || p->pack_keep || p->pack_keep_in_core)
-			continue;
-		if (open_pack_index(p))
-			die(_("cannot open pack index"));
-
-		ALLOC_GROW(in_pack.array,
-			   in_pack.nr + p->num_objects,
-			   in_pack.alloc);
-
-		for (i = 0; i < p->num_objects; i++) {
-			nth_packed_object_id(&oid, p, i);
-			o = lookup_unknown_object(the_repository, &oid);
-			if (!(o->flags & OBJECT_ADDED))
-				mark_in_pack_object(o, p, &in_pack);
-			o->flags |= OBJECT_ADDED;
-		}
-	}
-
-	if (in_pack.nr) {
-		QSORT(in_pack.array, in_pack.nr, ofscmp);
-		for (i = 0; i < in_pack.nr; i++) {
-			struct object *o = in_pack.array[i].object;
-			add_object_entry(&o->oid, o->type, "", 0);
-		}
-	}
-	free(in_pack.array);
+	if (for_each_packed_object(add_object_in_unpacked_pack, NULL,
+				   FOR_EACH_OBJECT_PACK_ORDER |
+				   FOR_EACH_OBJECT_LOCAL_ONLY |
+				   FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS |
+				   FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS))
+		die(_("cannot open pack index"));
 }
 
 static int add_loose_object(const struct object_id *oid, const char *path,