diff mbox series

[v2,6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic

Message ID c3868c7df92484f0527ce500ad1156275be334e8.1612411124.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series repack: support repacking into a geometric sequence | expand

Commit Message

Taylor Blau Feb. 4, 2021, 3:59 a.m. UTC
From: Jeff King <peff@peff.net>

Now that we have find_kept_pack_entry(), we don't have to manually keep
hunting through every pack to find a possible "kept" duplicate of the
object. This should be faster, assuming only a portion of your total
packs are actually kept.

Note that we have to re-order the logic a bit here; we can deal with the
"kept" situation completely, and then just fall back to the "--local"
question. It might be worth having a similar optimized function to look
at only local packs.

Here are the results from p5303 (measurements again taken on the
kernel):

  Test                                        HEAD^                    HEAD
  -----------------------------------------------------------------------------------------------
  5303.5: repack (1)                          57.42(54.88+10.64)       57.44(54.71+10.78) +0.0%
  5303.6: repack with --stdin-packs (1)       0.01(0.01+0.00)          0.01(0.00+0.01) +0.0%
  5303.10: repack (50)                        71.26(88.24+4.96)        71.32(88.38+4.90) +0.1%
  5303.11: repack with --stdin-packs (50)     3.49(11.82+0.28)         3.43(11.81+0.22) -1.7%
  5303.15: repack (1000)                      215.64(491.33+14.80)     215.59(493.75+14.62) -0.0%
  5303.16: repack with --stdin-packs (1000)   198.79(380.51+7.97)      131.44(314.24+8.11) -33.9%

So our --stdin-packs case with many packs is now finally faster than the
non-keep case (because it gets the speed benefit of looking at fewer
objects, but not as big a penalty for looking at many packs).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 125 ++++++++++++++++++++++++-----------------
 1 file changed, 73 insertions(+), 52 deletions(-)

Comments

Jeff King Feb. 17, 2021, 4:05 p.m. UTC | #1
On Wed, Feb 03, 2021 at 10:59:17PM -0500, Taylor Blau wrote:

> @@ -1209,22 +1210,73 @@ static int want_found_object(int exclude, struct packed_git *p)
>  	 * Otherwise, we signal "-1" at the end to tell the caller that we do
>  	 * not know either way, and it needs to check more packs.
>  	 */
> -	if (!ignore_packed_keep_on_disk &&
> -	    !ignore_packed_keep_in_core &&
> -	    (!local || !have_non_local_packs))
> +
> +	/*
> +	 * Handle .keep first, as we have a fast(er) path there.
> +	 */
> +	if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core) {
> +		/*
> +		 * Set the flags for the kept-pack cache to be the ones we want
> +		 * to ignore.
> +		 *
> +		 * That is, if we are ignoring objects in on-disk keep packs,
> +		 * then we want to search through the on-disk keep and ignore
> +		 * the in-core ones.
> +		 */
> +		unsigned flags = 0;
> +		if (ignore_packed_keep_on_disk)
> +			flags |= ON_DISK_KEEP_PACKS;
> +		if (ignore_packed_keep_in_core)
> +			flags |= IN_CORE_KEEP_PACKS;
> +
> +		if (ignore_packed_keep_on_disk && p->pack_keep)
> +			return 0;
> +		if (ignore_packed_keep_in_core && p->pack_keep_in_core)
> +			return 0;
> +		if (has_object_kept_pack(oid, flags))
> +			return 0;
> +	}
> +
> +	/*
> +	 * At this point we know definitively that either we don't care about
> +	 * keep-packs, or the object is not in one. Keep checking other
> +	 * conditions...
> +	 */
> +
> +	if (!local || !have_non_local_packs)
>  		return 1;
> -
>  	if (local && !p->pack_local)
>  		return 0;
> -	if (p->pack_local &&
> -	    ((ignore_packed_keep_on_disk && p->pack_keep) ||
> -	     (ignore_packed_keep_in_core && p->pack_keep_in_core)))
> -		return 0;
>  
>  	/* we don't know yet; keep looking for more packs */
>  	return -1;

I know I wrote this patch, but just looking it over again with a
critical eye: it looks like more re-ordering could avoid work in some
cases.

In particular, has_object_kept_pack() is a potentially expensive call.
But if "(local && !p->pack_local)" is true, then we could cheaply exit
the function with "0", regardless of what the keep requirement says.

That's not a case that I think anybody cares that deeply about (and it
certainly is not covered by t/perf). But I think it does regress in this
patch. Prior to the patch, we'd check that condition before returning
-1, and it was the caller who would then continue to search through all
the kept packs. Now we do it preemptively.

I think just bumping that:

  if (local && !p->pack_local)
	return 0;

above the new code would fix it. Or to lay out the logic more fully, the
order of checks should be:

  - does _this_ pack we found the object in disqualify it. If so, we can
    cheaply return 0. And that applies to both keep and local rules.

  - otherwise, check all packs via has_object_kept_pack(), which is
    cheaper than continuing to iterate through all packs by returning
    -1.

  - once we know definitively about keep-packs, then check any shortcuts
    related to local packs (like !have_non_local_packs)

  - and then if no shortcuts, we return -1

I think that might be easier to express by rewriting the patch. :)

-Peff
Taylor Blau Feb. 17, 2021, 7:23 p.m. UTC | #2
On Wed, Feb 17, 2021 at 11:05:22AM -0500, Jeff King wrote:
> I think just bumping that:
>
>   if (local && !p->pack_local)
> 	return 0;

> above the new code would fix it. Or to lay out the logic more fully, the
> order of checks should be:

>   - does _this_ pack we found the object in disqualify it. If so, we can
>     cheaply return 0. And that applies to both keep and local rules.
>
>   - otherwise, check all packs via has_object_kept_pack(), which is
>     cheaper than continuing to iterate through all packs by returning
>     -1.
>
>   - once we know definitively about keep-packs, then check any shortcuts
>     related to local packs (like !have_non_local_packs)
>
>   - and then if no shortcuts, we return -1

I don't understand what you're suggesting. Is the (local &&
!p->pack_local) a disqualifying condition? Reading the comment, I think
it is, and so we could do something like:

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 36c2fa3aff..be3ba60bc2 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1205,14 +1205,21 @@ static int want_found_object(const struct object_id *oid, int exclude,
         * make sure no copy of this object appears in _any_ pack that makes us
         * to omit the object, so we need to check all the packs.
         *
-        * We can however first check whether these options can possible matter;
+        * We can however first check whether these options can possibly matter;
         * if they do not matter we know we want the object in generated pack.
         * Otherwise, we signal "-1" at the end to tell the caller that we do
         * not know either way, and it needs to check more packs.
         */

        /*
-        * Handle .keep first, as we have a fast(er) path there.
+        * Objects in packs borrowed from elsewhere are discarded regardless of
+        * if they appear in other packs that weren't borrowed.
+        */
+       if (local && !p->pack_local)
+               return 0;
+
+       /*
+        * Then handle .keep first, as we have a fast(er) path there.
         */
        if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core) {
                /*
@@ -1242,11 +1249,8 @@ static int want_found_object(const struct object_id *oid, int exclude,
         * keep-packs, or the object is not in one. Keep checking other
         * conditions...
         */
-
        if (!local || !have_non_local_packs)
                return 1;
-       if (local && !p->pack_local)
-               return 0;

        /* we don't know yet; keep looking for more packs */
        return -1;

But your "check any shortcuts related to local packs" makes me think
that we should leave the code as-is.

Which are you suggesting?

Thanks,
Taylor
Jeff King Feb. 17, 2021, 7:29 p.m. UTC | #3
On Wed, Feb 17, 2021 at 02:23:27PM -0500, Taylor Blau wrote:

> On Wed, Feb 17, 2021 at 11:05:22AM -0500, Jeff King wrote:
> > I think just bumping that:
> >
> >   if (local && !p->pack_local)
> > 	return 0;
> 
> > above the new code would fix it. Or to lay out the logic more fully, the
> > order of checks should be:
> 
> >   - does _this_ pack we found the object in disqualify it. If so, we can
> >     cheaply return 0. And that applies to both keep and local rules.
> >
> >   - otherwise, check all packs via has_object_kept_pack(), which is
> >     cheaper than continuing to iterate through all packs by returning
> >     -1.
> >
> >   - once we know definitively about keep-packs, then check any shortcuts
> >     related to local packs (like !have_non_local_packs)
> >
> >   - and then if no shortcuts, we return -1
> 
> I don't understand what you're suggesting. Is the (local &&
> !p->pack_local) a disqualifying condition? Reading the comment, I think
> it is, and so we could do something like:

That's exactly what I'm suggesting. If we have a non-local pack and were
given --local, then we can shortcut immediately without caring about
kept packs: we know that we do not want the object.

> [...]
> But your "check any shortcuts related to local packs" makes me think
> that we should leave the code as-is.

No, the "shortcuts" there is the opposite:

  if (!local || !have_non_local_packs)
	return 1;

If either of those is true, we can say "definitely include" but only
with respect to the --local requirement. So we _can't_ bump that up, but
must check it only after we've definitively resolved the keep-pack
requirement.

-Peff
diff mbox series

Patch

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6d19eb000a..fbd7b54d70 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1188,7 +1188,8 @@  static int have_duplicate_entry(const struct object_id *oid,
 	return 1;
 }
 
-static int want_found_object(int exclude, struct packed_git *p)
+static int want_found_object(const struct object_id *oid, int exclude,
+			     struct packed_git *p)
 {
 	if (exclude)
 		return 1;
@@ -1209,22 +1210,73 @@  static int want_found_object(int exclude, struct packed_git *p)
 	 * Otherwise, we signal "-1" at the end to tell the caller that we do
 	 * not know either way, and it needs to check more packs.
 	 */
-	if (!ignore_packed_keep_on_disk &&
-	    !ignore_packed_keep_in_core &&
-	    (!local || !have_non_local_packs))
+
+	/*
+	 * Handle .keep first, as we have a fast(er) path there.
+	 */
+	if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core) {
+		/*
+		 * Set the flags for the kept-pack cache to be the ones we want
+		 * to ignore.
+		 *
+		 * That is, if we are ignoring objects in on-disk keep packs,
+		 * then we want to search through the on-disk keep and ignore
+		 * the in-core ones.
+		 */
+		unsigned flags = 0;
+		if (ignore_packed_keep_on_disk)
+			flags |= ON_DISK_KEEP_PACKS;
+		if (ignore_packed_keep_in_core)
+			flags |= IN_CORE_KEEP_PACKS;
+
+		if (ignore_packed_keep_on_disk && p->pack_keep)
+			return 0;
+		if (ignore_packed_keep_in_core && p->pack_keep_in_core)
+			return 0;
+		if (has_object_kept_pack(oid, flags))
+			return 0;
+	}
+
+	/*
+	 * At this point we know definitively that either we don't care about
+	 * keep-packs, or the object is not in one. Keep checking other
+	 * conditions...
+	 */
+
+	if (!local || !have_non_local_packs)
 		return 1;
-
 	if (local && !p->pack_local)
 		return 0;
-	if (p->pack_local &&
-	    ((ignore_packed_keep_on_disk && p->pack_keep) ||
-	     (ignore_packed_keep_in_core && p->pack_keep_in_core)))
-		return 0;
 
 	/* we don't know yet; keep looking for more packs */
 	return -1;
 }
 
+static int want_object_in_pack_one(struct packed_git *p,
+				   const struct object_id *oid,
+				   int exclude,
+				   struct packed_git **found_pack,
+				   off_t *found_offset)
+{
+	off_t offset;
+
+	if (p == *found_pack)
+		offset = *found_offset;
+	else
+		offset = find_pack_entry_one(oid->hash, p);
+
+	if (offset) {
+		if (!*found_pack) {
+			if (!is_pack_valid(p))
+				return -1;
+			*found_offset = offset;
+			*found_pack = p;
+		}
+		return want_found_object(oid, exclude, p);
+	}
+	return -1;
+}
+
 /*
  * Check whether we want the object in the pack (e.g., we do not want
  * objects found in non-local stores if the "--local" option was used).
@@ -1252,7 +1304,7 @@  static int want_object_in_pack(const struct object_id *oid,
 	 * are present we will determine the answer right now.
 	 */
 	if (*found_pack) {
-		want = want_found_object(exclude, *found_pack);
+		want = want_found_object(oid, exclude, *found_pack);
 		if (want != -1)
 			return want;
 	}
@@ -1260,53 +1312,22 @@  static int want_object_in_pack(const struct object_id *oid,
 	for (m = get_multi_pack_index(the_repository); m; m = m->next) {
 		struct pack_entry e;
 		if (fill_midx_entry(the_repository, oid, &e, m)) {
-			struct packed_git *p = e.p;
-			off_t offset;
-
-			if (p == *found_pack)
-				offset = *found_offset;
-			else
-				offset = find_pack_entry_one(oid->hash, p);
-
-			if (offset) {
-				if (!*found_pack) {
-					if (!is_pack_valid(p))
-						continue;
-					*found_offset = offset;
-					*found_pack = p;
-				}
-				want = want_found_object(exclude, p);
-				if (want != -1)
-					return want;
-			}
-		}
-	}
-
-	list_for_each(pos, get_packed_git_mru(the_repository)) {
-		struct packed_git *p = list_entry(pos, struct packed_git, mru);
-		off_t offset;
-
-		if (p == *found_pack)
-			offset = *found_offset;
-		else
-			offset = find_pack_entry_one(oid->hash, p);
-
-		if (offset) {
-			if (!*found_pack) {
-				if (!is_pack_valid(p))
-					continue;
-				*found_offset = offset;
-				*found_pack = p;
-			}
-			want = want_found_object(exclude, p);
-			if (!exclude && want > 0)
-				list_move(&p->mru,
-					  get_packed_git_mru(the_repository));
+			want = want_object_in_pack_one(e.p, oid, exclude, found_pack, found_offset);
 			if (want != -1)
 				return want;
 		}
 	}
 
+	list_for_each(pos, get_packed_git_mru(the_repository)) {
+		struct packed_git *p = list_entry(pos, struct packed_git, mru);
+		want = want_object_in_pack_one(p, oid, exclude, found_pack, found_offset);
+		if (!exclude && want > 0)
+			list_move(&p->mru,
+				  get_packed_git_mru(the_repository));
+		if (want != -1)
+			return want;
+	}
+
 	if (uri_protocols.nr) {
 		struct configured_exclusion *ex =
 			oidmap_get(&configured_exclusions, oid);