diff mbox series

[v3,2/2] builtin/pack-objects.c: introduce `pack.recentObjectsHook`

Message ID 2ce8a79fa4bf98947728da4e6e22304a2f203fac.1683847221.git.me@ttaylorr.com (mailing list archive)
State New, archived
Headers show
Series pack-objects: introduce `pack.extraRecentObjectsHook` | expand

Commit Message

Taylor Blau May 11, 2023, 11:20 p.m. UTC
This patch introduces a new multi-valued configuration option,
`pack.recentObjectsHook` as a means to mark certain objects as recent,
regardless of their age.

Depending on whether or not we are generating a cruft pack, this allows
the caller to do one of two things:

  - If generating a cruft pack, the caller is able to retain additional
    objects via the cruft pack, even if they would have otherwise been
    pruned due to their age.

  - If not generating a cruft pack, the caller is likewise able to
    retain additional objects as loose.

There is currently no option to be able to keep around certain objects
that have otherwise aged out of the grace period. The only way to retain
those objects is:

  - to point a reference at them, which may be undesirable or
    infeasible,

  - to track them via the reflog, which may be undesirable since the
    reflog's lifetime is limited to that of the reference it's tracking
    (and callers may want to keep those unreachable objects around for
    longer)

  - to extend the grace period, which may keep around other objects that
    the caller *does* want to discard,

  - or, to force the caller to construct the pack of objects they want
    to keep themselves, and then mark the pack as kept by adding a
    ".keep" file.

This patch introduces a new configuration, `pack.recentObjectsHook`
which allows the caller to specify a program (or set of programs) whose
output is treated as a set of objects to treat as recent, regardless of
their true age.

The implementation is straightforward. In either case (cruft packs or
not), Git enumerates recent objects via
`add_unseen_recent_objects_to_traversal()`. That enumerates loose and
packed objects, and eventually calls add_recent_object() on any objects
for which `want_recent_object()`'s conditions are met.

This patch modifies the recency condition from simply "is the mtime of
this object more recent than the cutoff?" to "[...] or, is this object
mentioned by at least one `pack.recentObjectsHook`?".

We then add those as tips to another reachability traversal (along with
any recent objects, if pruning), marking every object along the way
(either adding it to the cruft pack, or writing it out as a loose
object).

A potential alternative here is to introduce a new mode to alter the
contents of the reachable pack instead of the cruft one. One could
imagine a new option to `pack-objects`, say `--extra-reachable-tips`
that does the same thing as above, adding the visited set of objects
along the traversal to the pack.

But this has the unfortunate side-effect of altering the reachability
closure of that pack. If parts of the unreachable object graph mentioned
by one or more of the "extra reachable tips" programs is not closed,
then the resulting pack won't be either. This makes it impossible in the
general case to write out reachability bitmaps for that pack, since
closure is a requirement there.

Instead, keep these unreachable objects in the cruft pack (or set of
unreachable, loose objects) instead, to ensure that we can continue to
have a pack containing just reachable objects, which is always safe to
write a bitmap over.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/pack.txt        |  13 ++
 builtin/pack-objects.c               |   1 +
 reachable.c                          |  94 ++++++++++++++-
 t/t5329-pack-objects-cruft.sh        | 171 +++++++++++++++++++++++++++
 t/t7701-repack-unpack-unreachable.sh |  22 ++++
 5 files changed, 297 insertions(+), 4 deletions(-)

Comments

Jeff King May 12, 2023, 4:58 a.m. UTC | #1
On Thu, May 11, 2023 at 07:20:37PM -0400, Taylor Blau wrote:

> +static int run_one_pack_recent_objects_hook(struct oidset *set,
> +					    const char *args)
> +{
> +	struct child_process cmd = CHILD_PROCESS_INIT;
> +	struct strbuf buf = STRBUF_INIT;
> +	FILE *out;
> +	int ret = 0;
> +
> +	cmd.use_shell = 1;
> +	cmd.out = -1;
> +
> +	strvec_push(&cmd.args, args);
> +
> +	if (start_command(&cmd))
> +		return -1;
> +
> +	out = xfdopen(cmd.out, "r");
> +	while (strbuf_getline(&buf, out) != EOF) {
> +		struct object_id oid;
> +		const char *rest;
> +
> +		if (parse_oid_hex(buf.buf, &oid, &rest) || *rest) {
> +			ret = error(_("invalid extra cruft tip: '%s'"), buf.buf);
> +			goto done;
> +		}
> +
> +		oidset_insert(set, &oid);
> +	}
> +
> +	ret = finish_command(&cmd);
> +
> +done:

I haven't looked closely at this whole patch yet (and I especially want
to look at the new tests since this approach covers more cases), but I
did notice that this version of the function still has the "we don't
reap the child on parse failure" problem I described in:

  https://lore.kernel.org/git/20230505221921.GE3321533@coredump.intra.peff.net/

I'll try to give the whole patch a more thorough review tomorrow.

-Peff
Jeff King May 12, 2023, 9:24 p.m. UTC | #2
On Thu, May 11, 2023 at 07:20:37PM -0400, Taylor Blau wrote:

> This patch introduces a new multi-valued configuration option,
> `pack.recentObjectsHook` as a means to mark certain objects as recent,
> regardless of their age.
> 
> Depending on whether or not we are generating a cruft pack, this allows
> the caller to do one of two things:
> 
>   - If generating a cruft pack, the caller is able to retain additional
>     objects via the cruft pack, even if they would have otherwise been
>     pruned due to their age.
> 
>   - If not generating a cruft pack, the caller is likewise able to
>     retain additional objects as loose.

I think this description suffers a bit from being adapted from the
original patch which was targeting cruft packs. It's not clear to me
what "the caller" means here. And really, I think this is getting into
the details before giving an overview and motivation.

I'd expect something the rationale to be something like:

  1. Git considers the recent-ness of objects when deciding how to handle
     unreachable objects (discarding those that are too old, and keeping
     new-enough ones)

  2. you might want to consider more objects as "recent" because you know
     something Git doesn't (that certain objects are more interesting
     than others; specific examples may help here)

  3. so this patch gives you a way to override the recent-ness check for
     certain objects

And then get into the details of the implementation, why other solutions
don't work, etc.

> There is currently no option to be able to keep around certain objects
> that have otherwise aged out of the grace period. The only way to retain
> those objects is:
> 
>   - to point a reference at them, which may be undesirable or
>     infeasible,
> 
>   - to track them via the reflog, which may be undesirable since the
>     reflog's lifetime is limited to that of the reference it's tracking
>     (and callers may want to keep those unreachable objects around for
>     longer)
> 
>   - to extend the grace period, which may keep around other objects that
>     the caller *does* want to discard,
> 
>   - or, to force the caller to construct the pack of objects they want
>     to keep themselves, and then mark the pack as kept by adding a
>     ".keep" file.

One option I don't see here is: update the mtime on the objects you want
to salvage.

Why would we want this patch instead of just having the caller update
the mtimes of objects (or in a cruft-pack world, call a command that
rewrites the .mtimes file with new values)?

I can think of some possible arguments against it (you might want to
retain the old mtimes, or you might find it a hassle to have to
continually update them before gc kills them). But I think the commit
message should probably make those arguments.

Likewise, I think you'd want to argue why the ".keep" approach isn't as
good (and I guess it is "having extra packs is awkward especially as you
roll forward your cruft packs").

> This patch introduces a new configuration, `pack.recentObjectsHook`
> which allows the caller to specify a program (or set of programs) whose
> output is treated as a set of objects to treat as recent, regardless of
> their true age.

I was going to complain about putting this in the "pack" section,
because I thought by touching reachable.c, we'd also affect git-prune.
But I don't think we do, because it does its own direct mtime check on
the loose objects.

But I'm not sure that's the right behavior.

It feels like even before your patch, this is a huge gap in our
object-retention strategy.  During repacking, we try to avoid dropping
objects which are reachable from recent-but-unreachable things we're
keeping (since otherwise it effectively corrupts those recent objects,
making them less valuable to keep). But git-prune will happily drop them
anyway!

And I think the same thing would apply to your hook. If the hook says
"object XYZ is precious even if unreachable, keep it", then git-prune
ignoring that seems like it would be a source of errors.

I suspect both could be fixed by having git-prune trigger the same
add_unseen_recent_objects_to_traversal() call either as part of
the perform_reachability_traversal() walk, or maybe in its own walk (I
think maybe it has to be its own because the second walk should avoid
complaining about missing objects).

> We then add those as tips to another reachability traversal (along with
> any recent objects, if pruning), marking every object along the way
> (either adding it to the cruft pack, or writing it out as a loose
> object).

I didn't understand this "if pruning" comment. If we are not pruning at
all, wouldn't we skip the extra traversal entirely, since we know we are
saving everything?

> A potential alternative here is to introduce a new mode to alter the
> contents of the reachable pack instead of the cruft one. One could
> imagine a new option to `pack-objects`, say `--extra-reachable-tips`
> that does the same thing as above, adding the visited set of objects
> along the traversal to the pack.
> 
> But this has the unfortunate side-effect of altering the reachability
> closure of that pack. If parts of the unreachable object graph mentioned
> by one or more of the "extra reachable tips" programs is not closed,
> then the resulting pack won't be either. This makes it impossible in the
> general case to write out reachability bitmaps for that pack, since
> closure is a requirement there.

Makes sense.

> +	while (strbuf_getline(&buf, out) != EOF) {
> +		struct object_id oid;
> +		const char *rest;
> +
> +		if (parse_oid_hex(buf.buf, &oid, &rest) || *rest) {
> +			ret = error(_("invalid extra cruft tip: '%s'"), buf.buf);
> +			goto done;
> +		}
> +
> +		oidset_insert(set, &oid);
> +	}
> +
> +	ret = finish_command(&cmd);
> +
> +done:

I mentioned the "goto done" issue in another email.

>  static int obj_is_recent(const struct object_id *oid, timestamp_t mtime,
>  			 struct recent_data *data)
>  {
> -	return mtime > data->timestamp;
> +	if (mtime > data->timestamp)
> +		return 1;
> +
> +	if (!data->extra_recent_oids_loaded)
> +		load_pack_recent_objects(data);
> +	return oidset_contains(&data->extra_recent_oids, oid);
>  }

OK, so this is where we override the timestamp check. Makes sense.

> @@ -126,8 +198,14 @@ static int want_recent_object(struct recent_data *data,
>  			      const struct object_id *oid)
>  {
>  	if (data->ignore_in_core_kept_packs &&
> -	    has_object_kept_pack(oid, IN_CORE_KEEP_PACKS))
> +	    has_object_kept_pack(oid, IN_CORE_KEEP_PACKS)) {
> +		if (!data->extra_recent_oids_loaded)
> +			load_pack_recent_objects(data);
> +		if (oidset_contains(&data->extra_recent_oids, oid))
> +			return 1;
> +
>  		return 0;
> +	}
>  	return 1;
>  }

This hunk I'm less sure about. The purpose of this function is that the
caller has told us about some packs which are "special", and we avoid
adding their objects to the traversal.

This kicks in for cruft packs, when the git-repack caller says "I just
made pack xyz.pack; do not bother saving anything in it to the cruft
pack, since xyz.pack is here to stay". So if a hook says "you should
keep object X", why would we want to override that check? It is already
a reachable object that has been packed into xyz.pack, so we know there
is no point in even considering its recency.

> @@ -199,16 +277,24 @@ int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
>  	data.cb = cb;
>  	data.ignore_in_core_kept_packs = ignore_in_core_kept_packs;
>  
> +	oidset_init(&data.extra_recent_oids, 0);
> +	data.extra_recent_oids_loaded = 0;
> +
>  	r = for_each_loose_object(add_recent_loose, &data,
>  				  FOR_EACH_OBJECT_LOCAL_ONLY);
>  	if (r)
> -		return r;
> +		goto done;
>  
>  	flags = FOR_EACH_OBJECT_LOCAL_ONLY | FOR_EACH_OBJECT_PACK_ORDER;
>  	if (ignore_in_core_kept_packs)
>  		flags |= FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS;
>  
> -	return for_each_packed_object(add_recent_packed, &data, flags);
> +	r = for_each_packed_object(add_recent_packed, &data, flags);
> +
> +done:
> +	oidset_clear(&data.extra_recent_oids);
> +
> +	return r;
>  }

This handles the cleanup that I was too lazy to do in my earlier patch.
Good. :)

> --- a/t/t5329-pack-objects-cruft.sh
> +++ b/t/t5329-pack-objects-cruft.sh
> @@ -739,4 +739,175 @@ test_expect_success 'cruft objects are freshend via loose' '
>  	)
>  '
>  
> +test_expect_success 'additional cruft tips may be specified via pack.extraCruftTips' '

This title (and others below) seems out of date. :)

> diff --git a/t/t7701-repack-unpack-unreachable.sh b/t/t7701-repack-unpack-unreachable.sh
> index ebb267855f..d2eea6e754 100755
> --- a/t/t7701-repack-unpack-unreachable.sh
> +++ b/t/t7701-repack-unpack-unreachable.sh
> @@ -113,6 +113,28 @@ test_expect_success 'do not bother loosening old objects' '
>  	test_must_fail git cat-file -p $obj2
>  '
>  
> +test_expect_success 'extra recent tips are kept regardless of age' '
> +	obj1=$(echo one | git hash-object -w --stdin) &&
> +	obj2=$(echo two | git hash-object -w --stdin) &&
> +	pack1=$(echo $obj1 | git pack-objects .git/objects/pack/pack) &&
> +	pack2=$(echo $obj2 | git pack-objects .git/objects/pack/pack) &&
> +	git prune-packed &&
> +
> +	git cat-file -p $obj1 &&
> +	git cat-file -p $obj2 &&
> +
> +	write_script extra-tips <<-EOF &&
> +	echo $obj2
> +	EOF
> +	git config pack.recentObjectsHook ./extra-tips &&
> +
> +	test-tool chmtime =-86400 .git/objects/pack/pack-$pack2.pack &&
> +	git repack -A -d --unpack-unreachable=1.hour.ago &&
> +
> +	git cat-file -p $obj1 &&
> +	git cat-file -p $obj2
> +'

And this is the new test in this iteration covering the "repack -A"
case.

It is checking that $obj2, which our hook mentions, is saved. It also
checks that $obj1 is saved because it is still recent. But there are two
other possibly interesting cases:

  - an object that is too old and is _not_ saved. It seems useful to
    confirm that the new patch does not simply break the ability to drop
    objects. ;)

  - an object that is reachable from $obj2 is also saved. From a
    white-box perspective this is less interesting, because we should
    already test elsewhere that this works for recent objects, and we
    know the new feature is implemented by faking recency. But it might
    be worth it for completeness, and because it's easy to do (making
    $obj2 a tag pointing to a blob should work).

-Peff
Taylor Blau May 12, 2023, 9:36 p.m. UTC | #3
On Fri, May 12, 2023 at 05:24:56PM -0400, Jeff King wrote:
> > This patch introduces a new configuration, `pack.recentObjectsHook`
> > which allows the caller to specify a program (or set of programs) whose
> > output is treated as a set of objects to treat as recent, regardless of
> > their true age.
>
> I was going to complain about putting this in the "pack" section,
> because I thought by touching reachable.c, we'd also affect git-prune.
> But I don't think we do, because it does its own direct mtime check on
> the loose objects.
>
> But I'm not sure that's the right behavior.
>
> It feels like even before your patch, this is a huge gap in our
> object-retention strategy.  During repacking, we try to avoid dropping
> objects which are reachable from recent-but-unreachable things we're
> keeping (since otherwise it effectively corrupts those recent objects,
> making them less valuable to keep). But git-prune will happily drop them
> anyway!
>
> And I think the same thing would apply to your hook. If the hook says
> "object XYZ is precious even if unreachable, keep it", then git-prune
> ignoring that seems like it would be a source of errors.
>
> I suspect both could be fixed by having git-prune trigger the same
> add_unseen_recent_objects_to_traversal() call either as part of
> the perform_reachability_traversal() walk, or maybe in its own walk (I
> think maybe it has to be its own because the second walk should avoid
> complaining about missing objects).

I might be missing something, but I think we already (kind of) do the
right thing here.

AFAICT, the path is:

  - cmd_prune()
  - for_each_loose_file_in_objdir()
  - prune_object() (as a callback to the above)
  - is_object_reachable()
  - perform_reachability_traversal()
  - mark_reachable_objects()
  - add_unseen_recent_objects_to_traversal()

That only happens when `mark_recent != 0`, though.

Thanks,
Taylor
Jeff King May 12, 2023, 9:45 p.m. UTC | #4
On Fri, May 12, 2023 at 05:24:56PM -0400, Jeff King wrote:

> > This patch introduces a new configuration, `pack.recentObjectsHook`
> > which allows the caller to specify a program (or set of programs) whose
> > output is treated as a set of objects to treat as recent, regardless of
> > their true age.
> 
> I was going to complain about putting this in the "pack" section,
> because I thought by touching reachable.c, we'd also affect git-prune.
> But I don't think we do, because it does its own direct mtime check on
> the loose objects.
> 
> But I'm not sure that's the right behavior.
> 
> It feels like even before your patch, this is a huge gap in our
> object-retention strategy.  During repacking, we try to avoid dropping
> objects which are reachable from recent-but-unreachable things we're
> keeping (since otherwise it effectively corrupts those recent objects,
> making them less valuable to keep). But git-prune will happily drop them
> anyway!
> 
> And I think the same thing would apply to your hook. If the hook says
> "object XYZ is precious even if unreachable, keep it", then git-prune
> ignoring that seems like it would be a source of errors.
> 
> I suspect both could be fixed by having git-prune trigger the same
> add_unseen_recent_objects_to_traversal() call either as part of
> the perform_reachability_traversal() walk, or maybe in its own walk (I
> think maybe it has to be its own because the second walk should avoid
> complaining about missing objects).

<phew> I am happy to say that I was wrong here, and git-prune behaves as
it should, courtesy of d3038d22f9 (prune: keep objects reachable from
recent objects, 2014-10-15). The magic happens in mark_reachable_objects(),
which handles walking the recent objects by calling...you guessed it,
add_unseen_recent_objects_to_traversal().

So it does the right thing now, and your patch should kick in
automatically for git-prune, too. But I think we'd want two things:

  1. Should the config variable name be made more generic to match?
     Maybe "core" is too broad (though certainly I'd expect it to apply
     anywhere in Git where we check recent-ness of objects), but perhaps
     "gc" would make sense (even though it is not strictly part of the
     gc command, it is within that realm of concepts).

  2. We probably want a test covering git-prune in this situation.

The thing that confused me when looking at the code earlier is that
git-prune itself checks the mtime of the objects, too, even if they were
not mentioned in the recent-but-reachable walk. That seems redundant to
me, but somehow it isn't. If I do:

diff --git a/builtin/prune.c b/builtin/prune.c
index 5dc9b20720..22b7ce4b10 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -92,7 +92,7 @@ static int prune_object(const struct object_id *oid, const char *fullpath,
 		return 0;
 	}
 	if (st.st_mtime > expire)
-		return 0;
+		BUG("object should have been saved via is_object_reachable!");
 	if (show_only || verbose) {
 		enum object_type type = oid_object_info(the_repository, oid,
 							NULL);

then t5304 shows some failures. I'm not quite sure what is going on, but
_if_ that mtime check in prune_object() is important, then it probably
should also respect the hook mechanism. So there may be a (3) to add to
your patch there.

-Peff
Jeff King May 12, 2023, 9:46 p.m. UTC | #5
On Fri, May 12, 2023 at 05:36:15PM -0400, Taylor Blau wrote:

> > I suspect both could be fixed by having git-prune trigger the same
> > add_unseen_recent_objects_to_traversal() call either as part of
> > the perform_reachability_traversal() walk, or maybe in its own walk (I
> > think maybe it has to be its own because the second walk should avoid
> > complaining about missing objects).
> 
> I might be missing something, but I think we already (kind of) do the
> right thing here.
> 
> AFAICT, the path is:
> 
>   - cmd_prune()
>   - for_each_loose_file_in_objdir()
>   - prune_object() (as a callback to the above)
>   - is_object_reachable()
>   - perform_reachability_traversal()
>   - mark_reachable_objects()
>   - add_unseen_recent_objects_to_traversal()
> 
> That only happens when `mark_recent != 0`, though.

Yeah, I'm sorry, I was totally wrong here. See the mail I just sent that
crossed paths with yours.

-Peff
Jeff King May 12, 2023, 10:01 p.m. UTC | #6
On Fri, May 12, 2023 at 05:45:43PM -0400, Jeff King wrote:

> The thing that confused me when looking at the code earlier is that
> git-prune itself checks the mtime of the objects, too, even if they were
> not mentioned in the recent-but-reachable walk. That seems redundant to
> me, but somehow it isn't. If I do:
> 
> diff --git a/builtin/prune.c b/builtin/prune.c
> index 5dc9b20720..22b7ce4b10 100644
> --- a/builtin/prune.c
> +++ b/builtin/prune.c
> @@ -92,7 +92,7 @@ static int prune_object(const struct object_id *oid, const char *fullpath,
>  		return 0;
>  	}
>  	if (st.st_mtime > expire)
> -		return 0;
> +		BUG("object should have been saved via is_object_reachable!");
>  	if (show_only || verbose) {
>  		enum object_type type = oid_object_info(the_repository, oid,
>  							NULL);
> 
> then t5304 shows some failures. I'm not quite sure what is going on, but
> _if_ that mtime check in prune_object() is important, then it probably
> should also respect the hook mechanism. So there may be a (3) to add to
> your patch there.

Ah, I see. The problem is that "git prune --expire=never" relies on this
check. In mark_reachable_objects(), we take a "0" expiration as "do not
bother adding these objects to the traversal", so it's here that we
catch them and say "ah, we are not expiring anything, so keep this".

If my hack above is switched to:

diff --git a/builtin/prune.c b/builtin/prune.c
index 5dc9b20720..108305cd26 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -91,8 +91,11 @@ static int prune_object(const struct object_id *oid, const char *fullpath,
 		error("Could not stat '%s'", fullpath);
 		return 0;
 	}
-	if (st.st_mtime > expire)
-		return 0;
+	if (st.st_mtime > expire) {
+		if (!expire)
+			return 0; /* we are not expiring anything! */
+		BUG("object should have been saved via is_object_reachable!");
+	}
 	if (show_only || verbose) {
 		enum object_type type = oid_object_info(the_repository, oid,
 							NULL);

then all of the tests pass. I do suspect this code could trigger racily
in the real world (we do our traversal, and then a new object is added,
which is of course recent). So we would not want to drop the check.

Is it important there for your patch to kick in and say "we were
specifically asked to consider this object recent, so do so"? I think it
shouldn't matter, because any such object is by definition recent,
having an mtime greater than or equal to when we started (it cannot even
get racily old while we traverse, because we decide the cutoff time as
an absolute value even before starting the traversal). The strict
greater-than in the check makes it feel like an off-by-one problem, but
unless you are saying "now" it must be at least 1 second in the past
anyway.

It does make me wonder why "git prune --expire=never" does not simply
exit immediately. Isn't there by definition nothing useful to do?

-Peff
Junio C Hamano May 12, 2023, 11:21 p.m. UTC | #7
Jeff King <peff@peff.net> writes:

> It does make me wonder why "git prune --expire=never" does not simply
> exit immediately. Isn't there by definition nothing useful to do?

I think the answer to the question is "no", but if I have to guess
why such a low-hanging fruit optimization possibility has not been
exploited so far is because it does not optimize a useful case,
perhaps?  IOW, falling into "if it hurts, then don't do it" category?
Jeff King May 13, 2023, 12:11 a.m. UTC | #8
On Fri, May 12, 2023 at 04:21:44PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > It does make me wonder why "git prune --expire=never" does not simply
> > exit immediately. Isn't there by definition nothing useful to do?
> 
> I think the answer to the question is "no", but if I have to guess
> why such a low-hanging fruit optimization possibility has not been
> exploited so far is because it does not optimize a useful case,
> perhaps?  IOW, falling into "if it hurts, then don't do it" category?

I think it would come up any time you run "git gc", if you've set
gc.pruneExpire to "never". And then it wastes time running a full object
walk (which is 30+ seconds for linux.git) even though it won't do
anything useful.

Curiously, git-gc is sprinkled with "if (prune_expire)" logic, including
skipping the call to git-prune entirely. But that only kicks in if you
run "git gc --no-prune". If "never" is truly the same thing (and I
cannot off the top of my head think of a reason that it isn't), then
this:

diff --git a/builtin/gc.c b/builtin/gc.c
index f3942188a6..5118535a4d 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -610,6 +610,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	if (prune_expire && parse_expiry_date(prune_expire, &dummy))
 		die(_("failed to parse prune expiry value %s"), prune_expire);
+	if (prune_expire && !dummy)
+		prune_expire = NULL; /* "never" same as --no-prune */
 
 	if (aggressive) {
 		strvec_push(&repack, "-f");

would bring the two cases in sync (I tried to minimize the diff for
illustration; probably some light refactoring would be in order).

I guess nobody cared because it is not that common to set pruneExpire to
"never". We did something like that at GitHub, but we always drove
repack and prune through our own scripts, not through git-gc. I don't
have access to those scripts anymore, but I'm pretty sure they just
skipped calling git-prune entirely for this case.

So yeah, I think it may just be a curiosity that nobody noticed and
bothered to optimize it. I am tempted to work the above into a proper
patch. We even do something similar already for the reflog expiration
variables.

-Peff
Jeff King May 13, 2023, 12:11 a.m. UTC | #9
On Fri, May 12, 2023 at 08:11:18PM -0400, Jeff King wrote:

> So yeah, I think it may just be a curiosity that nobody noticed and
> bothered to optimize it. I am tempted to work the above into a proper
> patch. We even do something similar already for the reflog expiration
> variables.

Oh, and in case it was not clear: this is all well off Taylor's original
topic. Whatever we do (or don't do) should be its own series.

-Peff
Taylor Blau May 15, 2023, 8:15 p.m. UTC | #10
On Fri, May 12, 2023 at 12:58:53AM -0400, Jeff King wrote:
> I haven't looked closely at this whole patch yet (and I especially want
> to look at the new tests since this approach covers more cases), but I
> did notice that this version of the function still has the "we don't
> reap the child on parse failure" problem I described in:
>
>   https://lore.kernel.org/git/20230505221921.GE3321533@coredump.intra.peff.net/

Hmmph. I could have sworn that I remember including that feedback in the
new round, but I must have dropped it on the floor somewhere.

Thanks for pointing it out, I'm 99% sure that it'll be in the next round
;-).

Thanks,
Taylor
Taylor Blau May 15, 2023, 8:38 p.m. UTC | #11
On Fri, May 12, 2023 at 05:24:56PM -0400, Jeff King wrote:
> I think this description suffers a bit from being adapted from the
> original patch which was targeting cruft packs. It's not clear to me
> what "the caller" means here. And really, I think this is getting into
> the details before giving an overview and motivation.
>
> I'd expect something the rationale to be something like:

Re-reading it myself, I tend to agree with you. I modified it quite a
bit, and I'm much happier with the result. Thanks for mentioning it.

> One option I don't see here is: update the mtime on the objects you want
> to salvage.
>
> Why would we want this patch instead of just having the caller update
> the mtimes of objects (or in a cruft-pack world, call a command that
> rewrites the .mtimes file with new values)?
>
> I can think of some possible arguments against it (you might want to
> retain the old mtimes, or you might find it a hassle to have to
> continually update them before gc kills them). But I think the commit
> message should probably make those arguments.

I agree with everything you wrote here.

> > We then add those as tips to another reachability traversal (along with
> > any recent objects, if pruning), marking every object along the way
> > (either adding it to the cruft pack, or writing it out as a loose
> > object).
>
> I didn't understand this "if pruning" comment. If we are not pruning at
> all, wouldn't we skip the extra traversal entirely, since we know we are
> saving everything?

I was talking about the rescuing traversal for generating a cruft pack.
But I ended up dropping this whole paragraph anyway, since I don't think
it's adding anything in the context of the new patch message.

> > @@ -126,8 +198,14 @@ static int want_recent_object(struct recent_data *data,
> >  			      const struct object_id *oid)
> >  {
> >  	if (data->ignore_in_core_kept_packs &&
> > -	    has_object_kept_pack(oid, IN_CORE_KEEP_PACKS))
> > +	    has_object_kept_pack(oid, IN_CORE_KEEP_PACKS)) {
> > +		if (!data->extra_recent_oids_loaded)
> > +			load_pack_recent_objects(data);
> > +		if (oidset_contains(&data->extra_recent_oids, oid))
> > +			return 1;
> > +
> >  		return 0;
> > +	}
> >  	return 1;
> >  }
>
> This hunk I'm less sure about. The purpose of this function is that the
> caller has told us about some packs which are "special", and we avoid
> adding their objects to the traversal.
>
> This kicks in for cruft packs, when the git-repack caller says "I just
> made pack xyz.pack; do not bother saving anything in it to the cruft
> pack, since xyz.pack is here to stay". So if a hook says "you should
> keep object X", why would we want to override that check? It is already
> a reachable object that has been packed into xyz.pack, so we know there
> is no point in even considering its recency.

Yup, you're absolutely right here. Thanks for catching it.

> > --- a/t/t5329-pack-objects-cruft.sh
> > +++ b/t/t5329-pack-objects-cruft.sh
> > @@ -739,4 +739,175 @@ test_expect_success 'cruft objects are freshend via loose' '
> >  	)
> >  '
> >
> > +test_expect_success 'additional cruft tips may be specified via pack.extraCruftTips' '
>
> This title (and others below) seems out of date. :)

Thanks for noticing, fixed.

> > diff --git a/t/t7701-repack-unpack-unreachable.sh b/t/t7701-repack-unpack-unreachable.sh
> > index ebb267855f..d2eea6e754 100755
> > --- a/t/t7701-repack-unpack-unreachable.sh
> > +++ b/t/t7701-repack-unpack-unreachable.sh
> > @@ -113,6 +113,28 @@ test_expect_success 'do not bother loosening old objects' '
> >  	test_must_fail git cat-file -p $obj2
> >  '
> >
> > +test_expect_success 'extra recent tips are kept regardless of age' '
> > +	obj1=$(echo one | git hash-object -w --stdin) &&
> > +	obj2=$(echo two | git hash-object -w --stdin) &&
> > +	pack1=$(echo $obj1 | git pack-objects .git/objects/pack/pack) &&
> > +	pack2=$(echo $obj2 | git pack-objects .git/objects/pack/pack) &&
> > +	git prune-packed &&
> > +
> > +	git cat-file -p $obj1 &&
> > +	git cat-file -p $obj2 &&
> > +
> > +	write_script extra-tips <<-EOF &&
> > +	echo $obj2
> > +	EOF
> > +	git config pack.recentObjectsHook ./extra-tips &&
> > +
> > +	test-tool chmtime =-86400 .git/objects/pack/pack-$pack2.pack &&
> > +	git repack -A -d --unpack-unreachable=1.hour.ago &&
> > +
> > +	git cat-file -p $obj1 &&
> > +	git cat-file -p $obj2
> > +'
>
> And this is the new test in this iteration covering the "repack -A"
> case.
>
> It is checking that $obj2, which our hook mentions, is saved. It also
> checks that $obj1 is saved because it is still recent. But there are two
> other possibly interesting cases:
>
>   - an object that is too old and is _not_ saved. It seems useful to
>     confirm that the new patch does not simply break the ability to drop
>     objects. ;)
>
>   - an object that is reachable from $obj2 is also saved. From a
>     white-box perspective this is less interesting, because we should
>     already test elsewhere that this works for recent objects, and we
>     know the new feature is implemented by faking recency. But it might
>     be worth it for completeness, and because it's easy to do (making
>     $obj2 a tag pointing to a blob should work).

All very good cases to check for. Here's a patch on top (which I'll
obviously squash into my new version, but figured I'd send it as a
response to you directly, too):

--- 8< ---
S
diff --git a/t/t7701-repack-unpack-unreachable.sh b/t/t7701-repack-unpack-unreachable.sh
index d2eea6e754..fa2df6016b 100755
--- a/t/t7701-repack-unpack-unreachable.sh
+++ b/t/t7701-repack-unpack-unreachable.sh
@@ -116,23 +116,33 @@ test_expect_success 'do not bother loosening old objects' '
 test_expect_success 'extra recent tips are kept regardless of age' '
 	obj1=$(echo one | git hash-object -w --stdin) &&
 	obj2=$(echo two | git hash-object -w --stdin) &&
+	obj3=$(echo three | git hash-object -w --stdin) &&
 	pack1=$(echo $obj1 | git pack-objects .git/objects/pack/pack) &&
 	pack2=$(echo $obj2 | git pack-objects .git/objects/pack/pack) &&
+	pack3=$(echo $obj3 | git pack-objects .git/objects/pack/pack) &&
 	git prune-packed &&

 	git cat-file -p $obj1 &&
 	git cat-file -p $obj2 &&
+	git cat-file -p $obj3 &&

-	write_script extra-tips <<-EOF &&
-	echo $obj2
+	git tag -a -m tag obj2-tag $obj2 &&
+	obj2_tag="$(git rev-parse obj2-tag)" &&
+
+
+	write_script precious-objects <<-EOF &&
+	echo $obj2_tag
 	EOF
-	git config pack.recentObjectsHook ./extra-tips &&
+	git config pack.recentObjectsHook ./precious-objects &&

 	test-tool chmtime =-86400 .git/objects/pack/pack-$pack2.pack &&
+	test-tool chmtime =-86400 .git/objects/pack/pack-$pack3.pack &&
 	git repack -A -d --unpack-unreachable=1.hour.ago &&

 	git cat-file -p $obj1 &&
-	git cat-file -p $obj2
+	git cat-file -p $obj2 &&
+	git cat-file -p $obj2_tag &&
+	test_must_fail git cat-file -p $obj3
 '

 test_expect_success 'keep packed objects found only in index' '
--- >8 ---

Thanks,
Taylor
Taylor Blau May 15, 2023, 8:49 p.m. UTC | #12
On Fri, May 12, 2023 at 05:45:42PM -0400, Jeff King wrote:
> On Fri, May 12, 2023 at 05:24:56PM -0400, Jeff King wrote:
>
> > > This patch introduces a new configuration, `pack.recentObjectsHook`
> > > which allows the caller to specify a program (or set of programs) whose
> > > output is treated as a set of objects to treat as recent, regardless of
> > > their true age.
> >
> > I was going to complain about putting this in the "pack" section,
> > because I thought by touching reachable.c, we'd also affect git-prune.
> > But I don't think we do, because it does its own direct mtime check on
> > the loose objects.
> >
> > But I'm not sure that's the right behavior.
> >
> > It feels like even before your patch, this is a huge gap in our
> > object-retention strategy.  During repacking, we try to avoid dropping
> > objects which are reachable from recent-but-unreachable things we're
> > keeping (since otherwise it effectively corrupts those recent objects,
> > making them less valuable to keep). But git-prune will happily drop them
> > anyway!
> >
> > And I think the same thing would apply to your hook. If the hook says
> > "object XYZ is precious even if unreachable, keep it", then git-prune
> > ignoring that seems like it would be a source of errors.
> >
> > I suspect both could be fixed by having git-prune trigger the same
> > add_unseen_recent_objects_to_traversal() call either as part of
> > the perform_reachability_traversal() walk, or maybe in its own walk (I
> > think maybe it has to be its own because the second walk should avoid
> > complaining about missing objects).
>
> <phew> I am happy to say that I was wrong here, and git-prune behaves as
> it should, courtesy of d3038d22f9 (prune: keep objects reachable from
> recent objects, 2014-10-15). The magic happens in mark_reachable_objects(),
> which handles walking the recent objects by calling...you guessed it,
> add_unseen_recent_objects_to_traversal().

Phew. Thanks for digging into it before I was able to respond. I'm glad
that this works (though I agree that we should add a test).

> So it does the right thing now, and your patch should kick in
> automatically for git-prune, too. But I think we'd want two things:
>
>   1. Should the config variable name be made more generic to match?
>      Maybe "core" is too broad (though certainly I'd expect it to apply
>      anywhere in Git where we check recent-ness of objects), but perhaps
>      "gc" would make sense (even though it is not strictly part of the
>      gc command, it is within that realm of concepts).

"core" does feel pretty broad. There's some precedence for adding
hook-like configuration there, at least with
`core.alternateRefsCommand`. But I think that was an appropriate choice
given the scope of that feature.

I think that calling it "gc.recentObjectsHook" makes the most sense.

>   2. We probably want a test covering git-prune in this situation.

Yup.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index d4c7c9d4e4..560cb42223 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -67,6 +67,19 @@  pack.deltaCacheLimit::
 	result once the best match for all objects is found.
 	Defaults to 1000. Maximum value is 65535.
 
+pack.recentObjectsHook::
+	When considering the recency of an object (e.g., when generating
+	a cruft pack or storing unreachable objects as loose), use the
+	shell to execute the specified command(s). Interpret their
+	output as object IDs which Git will consider as "recent",
+	regardless of their age.
++
+Output must contain exactly one hex object ID per line, and nothing
+else. Objects which cannot be found in the repository are ignored.
+Multiple hooks are supported, but all must exit successfully, else the
+operation (either generating a cruft pack or unpacking unreachable
+objects) will be halted.
+
 pack.threads::
 	Specifies the number of threads to spawn when searching for best
 	delta matches.  This requires that linkgit:git-pack-objects[1]
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a5b466839b..bd6ad016d6 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -44,6 +44,7 @@ 
 #include "pack-mtimes.h"
 #include "parse-options.h"
 #include "wrapper.h"
+#include "run-command.h"
 
 /*
  * Objects we are going to pack are collected in the `to_pack` structure.
diff --git a/reachable.c b/reachable.c
index 7a42da5d39..4cca6fe93f 100644
--- a/reachable.c
+++ b/reachable.c
@@ -16,6 +16,8 @@ 
 #include "object-store.h"
 #include "pack-bitmap.h"
 #include "pack-mtimes.h"
+#include "config.h"
+#include "run-command.h"
 
 struct connectivity_progress {
 	struct progress *progress;
@@ -67,12 +69,82 @@  struct recent_data {
 	timestamp_t timestamp;
 	report_recent_object_fn *cb;
 	int ignore_in_core_kept_packs;
+
+	struct oidset extra_recent_oids;
+	int extra_recent_oids_loaded;
 };
 
+static int run_one_pack_recent_objects_hook(struct oidset *set,
+					    const char *args)
+{
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	FILE *out;
+	int ret = 0;
+
+	cmd.use_shell = 1;
+	cmd.out = -1;
+
+	strvec_push(&cmd.args, args);
+
+	if (start_command(&cmd))
+		return -1;
+
+	out = xfdopen(cmd.out, "r");
+	while (strbuf_getline(&buf, out) != EOF) {
+		struct object_id oid;
+		const char *rest;
+
+		if (parse_oid_hex(buf.buf, &oid, &rest) || *rest) {
+			ret = error(_("invalid extra cruft tip: '%s'"), buf.buf);
+			goto done;
+		}
+
+		oidset_insert(set, &oid);
+	}
+
+	ret = finish_command(&cmd);
+
+done:
+	if (out)
+		fclose(out);
+	strbuf_release(&buf);
+	child_process_clear(&cmd);
+
+	return ret;
+}
+
+static void load_pack_recent_objects(struct recent_data *data)
+{
+	const struct string_list *programs;
+	int ret = 0;
+	size_t i;
+
+	data->extra_recent_oids_loaded = 1;
+
+	if (git_config_get_string_multi("pack.recentobjectshook", &programs))
+		return;
+
+	for (i = 0; i < programs->nr; i++) {
+		ret = run_one_pack_recent_objects_hook(&data->extra_recent_oids,
+						       programs->items[i].string);
+		if (ret)
+			break;
+	}
+
+	if (ret)
+		die(_("unable to enumerate additional cruft tips"));
+}
+
 static int obj_is_recent(const struct object_id *oid, timestamp_t mtime,
 			 struct recent_data *data)
 {
-	return mtime > data->timestamp;
+	if (mtime > data->timestamp)
+		return 1;
+
+	if (!data->extra_recent_oids_loaded)
+		load_pack_recent_objects(data);
+	return oidset_contains(&data->extra_recent_oids, oid);
 }
 
 static void add_recent_object(const struct object_id *oid,
@@ -126,8 +198,14 @@  static int want_recent_object(struct recent_data *data,
 			      const struct object_id *oid)
 {
 	if (data->ignore_in_core_kept_packs &&
-	    has_object_kept_pack(oid, IN_CORE_KEEP_PACKS))
+	    has_object_kept_pack(oid, IN_CORE_KEEP_PACKS)) {
+		if (!data->extra_recent_oids_loaded)
+			load_pack_recent_objects(data);
+		if (oidset_contains(&data->extra_recent_oids, oid))
+			return 1;
+
 		return 0;
+	}
 	return 1;
 }
 
@@ -199,16 +277,24 @@  int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
 	data.cb = cb;
 	data.ignore_in_core_kept_packs = ignore_in_core_kept_packs;
 
+	oidset_init(&data.extra_recent_oids, 0);
+	data.extra_recent_oids_loaded = 0;
+
 	r = for_each_loose_object(add_recent_loose, &data,
 				  FOR_EACH_OBJECT_LOCAL_ONLY);
 	if (r)
-		return r;
+		goto done;
 
 	flags = FOR_EACH_OBJECT_LOCAL_ONLY | FOR_EACH_OBJECT_PACK_ORDER;
 	if (ignore_in_core_kept_packs)
 		flags |= FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS;
 
-	return for_each_packed_object(add_recent_packed, &data, flags);
+	r = for_each_packed_object(add_recent_packed, &data, flags);
+
+done:
+	oidset_clear(&data.extra_recent_oids);
+
+	return r;
 }
 
 static int mark_object_seen(const struct object_id *oid,
diff --git a/t/t5329-pack-objects-cruft.sh b/t/t5329-pack-objects-cruft.sh
index 303f7a5d84..ffd6c2eeb6 100755
--- a/t/t5329-pack-objects-cruft.sh
+++ b/t/t5329-pack-objects-cruft.sh
@@ -739,4 +739,175 @@  test_expect_success 'cruft objects are freshend via loose' '
 	)
 '
 
+test_expect_success 'additional cruft tips may be specified via pack.extraCruftTips' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		# Create a handful of objects.
+		#
+		#   - one reachable commit, "base", designated for the reachable
+		#     pack
+		#   - one unreachable commit, "cruft.discard", which is marked
+		#     for deletion
+		#   - one unreachable commit, "cruft.old", which would be marked
+		#     for deletion, but is rescued as an extra cruft tip
+		#   - one unreachable commit, "cruft.new", which is not marked
+		#     for deletion
+		test_commit base &&
+		git branch -M main &&
+
+		git checkout --orphan discard &&
+		git rm -fr . &&
+		test_commit --no-tag cruft.discard &&
+
+		git checkout --orphan old &&
+		git rm -fr . &&
+		test_commit --no-tag cruft.old &&
+		cruft_old="$(git rev-parse HEAD)" &&
+
+		git checkout --orphan new &&
+		git rm -fr . &&
+		test_commit --no-tag cruft.new &&
+		cruft_new="$(git rev-parse HEAD)" &&
+
+		git checkout main &&
+		git branch -D discard old new &&
+		git reflog expire --all --expire=all &&
+
+		# mark cruft.old with an mtime that is many minutes
+		# older than the expiration period, and mark cruft.new
+		# with an mtime that is in the future (and thus not
+		# eligible for pruning).
+		test-tool chmtime -2000 "$objdir/$(test_oid_to_path $cruft_old)" &&
+		test-tool chmtime +1000 "$objdir/$(test_oid_to_path $cruft_new)" &&
+
+		# Write the list of cruft objects we expect to
+		# accumulate, which is comprised of everything reachable
+		# from cruft.old and cruft.new, but not cruft.discard.
+		git rev-list --objects --no-object-names \
+			$cruft_old $cruft_new >cruft.raw &&
+		sort cruft.raw >cruft.expect &&
+
+		# Write the script to list extra tips, which are limited
+		# to cruft.old, in this case.
+		write_script extra-tips <<-EOF &&
+		echo $cruft_old
+		EOF
+		git config pack.recentObjectsHook ./extra-tips &&
+
+		git repack --cruft --cruft-expiration=now -d &&
+
+		mtimes="$(ls .git/objects/pack/pack-*.mtimes)" &&
+		git show-index <${mtimes%.mtimes}.idx >cruft &&
+		cut -d" " -f2 cruft | sort >cruft.actual &&
+		test_cmp cruft.expect cruft.actual &&
+
+		# Ensure that the "old" objects are removed after
+		# dropping the pack.extraCruftTips hook.
+		git config --unset pack.recentObjectsHook &&
+		git repack --cruft --cruft-expiration=now -d &&
+
+		mtimes="$(ls .git/objects/pack/pack-*.mtimes)" &&
+		git show-index <${mtimes%.mtimes}.idx >cruft &&
+		cut -d" " -f2 cruft | sort >cruft.actual &&
+
+		git rev-list --objects --no-object-names $cruft_new >cruft.raw &&
+		cp cruft.expect cruft.old &&
+		sort cruft.raw >cruft.expect &&
+		test_cmp cruft.expect cruft.actual &&
+
+		# ensure objects which are no longer in the cruft pack were
+		# removed from the repository
+		for object in $(comm -13 cruft.expect cruft.old)
+		do
+			test_must_fail git cat-file -t $object || return 1
+		done
+	)
+'
+
+test_expect_success 'multi-valued pack.extraCruftTips' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit base &&
+		git branch -M main &&
+
+		git checkout --orphan cruft.a &&
+		git rm -fr . &&
+		test_commit --no-tag cruft.a &&
+		cruft_a="$(git rev-parse HEAD)" &&
+
+		git checkout --orphan cruft.b &&
+		git rm -fr . &&
+		test_commit --no-tag cruft.b &&
+		cruft_b="$(git rev-parse HEAD)" &&
+
+		git checkout main &&
+		git branch -D cruft.a cruft.b &&
+		git reflog expire --all --expire=all &&
+
+		echo "echo $cruft_a" | write_script extra-tips.a &&
+		echo "echo $cruft_b" | write_script extra-tips.b &&
+		echo "false" | write_script extra-tips.c &&
+
+		git rev-list --objects --no-object-names $cruft_a $cruft_b \
+			>cruft.raw &&
+		sort cruft.raw >cruft.expect &&
+
+		# ensure that each extra cruft tip is saved by its
+		# respective hook
+		git config --add pack.recentObjectsHook ./extra-tips.a &&
+		git config --add pack.recentObjectsHook ./extra-tips.b &&
+		git repack --cruft --cruft-expiration=now -d &&
+
+		mtimes="$(ls .git/objects/pack/pack-*.mtimes)" &&
+		git show-index <${mtimes%.mtimes}.idx >cruft &&
+		cut -d" " -f2 cruft | sort >cruft.actual &&
+		test_cmp cruft.expect cruft.actual &&
+
+		# ensure that a dirty exit halts cruft pack generation
+		git config --add pack.recentObjectsHook ./extra-tips.c &&
+		test_must_fail git repack --cruft --cruft-expiration=now -d 2>err &&
+		grep "unable to enumerate additional cruft tips" err &&
+
+		# and that the existing cruft pack is left alone
+		test_path_is_file "$mtimes"
+	)
+'
+
+test_expect_success 'additional cruft blobs via pack.extraCruftTips' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit base &&
+
+		blob=$(echo "unreachable" | git hash-object -w --stdin) &&
+
+		# mark the unreachable blob we wrote above as having
+		# aged out of the retention period
+		test-tool chmtime -2000 "$objdir/$(test_oid_to_path $blob)" &&
+
+		# Write the script to list extra tips, which is just the
+		# extra blob as above.
+		write_script extra-tips <<-EOF &&
+		echo $blob
+		EOF
+		git config pack.recentObjectsHook ./extra-tips &&
+
+		git repack --cruft --cruft-expiration=now -d &&
+
+		mtimes="$(ls .git/objects/pack/pack-*.mtimes)" &&
+		git show-index <${mtimes%.mtimes}.idx >cruft &&
+		cut -d" " -f2 cruft >actual &&
+		echo $blob >expect &&
+		test_cmp expect actual
+	)
+'
+
 test_done
diff --git a/t/t7701-repack-unpack-unreachable.sh b/t/t7701-repack-unpack-unreachable.sh
index ebb267855f..d2eea6e754 100755
--- a/t/t7701-repack-unpack-unreachable.sh
+++ b/t/t7701-repack-unpack-unreachable.sh
@@ -113,6 +113,28 @@  test_expect_success 'do not bother loosening old objects' '
 	test_must_fail git cat-file -p $obj2
 '
 
+test_expect_success 'extra recent tips are kept regardless of age' '
+	obj1=$(echo one | git hash-object -w --stdin) &&
+	obj2=$(echo two | git hash-object -w --stdin) &&
+	pack1=$(echo $obj1 | git pack-objects .git/objects/pack/pack) &&
+	pack2=$(echo $obj2 | git pack-objects .git/objects/pack/pack) &&
+	git prune-packed &&
+
+	git cat-file -p $obj1 &&
+	git cat-file -p $obj2 &&
+
+	write_script extra-tips <<-EOF &&
+	echo $obj2
+	EOF
+	git config pack.recentObjectsHook ./extra-tips &&
+
+	test-tool chmtime =-86400 .git/objects/pack/pack-$pack2.pack &&
+	git repack -A -d --unpack-unreachable=1.hour.ago &&
+
+	git cat-file -p $obj1 &&
+	git cat-file -p $obj2
+'
+
 test_expect_success 'keep packed objects found only in index' '
 	echo my-unique-content >file &&
 	git add file &&