diff mbox series

[v2,7/8] packfile: add kept-pack cache for find_kept_pack_entry()

Message ID f1c07324f62cf4d087c41165cefed98f554cfd78.1612411124.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series repack: support repacking into a geometric sequence | expand

Commit Message

Taylor Blau Feb. 4, 2021, 3:59 a.m. UTC
From: Jeff King <peff@peff.net>

In a recent patch we added a function 'find_kept_pack_entry()' to look
for an object only among kept packs.

While this function avoids doing any lookup work in non-kept packs, it
is still linear in the number of packs, since we have to traverse the
linked list of packs once per object. Let's cache a reduced version of
that list to save us time.

Note that this cache will last the lifetime of the program. We could
invalidate it on reprepare_packed_git(), but there's not much point in
being rigorous here:

  - we might already fail to notice new .keep packs showing up after the
    program starts. We only reprepare_packed_git() when we fail to find
    an object. But adding a new pack won't cause that to happen.
    Somebody repacking could add a new pack and delete an old one, but
    most of the time we'd have a descriptor or mmap open to the old
    pack anyway, so we might not even notice.

  - in pack-objects we already cache the .keep state at startup, since
    56dfeb6263 (pack-objects: compute local/ignore_pack_keep early,
    2016-07-29). So this is just extending that concept further.

  - we don't have to worry about any packed_git being removed; we always
    keep the old structs around, even after reprepare_packed_git()

Here are p5303 results (as always, measured against the kernel):

  Test                                        HEAD^                   HEAD
  ----------------------------------------------------------------------------------------------
  5303.5: repack (1)                          57.44(54.71+10.78)      57.06(54.29+10.96) -0.7%
  5303.6: repack with --stdin-packs (1)       0.01(0.00+0.01)         0.01(0.01+0.00) +0.0%
  5303.10: repack (50)                        71.32(88.38+4.90)       71.47(88.60+5.04) +0.2%
  5303.11: repack with --stdin-packs (50)     3.43(11.81+0.22)        3.49(12.21+0.26) +1.7%
  5303.15: repack (1000)                      215.59(493.75+14.62)    217.41(495.36+14.85) +0.8%
  5303.16: repack with --stdin-packs (1000)   131.44(314.24+8.11)     126.75(309.88+8.09) -3.6%

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c |   6 +--
 object-store.h         |  10 ++++
 packfile.c             | 103 +++++++++++++++++++++++------------------
 packfile.h             |   4 --
 revision.c             |   8 ++--
 5 files changed, 76 insertions(+), 55 deletions(-)

Comments

Jeff King Feb. 17, 2021, 5:11 p.m. UTC | #1
On Wed, Feb 03, 2021 at 10:59:21PM -0500, Taylor Blau wrote:

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index fbd7b54d70..b2ba5aa14f 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1225,9 +1225,9 @@ static int want_found_object(const struct object_id *oid, int exclude,
>  		 */
>  		unsigned flags = 0;
>  		if (ignore_packed_keep_on_disk)
> -			flags |= ON_DISK_KEEP_PACKS;
> +			flags |= CACHE_ON_DISK_KEEP_PACKS;
>  		if (ignore_packed_keep_in_core)
> -			flags |= IN_CORE_KEEP_PACKS;
> +			flags |= CACHE_IN_CORE_KEEP_PACKS;

Why are we renaming the constants in this patch?

I know I'm listed as the author, but I think this came out of some
off-list back and forth between us. It seems like the existing constants
would have been fine.

> +static void maybe_invalidate_kept_pack_cache(struct repository *r,
> +					     unsigned flags)
>  {
> -	return find_one_pack_entry(r, oid, e, 0);
> +	if (!r->objects->kept_pack_cache)
> +		return;
> +	if (r->objects->kept_pack_cache->flags == flags)
> +		return;
> +	free(r->objects->kept_pack_cache->packs);
> +	FREE_AND_NULL(r->objects->kept_pack_cache);
> +}

OK, so we keep a single cache based on the flags, and then if somebody
ever asks for different flags, we throw it away. That's probably OK for
our purposes, since we wouldn't expect multiple callers within a single
process.

I wondered if it would be simpler to just keep two lists, one for
in-core keeps and one for on-disk keeps. And then just walk over each
list separately based on the query flags. That makes things more robust
_and_ I think would be less code. It does mean that a pack could appear
in both lists, though, which means we might do a lookup in it twice.
That doesn't seem all that likely, but it is working against our goal
here.

Another option is to keep 3 caches (two separate and one combined),
rather than flipping between them. I'm not sure if that would be less
code or not (it gets rid of the "invalidate" function, but you do have
to pick the right cache depending on the query flags).

Yet another option is to keep a cache of any that are marked as _either_
in core or on-disk keeps, and then decide to look up the object based on
the query flags. Then you just pay the cost to iterate over the list and
check the flags (which really is all this cache is helping with in the
first place).

I dunno. TBH, I kind of wonder if this whole patch is worth doing at
all, giving the underwhelming performance benefit (3% on the
pathological 1000-pack case). When I had timed this strategy initially,
it was more like 15%. I'm not sure where the savings went in the
interim, or if it was a timing fluke.

> +static struct packed_git **kept_pack_cache(struct repository *r, unsigned flags)
> +{
> +	maybe_invalidate_kept_pack_cache(r, flags);
> +
> +	if (!r->objects->kept_pack_cache) {
> +		struct packed_git **packs = NULL;
> +		size_t nr = 0, alloc = 0;
> +		struct packed_git *p;
> +
> +		/*
> +		 * We want "all" packs here, because we need to cover ones that
> +		 * are used by a midx, as well. We need to look in every one of
> +		 * them (instead of the midx itself) to cover duplicates. It's
> +		 * possible that an object is found in two packs that the midx
> +		 * covers, one kept and one not kept, but the midx returns only
> +		 * the non-kept version.
> +		 */
> +		for (p = get_all_packs(r); p; p = p->next) {
> +			if ((p->pack_keep && (flags & CACHE_ON_DISK_KEEP_PACKS)) ||
> +			    (p->pack_keep_in_core && (flags & CACHE_IN_CORE_KEEP_PACKS))) {
> +				ALLOC_GROW(packs, nr + 1, alloc);
> +				packs[nr++] = p;
> +			}
> +		}
> +		ALLOC_GROW(packs, nr + 1, alloc);
> +		packs[nr] = NULL;
> +
> +		r->objects->kept_pack_cache = xmalloc(sizeof(*r->objects->kept_pack_cache));
> +		r->objects->kept_pack_cache->packs = packs;
> +		r->objects->kept_pack_cache->flags = flags;
> +	}

Is there any reason not to just embed the kept_pack_cache struct inside
the object_store? It's one less pointer to deal with. I wonder if this
is a holdover from an attempt to have multiple caches.

(I also think it would be reasonable if we wanted to hide the definition
of the cache struct from callers, but we don't seem do to that).

> @@ -2109,7 +2123,8 @@ int has_object_pack(const struct object_id *oid)
>  	return find_pack_entry(the_repository, oid, &e);
>  }
>  
> -int has_object_kept_pack(const struct object_id *oid, unsigned flags)
> +int has_object_kept_pack(const struct object_id *oid,
> +			 unsigned flags)
>  {
>  	struct pack_entry e;
>  	return find_kept_pack_entry(the_repository, oid, flags, &e);

This seems like a stray change.

> diff --git a/packfile.h b/packfile.h
> index 624327f64d..eb56db2a7b 100644
> --- a/packfile.h
> +++ b/packfile.h
> @@ -161,10 +161,6 @@ int packed_object_info(struct repository *r,
>  void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1);
>  const struct packed_git *has_packed_and_bad(struct repository *r, const unsigned char *sha1);
>  
> -#define ON_DISK_KEEP_PACKS 1
> -#define IN_CORE_KEEP_PACKS 2
> -#define ALL_KEEP_PACKS (ON_DISK_KEEP_PACKS | IN_CORE_KEEP_PACKS)

I notice that when the constants moved, we didn't keep an equivalent of
ALL_KEEP_PACKS. Maybe we didn't need it in the first place in patch 1?

  BTW, I absolutely hate the complication that all of this on-disk
  versus in-core keep distinction brings to this code. And I wondered
  what it was really doing for us and whether we could get rid of it.
  But I think we do need it: a common case may be to avoid using
  --honor-pack-keep (because you don't want to deal with racy .keep
  writes from incoming receive-pack processes), but use in-core ones for
  something like --stdin-packs. So we do need to respect one and not the
  other.

  I do wonder if things would be simpler if pack-objects simply kept its
  own list of "in core" packs in a separate array. But that is really
  just another form of the same problem, I guess.

-Peff
Taylor Blau Feb. 17, 2021, 7:54 p.m. UTC | #2
On Wed, Feb 17, 2021 at 12:11:00PM -0500, Jeff King wrote:
> On Wed, Feb 03, 2021 at 10:59:21PM -0500, Taylor Blau wrote:
>
> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index fbd7b54d70..b2ba5aa14f 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -1225,9 +1225,9 @@ static int want_found_object(const struct object_id *oid, int exclude,
> >  		 */
> >  		unsigned flags = 0;
> >  		if (ignore_packed_keep_on_disk)
> > -			flags |= ON_DISK_KEEP_PACKS;
> > +			flags |= CACHE_ON_DISK_KEEP_PACKS;
> >  		if (ignore_packed_keep_in_core)
> > -			flags |= IN_CORE_KEEP_PACKS;
> > +			flags |= CACHE_IN_CORE_KEEP_PACKS;
>
> Why are we renaming the constants in this patch?
>
> I know I'm listed as the author, but I think this came out of some
> off-list back and forth between us. It seems like the existing constants
> would have been fine.

Yeah, they would have been fine. They were renamed because this patch
makes them only used for the kept pack cache, but I agree the existing
names are fine, too.

In any case, they make an easier-to-read diff, so I'm perfectly happy to
un-rename them ;).

> > +static void maybe_invalidate_kept_pack_cache(struct repository *r,
> > +					     unsigned flags)
> >  {
> > -	return find_one_pack_entry(r, oid, e, 0);
> > +	if (!r->objects->kept_pack_cache)
> > +		return;
> > +	if (r->objects->kept_pack_cache->flags == flags)
> > +		return;
> > +	free(r->objects->kept_pack_cache->packs);
> > +	FREE_AND_NULL(r->objects->kept_pack_cache);
> > +}
>
> OK, so we keep a single cache based on the flags, and then if somebody
> ever asks for different flags, we throw it away. That's probably OK for
> our purposes, since we wouldn't expect multiple callers within a single
> process.
>
> I wondered if it would be simpler to just keep two lists, one for
> in-core keeps and one for on-disk keeps. And then just walk over each
> list separately based on the query flags. That makes things more robust
> _and_ I think would be less code. It does mean that a pack could appear
> in both lists, though, which means we might do a lookup in it twice.
> That doesn't seem all that likely, but it is working against our goal
> here.
>
> Another option is to keep 3 caches (two separate and one combined),
> rather than flipping between them. I'm not sure if that would be less
> code or not (it gets rid of the "invalidate" function, but you do have
> to pick the right cache depending on the query flags).
>
> Yet another option is to keep a cache of any that are marked as _either_
> in core or on-disk keeps, and then decide to look up the object based on
> the query flags. Then you just pay the cost to iterate over the list and
> check the flags (which really is all this cache is helping with in the
> first place).

All interesting ideas. In this patch (and by the end of the series)
callers that use the kept pack cache never ask for the cache with a
different set of flags. IOW, there isn't a situation where a caller
would populate the in-core kept pack cache, and then suddenly ask for
both in-core and on-disk packs to be kept.

So all of this code is defensive in case that were to change, and
suddenly we'd be returning subtly wrong results. I could imagine that
being kind of a nasty bug to track down, so detecting and invalidating
the cache would make it a non-issue.

I'll note it in the commit message, though, since it's good for future
readers to be aware, too.

> I dunno. TBH, I kind of wonder if this whole patch is worth doing at
> all, giving the underwhelming performance benefit (3% on the
> pathological 1000-pack case). When I had timed this strategy initially,
> it was more like 15%. I'm not sure where the savings went in the
> interim, or if it was a timing fluke.

Yeah, I dunno. It's certainly not hurting (I don't think the extra code
is all that complex, and the savings is at least non-zero), so I'm
inclined to keep it.

> > +static struct packed_git **kept_pack_cache(struct repository *r, unsigned flags)
> > +{
> > +	maybe_invalidate_kept_pack_cache(r, flags);
> > +
> > +	if (!r->objects->kept_pack_cache) {
> > +		struct packed_git **packs = NULL;
> > +		size_t nr = 0, alloc = 0;
> > +		struct packed_git *p;
> > +
> > +		/*
> > +		 * We want "all" packs here, because we need to cover ones that
> > +		 * are used by a midx, as well. We need to look in every one of
> > +		 * them (instead of the midx itself) to cover duplicates. It's
> > +		 * possible that an object is found in two packs that the midx
> > +		 * covers, one kept and one not kept, but the midx returns only
> > +		 * the non-kept version.
> > +		 */
> > +		for (p = get_all_packs(r); p; p = p->next) {
> > +			if ((p->pack_keep && (flags & CACHE_ON_DISK_KEEP_PACKS)) ||
> > +			    (p->pack_keep_in_core && (flags & CACHE_IN_CORE_KEEP_PACKS))) {
> > +				ALLOC_GROW(packs, nr + 1, alloc);
> > +				packs[nr++] = p;
> > +			}
> > +		}
> > +		ALLOC_GROW(packs, nr + 1, alloc);
> > +		packs[nr] = NULL;
> > +
> > +		r->objects->kept_pack_cache = xmalloc(sizeof(*r->objects->kept_pack_cache));
> > +		r->objects->kept_pack_cache->packs = packs;
> > +		r->objects->kept_pack_cache->flags = flags;
> > +	}
>
> Is there any reason not to just embed the kept_pack_cache struct inside
> the object_store? It's one less pointer to deal with. I wonder if this
> is a holdover from an attempt to have multiple caches.
>
> (I also think it would be reasonable if we wanted to hide the definition
> of the cache struct from callers, but we don't seem do to that).

Not a holdover, just designed to avoid adding too many extra fields to
the object-store. I don't feel strongly, but I do think hiding the
definition is a good idea, so I'll inline it.

> > @@ -2109,7 +2123,8 @@ int has_object_pack(const struct object_id *oid)
> >  	return find_pack_entry(the_repository, oid, &e);
> >  }
> >
> > -int has_object_kept_pack(const struct object_id *oid, unsigned flags)
> > +int has_object_kept_pack(const struct object_id *oid,
> > +			 unsigned flags)
> >  {
> >  	struct pack_entry e;
> >  	return find_kept_pack_entry(the_repository, oid, flags, &e);
>
> This seems like a stray change.

Good eyes, thanks.

>
> > diff --git a/packfile.h b/packfile.h
> > index 624327f64d..eb56db2a7b 100644
> > --- a/packfile.h
> > +++ b/packfile.h
> > @@ -161,10 +161,6 @@ int packed_object_info(struct repository *r,
> >  void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1);
> >  const struct packed_git *has_packed_and_bad(struct repository *r, const unsigned char *sha1);
> >
> > -#define ON_DISK_KEEP_PACKS 1
> > -#define IN_CORE_KEEP_PACKS 2
> > -#define ALL_KEEP_PACKS (ON_DISK_KEEP_PACKS | IN_CORE_KEEP_PACKS)
>
> I notice that when the constants moved, we didn't keep an equivalent of
> ALL_KEEP_PACKS. Maybe we didn't need it in the first place in patch 1?

Yeah, we didn't need it to begin with. I'll drop it accordingly.

>   BTW, I absolutely hate the complication that all of this on-disk
>   versus in-core keep distinction brings to this code. And I wondered
>   what it was really doing for us and whether we could get rid of it.
>   But I think we do need it: a common case may be to avoid using
>   --honor-pack-keep (because you don't want to deal with racy .keep
>   writes from incoming receive-pack processes), but use in-core ones for
>   something like --stdin-packs. So we do need to respect one and not the
>   other.
>
>   I do wonder if things would be simpler if pack-objects simply kept its
>   own list of "in core" packs in a separate array. But that is really
>   just another form of the same problem, I guess.

Yeah, the complexity is awfully hard to reason about, but you're right
that here it is necessary.

> -Peff

Thanks,
Taylor
Jeff King Feb. 17, 2021, 8:25 p.m. UTC | #3
On Wed, Feb 17, 2021 at 02:54:33PM -0500, Taylor Blau wrote:

> > OK, so we keep a single cache based on the flags, and then if somebody
> > ever asks for different flags, we throw it away. That's probably OK for
> > our purposes, since we wouldn't expect multiple callers within a single
> > process.
> > [...some alternatives]
> 
> All interesting ideas. In this patch (and by the end of the series)
> callers that use the kept pack cache never ask for the cache with a
> different set of flags. IOW, there isn't a situation where a caller
> would populate the in-core kept pack cache, and then suddenly ask for
> both in-core and on-disk packs to be kept.
> 
> So all of this code is defensive in case that were to change, and
> suddenly we'd be returning subtly wrong results. I could imagine that
> being kind of a nasty bug to track down, so detecting and invalidating
> the cache would make it a non-issue.

Yeah, I agree that the current crop of callers does not care. And I am
glad we are not leaving a booby-trap for later programmers with respect
to correctness (by virtue of the invalidation function). But it does
feel like we are leaving one for performance, which they very well might
not realize the cache is doing worse-than-nothing.

Would just doing:

  if (cache.packs && cache.flags != flags)
	BUG("kept-pack-cache cannot handle multiple queries in a single process");

be a better solution? That is not helping anyone towards a world where
we gracefully handle back-and-forth queries. But it makes it abundantly
clear when such a thing would become necessary.

> > Is there any reason not to just embed the kept_pack_cache struct inside
> > the object_store? It's one less pointer to deal with. I wonder if this
> > is a holdover from an attempt to have multiple caches.
> >
> > (I also think it would be reasonable if we wanted to hide the definition
> > of the cache struct from callers, but we don't seem do to that).
> 
> Not a holdover, just designed to avoid adding too many extra fields to
> the object-store. I don't feel strongly, but I do think hiding the
> definition is a good idea, so I'll inline it.

This response confuses me a bit. Hiding the definition from callers
would mean _keeping_ it as a pointer, but putting the definition into
packfile.c, where nobody outside that file could see it (at least that
is what I meant by hiding).

But inlining it to me implies embedding the struct (not a pointer to it)
in "struct object_store", defining the struct at the point we define the
struct field which uses it.

I am fine with either, to be clear. I'm just confused which you are
proposing to do. :)

-Peff
Taylor Blau Feb. 17, 2021, 8:29 p.m. UTC | #4
On Wed, Feb 17, 2021 at 03:25:17PM -0500, Jeff King wrote:
> Would just doing:
>
>   if (cache.packs && cache.flags != flags)
> 	BUG("kept-pack-cache cannot handle multiple queries in a single process");
>
> be a better solution? That is not helping anyone towards a world where
> we gracefully handle back-and-forth queries. But it makes it abundantly
> clear when such a thing would become necessary.

I dunno. I can certainly see its merits, but I have to imagine that
anybody who cares enough about the performance will be able to find our
conversation here. Assuming that's the case, I would rather have the
kept-pack cache handle multiple queries before BUG()-ing.

> > > Is there any reason not to just embed the kept_pack_cache struct inside
> > > the object_store? It's one less pointer to deal with. I wonder if this
> > > is a holdover from an attempt to have multiple caches.
> > >
> > > (I also think it would be reasonable if we wanted to hide the definition
> > > of the cache struct from callers, but we don't seem do to that).
> >
> > Not a holdover, just designed to avoid adding too many extra fields to
> > the object-store. I don't feel strongly, but I do think hiding the
> > definition is a good idea, so I'll inline it.
>
> This response confuses me a bit. Hiding the definition from callers
> would mean _keeping_ it as a pointer, but putting the definition into
> packfile.c, where nobody outside that file could see it (at least that
> is what I meant by hiding).
>
> But inlining it to me implies embedding the struct (not a pointer to it)
> in "struct object_store", defining the struct at the point we define the
> struct field which uses it.
>
> I am fine with either, to be clear. I'm just confused which you are
> proposing to do. :)

Probably because I changed my mind in the middle of writing it ;). I'm
proposing embedding the definition of the struct into the definition of
object_store, and then operating on its fields (from within packfile.c).

Thanks,
Taylor
Jeff King Feb. 17, 2021, 9:43 p.m. UTC | #5
On Wed, Feb 17, 2021 at 03:29:50PM -0500, Taylor Blau wrote:

> On Wed, Feb 17, 2021 at 03:25:17PM -0500, Jeff King wrote:
> > Would just doing:
> >
> >   if (cache.packs && cache.flags != flags)
> > 	BUG("kept-pack-cache cannot handle multiple queries in a single process");
> >
> > be a better solution? That is not helping anyone towards a world where
> > we gracefully handle back-and-forth queries. But it makes it abundantly
> > clear when such a thing would become necessary.
> 
> I dunno. I can certainly see its merits, but I have to imagine that
> anybody who cares enough about the performance will be able to find our
> conversation here. Assuming that's the case, I would rather have the
> kept-pack cache handle multiple queries before BUG()-ing.

OK. I am on the fence, and you are the author, so I'm happy to go with
your preference.

I'm not quite as optimistic that somebody would find this conversation,
if only because they have to know to look for it. I could easily see
somebody adding a find_kept_in_pack() without thinking too hard about
it. OTOH, I find it quite unlikely that anybody would use a different
set of flags within the same process, so it would probably Just Work for
them regardless. :)

> > This response confuses me a bit. Hiding the definition from callers
> > would mean _keeping_ it as a pointer, but putting the definition into
> > packfile.c, where nobody outside that file could see it (at least that
> > is what I meant by hiding).
> >
> > But inlining it to me implies embedding the struct (not a pointer to it)
> > in "struct object_store", defining the struct at the point we define the
> > struct field which uses it.
> >
> > I am fine with either, to be clear. I'm just confused which you are
> > proposing to do. :)
> 
> Probably because I changed my mind in the middle of writing it ;). I'm
> proposing embedding the definition of the struct into the definition of
> object_store, and then operating on its fields (from within packfile.c).

OK, that sounds great to me (and arguably produces more efficient code,
since we avoid a pointer dereference, though I doubt it matters in
practice). Thanks for clarifying.

-Peff
diff mbox series

Patch

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index fbd7b54d70..b2ba5aa14f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1225,9 +1225,9 @@  static int want_found_object(const struct object_id *oid, int exclude,
 		 */
 		unsigned flags = 0;
 		if (ignore_packed_keep_on_disk)
-			flags |= ON_DISK_KEEP_PACKS;
+			flags |= CACHE_ON_DISK_KEEP_PACKS;
 		if (ignore_packed_keep_in_core)
-			flags |= IN_CORE_KEEP_PACKS;
+			flags |= CACHE_IN_CORE_KEEP_PACKS;
 
 		if (ignore_packed_keep_on_disk && p->pack_keep)
 			return 0;
@@ -3089,7 +3089,7 @@  static void read_packs_list_from_stdin(void)
 	 * an optimization during delta selection.
 	 */
 	revs.no_kept_objects = 1;
-	revs.keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
+	revs.keep_pack_cache_flags |= CACHE_IN_CORE_KEEP_PACKS;
 	revs.blob_objects = 1;
 	revs.tree_objects = 1;
 	revs.tag_objects = 1;
diff --git a/object-store.h b/object-store.h
index c4fc9dd74e..4cbe8eae3c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -105,6 +105,14 @@  static inline int pack_map_entry_cmp(const void *unused_cmp_data,
 	return strcmp(pg1->pack_name, key ? key : pg2->pack_name);
 }
 
+#define CACHE_ON_DISK_KEEP_PACKS 1
+#define CACHE_IN_CORE_KEEP_PACKS 2
+
+struct kept_pack_cache {
+	struct packed_git **packs;
+	unsigned flags;
+};
+
 struct raw_object_store {
 	/*
 	 * Set of all object directories; the main directory is first (and
@@ -150,6 +158,8 @@  struct raw_object_store {
 	/* A most-recently-used ordered version of the packed_git list. */
 	struct list_head packed_git_mru;
 
+	struct kept_pack_cache *kept_pack_cache;
+
 	/*
 	 * A map of packfiles to packed_git structs for tracking which
 	 * packs have been loaded already.
diff --git a/packfile.c b/packfile.c
index 5f35cfe788..2a139c907b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2031,10 +2031,7 @@  static int fill_pack_entry(const struct object_id *oid,
 	return 1;
 }
 
-static int find_one_pack_entry(struct repository *r,
-			       const struct object_id *oid,
-			       struct pack_entry *e,
-			       int kept_only)
+int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e)
 {
 	struct list_head *pos;
 	struct multi_pack_index *m;
@@ -2044,49 +2041,64 @@  static int find_one_pack_entry(struct repository *r,
 		return 0;
 
 	for (m = r->objects->multi_pack_index; m; m = m->next) {
-		if (!fill_midx_entry(r, oid, e, m))
-			continue;
-
-		if (!kept_only)
-			return 1;
-
-		if (((kept_only & ON_DISK_KEEP_PACKS) && e->p->pack_keep) ||
-		    ((kept_only & IN_CORE_KEEP_PACKS) && e->p->pack_keep_in_core))
+		if (fill_midx_entry(r, oid, e, m))
 			return 1;
 	}
 
 	list_for_each(pos, &r->objects->packed_git_mru) {
 		struct packed_git *p = list_entry(pos, struct packed_git, mru);
-		if (p->multi_pack_index && !kept_only) {
-			/*
-			 * If this pack is covered by the MIDX, we'd have found
-			 * the object already in the loop above if it was here,
-			 * so don't bother looking.
-			 *
-			 * The exception is if we are looking only at kept
-			 * packs. An object can be present in two packs covered
-			 * by the MIDX, one kept and one not-kept. And as the
-			 * MIDX points to only one copy of each object, it might
-			 * have returned only the non-kept version above. We
-			 * have to check again to be thorough.
-			 */
-			continue;
-		}
-		if (!kept_only ||
-		    (((kept_only & ON_DISK_KEEP_PACKS) && p->pack_keep) ||
-		     ((kept_only & IN_CORE_KEEP_PACKS) && p->pack_keep_in_core))) {
-			if (fill_pack_entry(oid, e, p)) {
-				list_move(&p->mru, &r->objects->packed_git_mru);
-				return 1;
-			}
+		if (!p->multi_pack_index && fill_pack_entry(oid, e, p)) {
+			list_move(&p->mru, &r->objects->packed_git_mru);
+			return 1;
 		}
 	}
 	return 0;
 }
 
-int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e)
+static void maybe_invalidate_kept_pack_cache(struct repository *r,
+					     unsigned flags)
 {
-	return find_one_pack_entry(r, oid, e, 0);
+	if (!r->objects->kept_pack_cache)
+		return;
+	if (r->objects->kept_pack_cache->flags == flags)
+		return;
+	free(r->objects->kept_pack_cache->packs);
+	FREE_AND_NULL(r->objects->kept_pack_cache);
+}
+
+static struct packed_git **kept_pack_cache(struct repository *r, unsigned flags)
+{
+	maybe_invalidate_kept_pack_cache(r, flags);
+
+	if (!r->objects->kept_pack_cache) {
+		struct packed_git **packs = NULL;
+		size_t nr = 0, alloc = 0;
+		struct packed_git *p;
+
+		/*
+		 * We want "all" packs here, because we need to cover ones that
+		 * are used by a midx, as well. We need to look in every one of
+		 * them (instead of the midx itself) to cover duplicates. It's
+		 * possible that an object is found in two packs that the midx
+		 * covers, one kept and one not kept, but the midx returns only
+		 * the non-kept version.
+		 */
+		for (p = get_all_packs(r); p; p = p->next) {
+			if ((p->pack_keep && (flags & CACHE_ON_DISK_KEEP_PACKS)) ||
+			    (p->pack_keep_in_core && (flags & CACHE_IN_CORE_KEEP_PACKS))) {
+				ALLOC_GROW(packs, nr + 1, alloc);
+				packs[nr++] = p;
+			}
+		}
+		ALLOC_GROW(packs, nr + 1, alloc);
+		packs[nr] = NULL;
+
+		r->objects->kept_pack_cache = xmalloc(sizeof(*r->objects->kept_pack_cache));
+		r->objects->kept_pack_cache->packs = packs;
+		r->objects->kept_pack_cache->flags = flags;
+	}
+
+	return r->objects->kept_pack_cache->packs;
 }
 
 int find_kept_pack_entry(struct repository *r,
@@ -2094,13 +2106,15 @@  int find_kept_pack_entry(struct repository *r,
 			 unsigned flags,
 			 struct pack_entry *e)
 {
-	/*
-	 * Load all packs, including midx packs, since our "kept" strategy
-	 * relies on that. We're relying on the side effect of it setting up
-	 * r->objects->packed_git, which is a little ugly.
-	 */
-	get_all_packs(r);
-	return find_one_pack_entry(r, oid, e, flags);
+	struct packed_git **cache;
+
+	for (cache = kept_pack_cache(r, flags); *cache; cache++) {
+		struct packed_git *p = *cache;
+		if (fill_pack_entry(oid, e, p))
+			return 1;
+	}
+
+	return 0;
 }
 
 int has_object_pack(const struct object_id *oid)
@@ -2109,7 +2123,8 @@  int has_object_pack(const struct object_id *oid)
 	return find_pack_entry(the_repository, oid, &e);
 }
 
-int has_object_kept_pack(const struct object_id *oid, unsigned flags)
+int has_object_kept_pack(const struct object_id *oid,
+			 unsigned flags)
 {
 	struct pack_entry e;
 	return find_kept_pack_entry(the_repository, oid, flags, &e);
diff --git a/packfile.h b/packfile.h
index 624327f64d..eb56db2a7b 100644
--- a/packfile.h
+++ b/packfile.h
@@ -161,10 +161,6 @@  int packed_object_info(struct repository *r,
 void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1);
 const struct packed_git *has_packed_and_bad(struct repository *r, const unsigned char *sha1);
 
-#define ON_DISK_KEEP_PACKS 1
-#define IN_CORE_KEEP_PACKS 2
-#define ALL_KEEP_PACKS (ON_DISK_KEEP_PACKS | IN_CORE_KEEP_PACKS)
-
 /*
  * Iff a pack file in the given repository contains the object named by sha1,
  * return true and store its location to e.
diff --git a/revision.c b/revision.c
index 4c5adb90b1..41c0478705 100644
--- a/revision.c
+++ b/revision.c
@@ -2338,14 +2338,14 @@  static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		die(_("--unpacked=<packfile> no longer supported"));
 	} else if (!strcmp(arg, "--no-kept-objects")) {
 		revs->no_kept_objects = 1;
-		revs->keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
-		revs->keep_pack_cache_flags |= ON_DISK_KEEP_PACKS;
+		revs->keep_pack_cache_flags |= CACHE_IN_CORE_KEEP_PACKS;
+		revs->keep_pack_cache_flags |= CACHE_ON_DISK_KEEP_PACKS;
 	} else if (skip_prefix(arg, "--no-kept-objects=", &optarg)) {
 		revs->no_kept_objects = 1;
 		if (!strcmp(optarg, "in-core"))
-			revs->keep_pack_cache_flags |= IN_CORE_KEEP_PACKS;
+			revs->keep_pack_cache_flags |= CACHE_IN_CORE_KEEP_PACKS;
 		if (!strcmp(optarg, "on-disk"))
-			revs->keep_pack_cache_flags |= ON_DISK_KEEP_PACKS;
+			revs->keep_pack_cache_flags |= CACHE_ON_DISK_KEEP_PACKS;
 	} else if (!strcmp(arg, "-r")) {
 		revs->diff = 1;
 		revs->diffopt.flags.recursive = 1;