diff mbox series

[v3,08/16] midx: allow marking a pack as preferred

Message ID 30194a6786bec51e0f41de0e6c855dc2297806c6.1615482270.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series midx: implement a multi-pack reverse index | expand

Commit Message

Taylor Blau March 11, 2021, 5:05 p.m. UTC
When multiple packs in the multi-pack index contain the same object, the
MIDX machinery must make a choice about which pack it associates with
that object. Prior to this patch, the lowest-ordered[1] pack was always
selected.

Pack selection for duplicate objects is relatively unimportant today,
but it will become important for multi-pack bitmaps. This is because we
can only invoke the pack-reuse mechanism when all of the bits for reused
objects come from the reuse pack (in order to ensure that all reused
deltas can find their base objects in the same pack).

To encourage the pack selection process to prefer one pack over another
(the pack to be preferred is the one a caller would like to later use as
a reuse pack), introduce the concept of a "preferred pack". When
provided, the MIDX code will always prefer an object found in a
preferred pack over any other.

No format changes are required to store the preferred pack, since it
will be able to be inferred with a corresponding MIDX bitmap, by looking
up the pack associated with the object in the first bit position (this
ordering is described in detail in a subsequent commit).

[1]: the ordering is specified by MIDX internals; for our purposes we
can consider the "lowest ordered" pack to be "the one with the
most-recent mtime.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt       | 14 ++-
 Documentation/technical/multi-pack-index.txt |  5 +-
 builtin/multi-pack-index.c                   | 18 +++-
 builtin/repack.c                             |  2 +-
 midx.c                                       | 92 ++++++++++++++++++--
 midx.h                                       |  2 +-
 t/t5319-multi-pack-index.sh                  | 39 +++++++++
 7 files changed, 154 insertions(+), 18 deletions(-)

Comments

Jeff King March 29, 2021, noon UTC | #1
On Thu, Mar 11, 2021 at 12:05:07PM -0500, Taylor Blau wrote:

> To encourage the pack selection process to prefer one pack over another
> (the pack to be preferred is the one a caller would like to later use as
> a reuse pack), introduce the concept of a "preferred pack". When
> provided, the MIDX code will always prefer an object found in a
> preferred pack over any other.
> 
> No format changes are required to store the preferred pack, since it
> will be able to be inferred with a corresponding MIDX bitmap, by looking
> up the pack associated with the object in the first bit position (this
> ordering is described in detail in a subsequent commit).

I think in the long run we may want to add a midx chunk that gives the
order of the packs (and likewise allow the caller of "midx write" to
specify the exact order), since that may allow correlating locality
between history and object order within the .rev/.bitmap files.

But I think this is a nice stopping point for this series, since we're
not having to introduce any new on-disk formats to do it, and it seems
to give pretty good results in practice. I guess we'll have to support
--preferred-pack forever, but that's OK. Even if we do eventually
support arbitrary orderings, it's just a simple subset of that
functionality.

>  static int cmd_multi_pack_index_write(int argc, const char **argv)
>  {
> -	struct option *options = common_opts;
> +	struct option *options;
> +	static struct option builtin_multi_pack_index_write_options[] = {
> +		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
> +			   N_("preferred-pack"),
> +			   N_("pack for reuse when computing a multi-pack bitmap")),
> +		OPT_END(),
> +	};
> +
> +	options = parse_options_dup(builtin_multi_pack_index_write_options);
> +	options = add_common_options(options);
>  
>  	trace2_cmd_mode(argv[0]);
>  
> @@ -74,7 +85,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
>  		usage_with_options(builtin_multi_pack_index_write_usage,
>  				   options);
>  
> -	return write_midx_file(opts.object_dir, opts.flags);
> +	return write_midx_file(opts.object_dir, opts.preferred_pack,
> +			       opts.flags);
>  }

This has the same leak of "options" that I mentioned in the earlier
patch.

> diff --git a/midx.c b/midx.c
> index 971faa8cfc..46f55ff6cf 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -431,6 +431,24 @@ static int pack_info_compare(const void *_a, const void *_b)
>  	return strcmp(a->pack_name, b->pack_name);
>  }
>  
> +static int lookup_idx_or_pack_name(struct pack_info *info,
> +				   uint32_t nr,
> +				   const char *pack_name)
> +{
> +	uint32_t lo = 0, hi = nr;
> +	while (lo < hi) {
> +		uint32_t mi = lo + (hi - lo) / 2;
> +		int cmp = cmp_idx_or_pack_name(pack_name, info[mi].pack_name);
> +		if (cmp < 0)
> +			hi = mi;
> +		else if (cmp > 0)
> +			lo = mi + 1;
> +		else
> +			return mi;
> +	}
> +	return -1;
> +}

Could this just be replaced with bsearch() in the caller?

> +test_expect_success 'midx picks objects from preferred pack' '
> +	test_when_finished rm -rf preferred.git &&
> +	git init --bare preferred.git &&
> +	(
> +		cd preferred.git &&
> +
> +		a=$(echo "a" | git hash-object -w --stdin) &&
> +		b=$(echo "b" | git hash-object -w --stdin) &&
> +		c=$(echo "c" | git hash-object -w --stdin) &&
> +
> +		# Set up two packs, duplicating the object "B" at different
> +		# offsets.
> +		git pack-objects objects/pack/test-AB <<-EOF &&
> +		$a
> +		$b
> +		EOF
> +		bc=$(git pack-objects objects/pack/test-BC <<-EOF
> +		$b
> +		$c
> +		EOF
> +		) &&

I don't think pack-objects guarantees that the pack ordering matches the
input it received. compute_write_order() uses a variety of heuristics to
reorder things. I think this will work in practice with the current
code, because the objects have the same type, there are no deltas, and
the fallback ordering is input-order (or traversal order, if --revs is
used).

So it's probably OK in practice, though if we wanted to be paranoid we
could check that show-index produces different results for the $b entry
of both packs. That said...

> +		git multi-pack-index --object-dir=objects \
> +			write --preferred-pack=test-BC-$bc.idx 2>err &&
> +		test_must_be_empty err &&
> +
> +		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
> +			cut -d" " -f1) &&
> +		midx_expect_object_offset $b $ofs objects
> +	)

...what we really care about is that the object came from BC. And we are
just using the offset as a proxy for that. But doesn't "test-tool
read-midx" give us the actual pack name? We could just be checking that.

I also wondered if we should confirm that without the --preferred-pack
option, we choose the other pack. I think it will always be true because
the default order is to sort them lexically. A comment to that effect
might be worth it (near the "set up two packs" comment).

-Peff
Taylor Blau March 29, 2021, 9:15 p.m. UTC | #2
On Mon, Mar 29, 2021 at 08:00:59AM -0400, Jeff King wrote:
> I think in the long run we may want to add a midx chunk that gives the
> order of the packs (and likewise allow the caller of "midx write" to
> specify the exact order), since that may allow correlating locality
> between history and object order within the .rev/.bitmap files.
>
> But I think this is a nice stopping point for this series, since we're
> not having to introduce any new on-disk formats to do it, and it seems
> to give pretty good results in practice. I guess we'll have to support
> --preferred-pack forever, but that's OK. Even if we do eventually
> support arbitrary orderings, it's just a simple subset of that
> functionality.

To add a little bit of extra detail, I think what you're getting at here
is that it would be nice to let the order of the packs be dictated by
mtime, not the order they appear in the MIDX (which is lexicographic by
their hash, and thus effectively random).

The reason there being the same as you pointed out in

    https://lore.kernel.org/git/YDRdmh8oS5%2Fxq4rB@coredump.intra.peff.net/

which is that it effectively would lay objects out from newest to
oldest.

But, there's a problem, which is that the MIDX doesn't store the packs'
mtimes. That's fine for writing, since we can just look that information
up ourselves. But the reading side can get broken. That's because the
reader also has to know the pack order to go from MIDX- to bit-position.

So if a third party goes and touches some of the packs after the .rev
file was written, then the reader is going to think the packs ought to
appear in a different order than they actually do. So relying on having
to look up the mtimes again later on isn't good enough.

There are two solutions to the problem:

  - You could write the mtimes in the MIDX itself. This would give you a
    single point of reference, and resolve the TOCTOU race I just
    described.

  - Or, you could forget about mtimes entirely and let the MIDX dictate
    the pack ordering itself. That resolves the race in a
    similar-but-different way.

Of the two, I prefer the latter, but I think it introduces functionality
that we don't necessarily need yet. That's because the objects within
the packs are still ordered as such, and so the compression we get in
the packs is just as good as it is for single-pack bitmaps. It's only at
the objects between pack boundaries that any runs of 1s or 0s might be
interrupted, but there are far fewer pack boundaries than objects, so it
doesn't seem to matter in practice.

Anyway, I think that you know all of that already (mostly because we
thought aloud together when I originally brought this up), but I figure
that this detail may be interesting for other readers, too.

> > @@ -74,7 +85,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
> >  		usage_with_options(builtin_multi_pack_index_write_usage,
> >  				   options);
> >
> > -	return write_midx_file(opts.object_dir, opts.flags);
> > +	return write_midx_file(opts.object_dir, opts.preferred_pack,
> > +			       opts.flags);
> >  }
>
> This has the same leak of "options" that I mentioned in the earlier
> patch.

Yup, thanks for pointing it out.

> > diff --git a/midx.c b/midx.c
> > index 971faa8cfc..46f55ff6cf 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -431,6 +431,24 @@ static int pack_info_compare(const void *_a, const void *_b)
> >  	return strcmp(a->pack_name, b->pack_name);
> >  }
> >
> > +static int lookup_idx_or_pack_name(struct pack_info *info,
> > +				   uint32_t nr,
> > +				   const char *pack_name)
> > +{
> > +	uint32_t lo = 0, hi = nr;
> > +	while (lo < hi) {
> > +		uint32_t mi = lo + (hi - lo) / 2;
> > +		int cmp = cmp_idx_or_pack_name(pack_name, info[mi].pack_name);
> > +		if (cmp < 0)
> > +			hi = mi;
> > +		else if (cmp > 0)
> > +			lo = mi + 1;
> > +		else
> > +			return mi;
> > +	}
> > +	return -1;
> > +}
>
> Could this just be replaced with bsearch() in the caller?

Great suggestion. Yes, it can be. FWIW, I think that I may have
originally thought that it couldn't be since we were comparing a fixed
string to an array of structs (each having a field which holds the value
we actually want to compare). But bsearch() always passes the key as the
first argument to the comparator, so this is possible to do.

> > +		git multi-pack-index --object-dir=objects \
> > +			write --preferred-pack=test-BC-$bc.idx 2>err &&
> > +		test_must_be_empty err &&
> > +
> > +		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
> > +			cut -d" " -f1) &&
> > +		midx_expect_object_offset $b $ofs objects
> > +	)
>
> ...what we really care about is that the object came from BC. And we are
> just using the offset as a proxy for that. But doesn't "test-tool
> read-midx" give us the actual pack name? We could just be checking that.
>
> I also wondered if we should confirm that without the --preferred-pack
> option, we choose the other pack. I think it will always be true because
> the default order is to sort them lexically. A comment to that effect
> might be worth it (near the "set up two packs" comment).

Both great points, thanks.

> -Peff

Thanks,
Taylor
Jeff King March 30, 2021, 7:11 a.m. UTC | #3
On Mon, Mar 29, 2021 at 05:15:12PM -0400, Taylor Blau wrote:

> There are two solutions to the problem:
> 
>   - You could write the mtimes in the MIDX itself. This would give you a
>     single point of reference, and resolve the TOCTOU race I just
>     described.
> 
>   - Or, you could forget about mtimes entirely and let the MIDX dictate
>     the pack ordering itself. That resolves the race in a
>     similar-but-different way.
> 
> Of the two, I prefer the latter, but I think it introduces functionality
> that we don't necessarily need yet.

Yeah, I'd strongly favor the latter over the former. The reason to go
with the solution you have in this series is that it doesn't require
changing anything in the on-disk midx format, and we think it is good
enough. But once we are going to change the on-disk format, we might as
well give the writing side as much flexibility as possible.

Of course the mtimes themselves are really just numbers, so in a sense
the two are really equivalent. ;)

> That's because the objects within
> the packs are still ordered as such, and so the compression we get in
> the packs is just as good as it is for single-pack bitmaps. It's only at
> the objects between pack boundaries that any runs of 1s or 0s might be
> interrupted, but there are far fewer pack boundaries than objects, so it
> doesn't seem to matter in practice.

Right. The absolute worst case is a large number of single-object packs,
in which case the bitmap order becomes essentially random with respect
to history (because it would be sorted by sha1 of the packs).

The effect _might_ be measurable in more real-world cases, like say one
big pack and 100 pushes each with a handful of commits. The big pack
would be in good shape, but you have a lot of extra pack boundaries that
hurt the bitmap compression.

But in practice, generating bitmaps is expensive enough that you'd
probably want to roll up some of the packs anyway (and that is certainly
what we are doing at GitHub, using your "repack --geometric"). So you'd
end usually with one big pack representing most of history, and then a
handful of roll-up packs.

So I'm a little curious whether one could even measure the impact of,
say, 100 little packs. But not enough to even run the experiment,
because even that is not a case that is really that interesting.

> Anyway, I think that you know all of that already (mostly because we
> thought aloud together when I originally brought this up), but I figure
> that this detail may be interesting for other readers, too.

Indeed. And I know that you know everything I just wrote, but I agree
it's nice to get a record of these discussions onto the list. :)

-Peff
diff mbox series

Patch

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index eb0caa0439..ffd601bc17 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -9,7 +9,8 @@  git-multi-pack-index - Write and verify multi-pack-indexes
 SYNOPSIS
 --------
 [verse]
-'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress] <subcommand>
+'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
+	[--preferred-pack=<pack>] <subcommand>
 
 DESCRIPTION
 -----------
@@ -30,7 +31,16 @@  OPTIONS
 The following subcommands are available:
 
 write::
-	Write a new MIDX file.
+	Write a new MIDX file. The following options are available for
+	the `write` sub-command:
++
+--
+	--preferred-pack=<pack>::
+		Optionally specify the tie-breaking pack used when
+		multiple packs contain the same object. If not given,
+		ties are broken in favor of the pack with the lowest
+		mtime.
+--
 
 verify::
 	Verify the contents of the MIDX file.
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index e8e377a59f..fb688976c4 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -43,8 +43,9 @@  Design Details
   a change in format.
 
 - The MIDX keeps only one record per object ID. If an object appears
-  in multiple packfiles, then the MIDX selects the copy in the most-
-  recently modified packfile.
+  in multiple packfiles, then the MIDX selects the copy in the
+  preferred packfile, otherwise selecting from the most-recently
+  modified packfile.
 
 - If there exist packfiles in the pack directory not registered in
   the MIDX, then those packfiles are loaded into the `packed_git`
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 243b6ccc7c..92f358f212 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -4,9 +4,10 @@ 
 #include "parse-options.h"
 #include "midx.h"
 #include "trace2.h"
+#include "object-store.h"
 
 #define BUILTIN_MIDX_WRITE_USAGE \
-	N_("git multi-pack-index [<options>] write")
+	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]")
 
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
@@ -43,6 +44,7 @@  static char const * const builtin_multi_pack_index_usage[] = {
 
 static struct opts_multi_pack_index {
 	const char *object_dir;
+	const char *preferred_pack;
 	unsigned long batch_size;
 	unsigned flags;
 } opts;
@@ -63,7 +65,16 @@  static struct option *add_common_options(struct option *prev)
 
 static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
-	struct option *options = common_opts;
+	struct option *options;
+	static struct option builtin_multi_pack_index_write_options[] = {
+		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
+			   N_("preferred-pack"),
+			   N_("pack for reuse when computing a multi-pack bitmap")),
+		OPT_END(),
+	};
+
+	options = parse_options_dup(builtin_multi_pack_index_write_options);
+	options = add_common_options(options);
 
 	trace2_cmd_mode(argv[0]);
 
@@ -74,7 +85,8 @@  static int cmd_multi_pack_index_write(int argc, const char **argv)
 		usage_with_options(builtin_multi_pack_index_write_usage,
 				   options);
 
-	return write_midx_file(opts.object_dir, opts.flags);
+	return write_midx_file(opts.object_dir, opts.preferred_pack,
+			       opts.flags);
 }
 
 static int cmd_multi_pack_index_verify(int argc, const char **argv)
diff --git a/builtin/repack.c b/builtin/repack.c
index 01440de2d5..9f00806805 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -523,7 +523,7 @@  int cmd_repack(int argc, const char **argv, const char *prefix)
 	remove_temporary_files();
 
 	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		write_midx_file(get_object_directory(), 0);
+		write_midx_file(get_object_directory(), NULL, 0);
 
 	string_list_clear(&names, 0);
 	string_list_clear(&rollback, 0);
diff --git a/midx.c b/midx.c
index 971faa8cfc..46f55ff6cf 100644
--- a/midx.c
+++ b/midx.c
@@ -431,6 +431,24 @@  static int pack_info_compare(const void *_a, const void *_b)
 	return strcmp(a->pack_name, b->pack_name);
 }
 
+static int lookup_idx_or_pack_name(struct pack_info *info,
+				   uint32_t nr,
+				   const char *pack_name)
+{
+	uint32_t lo = 0, hi = nr;
+	while (lo < hi) {
+		uint32_t mi = lo + (hi - lo) / 2;
+		int cmp = cmp_idx_or_pack_name(pack_name, info[mi].pack_name);
+		if (cmp < 0)
+			hi = mi;
+		else if (cmp > 0)
+			lo = mi + 1;
+		else
+			return mi;
+	}
+	return -1;
+}
+
 struct write_midx_context {
 	struct pack_info *info;
 	uint32_t nr;
@@ -445,6 +463,8 @@  struct write_midx_context {
 	uint32_t *pack_perm;
 	unsigned large_offsets_needed:1;
 	uint32_t num_large_offsets;
+
+	int preferred_pack_idx;
 };
 
 static void add_pack_to_midx(const char *full_path, size_t full_path_len,
@@ -489,6 +509,7 @@  struct pack_midx_entry {
 	uint32_t pack_int_id;
 	time_t pack_mtime;
 	uint64_t offset;
+	unsigned preferred : 1;
 };
 
 static int midx_oid_compare(const void *_a, const void *_b)
@@ -500,6 +521,12 @@  static int midx_oid_compare(const void *_a, const void *_b)
 	if (cmp)
 		return cmp;
 
+	/* Sort objects in a preferred pack first when multiple copies exist. */
+	if (a->preferred > b->preferred)
+		return -1;
+	if (a->preferred < b->preferred)
+		return 1;
+
 	if (a->pack_mtime > b->pack_mtime)
 		return -1;
 	else if (a->pack_mtime < b->pack_mtime)
@@ -527,7 +554,8 @@  static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
 static void fill_pack_entry(uint32_t pack_int_id,
 			    struct packed_git *p,
 			    uint32_t cur_object,
-			    struct pack_midx_entry *entry)
+			    struct pack_midx_entry *entry,
+			    int preferred)
 {
 	if (nth_packed_object_id(&entry->oid, p, cur_object) < 0)
 		die(_("failed to locate object %d in packfile"), cur_object);
@@ -536,6 +564,7 @@  static void fill_pack_entry(uint32_t pack_int_id,
 	entry->pack_mtime = p->mtime;
 
 	entry->offset = nth_packed_object_offset(p, cur_object);
+	entry->preferred = !!preferred;
 }
 
 /*
@@ -552,7 +581,8 @@  static void fill_pack_entry(uint32_t pack_int_id,
 static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 						  struct pack_info *info,
 						  uint32_t nr_packs,
-						  uint32_t *nr_objects)
+						  uint32_t *nr_objects,
+						  int preferred_pack)
 {
 	uint32_t cur_fanout, cur_pack, cur_object;
 	uint32_t alloc_fanout, alloc_objects, total_objects = 0;
@@ -589,12 +619,17 @@  static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 				nth_midxed_pack_midx_entry(m,
 							   &entries_by_fanout[nr_fanout],
 							   cur_object);
+				if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
+					entries_by_fanout[nr_fanout].preferred = 1;
+				else
+					entries_by_fanout[nr_fanout].preferred = 0;
 				nr_fanout++;
 			}
 		}
 
 		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
 			uint32_t start = 0, end;
+			int preferred = cur_pack == preferred_pack;
 
 			if (cur_fanout)
 				start = get_pack_fanout(info[cur_pack].p, cur_fanout - 1);
@@ -602,7 +637,11 @@  static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 
 			for (cur_object = start; cur_object < end; cur_object++) {
 				ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout);
-				fill_pack_entry(cur_pack, info[cur_pack].p, cur_object, &entries_by_fanout[nr_fanout]);
+				fill_pack_entry(cur_pack,
+						info[cur_pack].p,
+						cur_object,
+						&entries_by_fanout[nr_fanout],
+						preferred);
 				nr_fanout++;
 			}
 		}
@@ -777,7 +816,9 @@  static int write_midx_large_offsets(struct hashfile *f,
 }
 
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
-			       struct string_list *packs_to_drop, unsigned flags)
+			       struct string_list *packs_to_drop,
+			       const char *preferred_pack_name,
+			       unsigned flags)
 {
 	char *midx_name;
 	uint32_t i;
@@ -828,7 +869,19 @@  static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
 		goto cleanup;
 
-	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr);
+	ctx.preferred_pack_idx = -1;
+	if (preferred_pack_name) {
+		for (i = 0; i < ctx.nr; i++) {
+			if (!cmp_idx_or_pack_name(preferred_pack_name,
+						  ctx.info[i].pack_name)) {
+				ctx.preferred_pack_idx = i;
+				break;
+			}
+		}
+	}
+
+	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
+					 ctx.preferred_pack_idx);
 
 	ctx.large_offsets_needed = 0;
 	for (i = 0; i < ctx.entries_nr; i++) {
@@ -889,6 +942,24 @@  static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 			pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
 	}
 
+	/* Check that the preferred pack wasn't expired (if given). */
+	if (preferred_pack_name) {
+		int preferred_idx = lookup_idx_or_pack_name(ctx.info,
+							    ctx.nr,
+							    preferred_pack_name);
+		if (preferred_idx < 0)
+			warning(_("unknown preferred pack: '%s'"),
+				preferred_pack_name);
+		else {
+			uint32_t orig = ctx.info[preferred_idx].orig_pack_int_id;
+			uint32_t perm = ctx.pack_perm[orig];
+
+			if (perm == PACK_EXPIRED)
+				warning(_("preferred pack '%s' is expired"),
+					preferred_pack_name);
+		}
+	}
+
 	if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
 		pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
 					(pack_name_concat_len % MIDX_CHUNK_ALIGNMENT);
@@ -947,9 +1018,12 @@  static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	return result;
 }
 
-int write_midx_file(const char *object_dir, unsigned flags)
+int write_midx_file(const char *object_dir,
+		    const char *preferred_pack_name,
+		    unsigned flags)
 {
-	return write_midx_internal(object_dir, NULL, NULL, flags);
+	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
+				   flags);
 }
 
 void clear_midx_file(struct repository *r)
@@ -1184,7 +1258,7 @@  int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 	free(count);
 
 	if (packs_to_drop.nr)
-		result = write_midx_internal(object_dir, m, &packs_to_drop, flags);
+		result = write_midx_internal(object_dir, m, &packs_to_drop, NULL, flags);
 
 	string_list_clear(&packs_to_drop, 0);
 	return result;
@@ -1373,7 +1447,7 @@  int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 		goto cleanup;
 	}
 
-	result = write_midx_internal(object_dir, m, NULL, flags);
+	result = write_midx_internal(object_dir, m, NULL, NULL, flags);
 	m = NULL;
 
 cleanup:
diff --git a/midx.h b/midx.h
index b18cf53bc4..e7fea61109 100644
--- a/midx.h
+++ b/midx.h
@@ -47,7 +47,7 @@  int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa
 int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
 
-int write_midx_file(const char *object_dir, unsigned flags);
+int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags);
 int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index b4afab1dfc..fd94ba9053 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -31,6 +31,14 @@  midx_read_expect () {
 	test_cmp expect actual
 }
 
+midx_expect_object_offset () {
+	OID="$1"
+	OFFSET="$2"
+	OBJECT_DIR="$3"
+	test-tool read-midx --show-objects $OBJECT_DIR >actual &&
+	grep "^$OID $OFFSET" actual
+}
+
 test_expect_success 'setup' '
 	test_oid_cache <<-EOF
 	idxoff sha1:2999
@@ -234,6 +242,37 @@  test_expect_success 'warn on improper hash version' '
 	)
 '
 
+test_expect_success 'midx picks objects from preferred pack' '
+	test_when_finished rm -rf preferred.git &&
+	git init --bare preferred.git &&
+	(
+		cd preferred.git &&
+
+		a=$(echo "a" | git hash-object -w --stdin) &&
+		b=$(echo "b" | git hash-object -w --stdin) &&
+		c=$(echo "c" | git hash-object -w --stdin) &&
+
+		# Set up two packs, duplicating the object "B" at different
+		# offsets.
+		git pack-objects objects/pack/test-AB <<-EOF &&
+		$a
+		$b
+		EOF
+		bc=$(git pack-objects objects/pack/test-BC <<-EOF
+		$b
+		$c
+		EOF
+		) &&
+
+		git multi-pack-index --object-dir=objects \
+			write --preferred-pack=test-BC-$bc.idx 2>err &&
+		test_must_be_empty err &&
+
+		ofs=$(git show-index <objects/pack/test-BC-$bc.idx | grep $b |
+			cut -d" " -f1) &&
+		midx_expect_object_offset $b $ofs objects
+	)
+'
 
 test_expect_success 'verify multi-pack-index success' '
 	git multi-pack-index verify --object-dir=$objdir