diff mbox series

[07/24] midx: implement `--retain-disjoint` mode

Message ID 3019738b52ba8cd78ea696a3b800fa91e722eb66.1701198172.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-objects: multi-pack verbatim reuse | expand

Commit Message

Taylor Blau Nov. 28, 2023, 7:08 p.m. UTC
Once multi-pack reachability bitmaps learn how to perform pack reuse
over the set of disjoint packs, we will want to teach `git repack` to
evolve the set of disjoint packs over time.

To evolve the set of disjoint packs means any new packs made by `repack`
should be disjoint with respect to the existing set of disjoint packs so
as to be able to join that set when updating the multi-pack index.

The details of generating such packs will be left to future commits. But
any new pack(s) created by repack as disjoint will be marked as such by
passing them over `--stdin-packs` with the special '+' marker when
generating a new MIDX.

This patch, however, addresses the question of how we retain the
existing set of disjoint packs when updating the multi-pack index. One
option would be for `repack` to keep track of the set of disjoint packs
itself by querying the MIDX, and then adding the special '+' marker
appropriately when generating the input for `--stdin-packs`.

But this is verbose and error-prone, since two different parts of Git
would need to maintain the same notion of the set of disjoint packs.
When one disagrees with the other, the set of so-called disjoint packs
may actually contain two or more packs which have one or more object(s)
in common, making the set non-disjoint.

Instead, introduce a `--retain-disjoint` mode for the `git
multi-pack-index write` sub-command which keeps any packs which are:

  - marked as disjoint in the existing MIDX, and

  - not deleted (e.g., they are not excluded from the input for
    `--stdin-packs`).

This will allow the `repack` command to not have to keep track of the
set of currently-disjoint packs itself, reducing the number of lines of
code necessary to implement this feature, and making the resulting
implementation less error-prone.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt |  8 +++
 builtin/multi-pack-index.c             |  3 +
 midx.c                                 | 49 +++++++++++++++
 midx.h                                 |  1 +
 t/lib-disjoint.sh                      | 38 ++++++++++++
 t/t5319-multi-pack-index.sh            | 82 ++++++++++++++++++++++++++
 6 files changed, 181 insertions(+)
 create mode 100644 t/lib-disjoint.sh

Comments

Patrick Steinhardt Nov. 30, 2023, 10:18 a.m. UTC | #1
On Tue, Nov 28, 2023 at 02:08:13PM -0500, Taylor Blau wrote:
> Once multi-pack reachability bitmaps learn how to perform pack reuse
> over the set of disjoint packs, we will want to teach `git repack` to
> evolve the set of disjoint packs over time.
> 
> To evolve the set of disjoint packs means any new packs made by `repack`
> should be disjoint with respect to the existing set of disjoint packs so
> as to be able to join that set when updating the multi-pack index.
> 
> The details of generating such packs will be left to future commits. But
> any new pack(s) created by repack as disjoint will be marked as such by
> passing them over `--stdin-packs` with the special '+' marker when
> generating a new MIDX.
> 
> This patch, however, addresses the question of how we retain the
> existing set of disjoint packs when updating the multi-pack index. One
> option would be for `repack` to keep track of the set of disjoint packs
> itself by querying the MIDX, and then adding the special '+' marker
> appropriately when generating the input for `--stdin-packs`.
> 
> But this is verbose and error-prone, since two different parts of Git
> would need to maintain the same notion of the set of disjoint packs.
> When one disagrees with the other, the set of so-called disjoint packs
> may actually contain two or more packs which have one or more object(s)
> in common, making the set non-disjoint.
> 
> Instead, introduce a `--retain-disjoint` mode for the `git
> multi-pack-index write` sub-command which keeps any packs which are:
> 
>   - marked as disjoint in the existing MIDX, and
> 
>   - not deleted (e.g., they are not excluded from the input for
>     `--stdin-packs`).
> 
> This will allow the `repack` command to not have to keep track of the
> set of currently-disjoint packs itself, reducing the number of lines of
> code necessary to implement this feature, and making the resulting
> implementation less error-prone.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  Documentation/git-multi-pack-index.txt |  8 +++
>  builtin/multi-pack-index.c             |  3 +
>  midx.c                                 | 49 +++++++++++++++
>  midx.h                                 |  1 +
>  t/lib-disjoint.sh                      | 38 ++++++++++++
>  t/t5319-multi-pack-index.sh            | 82 ++++++++++++++++++++++++++
>  6 files changed, 181 insertions(+)
>  create mode 100644 t/lib-disjoint.sh
> 
> diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> index d130e65b28..ac0c7b124b 100644
> --- a/Documentation/git-multi-pack-index.txt
> +++ b/Documentation/git-multi-pack-index.txt
> @@ -54,6 +54,14 @@ write::
>  		"disjoint". See the "`DISP` chunk and disjoint packs"
>  		section in linkgit:gitformat-pack[5] for more.
>  
> +	--retain-disjoint::
> +		When writing a multi-pack index with a reachability
> +		bitmap, keep any packs marked as disjoint in the
> +		existing MIDX (if any) as such in the new MIDX. Existing
> +		disjoint packs which are removed (e.g., not listed via
> +		`--stdin-packs`) are ignored. This option works in
> +		addition to the '+' marker for `--stdin-packs`.

I'm trying to understand when you're expected to pass this flag and when
you're expected not to pass it. This documentation could also help in
the documentation here so that the user can make a more informed
decision.

Patrick

>  	--refs-snapshot=<path>::
>  		With `--bitmap`, optionally specify a file which
>  		contains a "refs snapshot" taken prior to repacking.
> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index 0f1dd4651d..dcfabf2626 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -138,6 +138,9 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
>  			 N_("write multi-pack index containing only given indexes")),
>  		OPT_FILENAME(0, "refs-snapshot", &opts.refs_snapshot,
>  			     N_("refs snapshot for selecting bitmap commits")),
> +		OPT_BIT(0, "retain-disjoint", &opts.flags,
> +			N_("retain non-deleted disjoint packs"),
> +			MIDX_WRITE_RETAIN_DISJOINT),
>  		OPT_END(),
>  	};
>  
> diff --git a/midx.c b/midx.c
> index 65ba0c70fe..ce67da9f85 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -721,6 +721,12 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
>  					   &fanout->entries[fanout->nr],
>  					   cur_object);
>  		fanout->entries[fanout->nr].preferred = 0;
> +		/*
> +		 * It's OK to set disjoint to 0 here, even with
> +		 * `--retain-disjoint`, since we will always see the disjoint
> +		 * copy of some object below in get_sorted_entries(), causing us
> +		 * to die().
> +		 */
>  		fanout->entries[fanout->nr].disjoint = 0;
>  		fanout->nr++;
>  	}
> @@ -1362,6 +1368,37 @@ static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
>  	return result;
>  }
>  
> +static int midx_retain_existing_disjoint(struct repository *r,
> +					 struct multi_pack_index *from,
> +					 struct write_midx_context *ctx)
> +{
> +	struct bitmapped_pack bp;
> +	uint32_t i, midx_pos;
> +
> +	for (i = 0; i < ctx->nr; i++) {
> +		struct pack_info *info = &ctx->info[i];
> +		/*
> +		 * Having to call `midx_locate_pack()` in a loop is
> +		 * sub-optimal, since it is O(n*log(n)) in the number
> +		 * of packs.
> +		 *
> +		 * When reusing an existing MIDX, we know that the first
> +		 * 'n' packs appear in the same order, so we could avoid
> +		 * this when reusing an existing MIDX. But we may be
> +		 * instead relying on the order given to us by
> +		 * for_each_file_in_pack_dir(), in which case we can't
> +		 * make any such guarantees.
> +		 */
> +		if (!midx_locate_pack(from, info->pack_name, &midx_pos))
> +			continue;
> +
> +		if (nth_bitmapped_pack(r, from, &bp, midx_pos) < 0)
> +			return -1;
> +		info->disjoint = bp.disjoint;
> +	}
> +	return 0;
> +}
> +
>  static int write_midx_internal(const char *object_dir,
>  			       struct string_list *packs_to_include,
>  			       struct string_list *packs_to_drop,
> @@ -1444,6 +1481,18 @@ static int write_midx_internal(const char *object_dir,
>  	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
>  	stop_progress(&ctx.progress);
>  
> +	if (flags & MIDX_WRITE_RETAIN_DISJOINT) {
> +		struct multi_pack_index *m = ctx.m;
> +		if (!m)
> +			m = lookup_multi_pack_index(the_repository, object_dir);
> +
> +		if (m) {
> +			result = midx_retain_existing_disjoint(the_repository, m, &ctx);
> +			if (result)
> +				goto cleanup;
> +		}
> +	}
> +
>  	if ((ctx.m && ctx.nr == ctx.m->num_packs) &&
>  	    !(packs_to_include || packs_to_drop)) {
>  		struct bitmap_index *bitmap_git;
> diff --git a/midx.h b/midx.h
> index a6e969c2ea..d7ce52ff7b 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -54,6 +54,7 @@ struct multi_pack_index {
>  #define MIDX_WRITE_BITMAP (1 << 2)
>  #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
>  #define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
> +#define MIDX_WRITE_RETAIN_DISJOINT (1 << 5)
>  
>  const unsigned char *get_midx_checksum(struct multi_pack_index *m);
>  void get_midx_filename(struct strbuf *out, const char *object_dir);
> diff --git a/t/lib-disjoint.sh b/t/lib-disjoint.sh
> new file mode 100644
> index 0000000000..c6c6e74aba
> --- /dev/null
> +++ b/t/lib-disjoint.sh
> @@ -0,0 +1,38 @@
> +# Helpers for scripts testing disjoint packs; see t5319 for example usage.
> +
> +objdir=.git/objects
> +
> +test_disjoint_1 () {
> +	local pack="$1"
> +	local want="$2"
> +
> +	test-tool read-midx --bitmap $objdir >out &&
> +	grep -A 3 "$pack" out >found &&
> +
> +	if ! test -s found
> +	then
> +		echo >&2 "could not find '$pack' in MIDX"
> +		return 1
> +	fi
> +
> +	if ! grep -q "disjoint: $want" found
> +	then
> +		echo >&2 "incorrect disjoint state for pack '$pack'"
> +		return 1
> +	fi
> +	return 0
> +}
> +
> +# test_must_be_disjoint <pack-$XYZ.pack>
> +#
> +# Ensures that the given pack is marked as disjoint.
> +test_must_be_disjoint () {
> +	test_disjoint_1 "$1" "yes"
> +}
> +
> +# test_must_not_be_disjoint <pack-$XYZ.pack>
> +#
> +# Ensures that the given pack is not marked as disjoint.
> +test_must_not_be_disjoint () {
> +	test_disjoint_1 "$1" "no"
> +}
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index fd24e0c952..02cfddf151 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -3,6 +3,7 @@
>  test_description='multi-pack-indexes'
>  . ./test-lib.sh
>  . "$TEST_DIRECTORY"/lib-chunk.sh
> +. "$TEST_DIRECTORY"/lib-disjoint.sh
>  
>  GIT_TEST_MULTI_PACK_INDEX=0
>  objdir=.git/objects
> @@ -1215,4 +1216,85 @@ test_expect_success 'non-disjoint packs are detected' '
>  	)
>  '
>  
> +test_expect_success 'retain disjoint packs while writing' '
> +	test_when_finished "rm -fr repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		for i in 1 2
> +		do
> +			test_commit "$i" && git repack -d || return 1
> +		done &&
> +
> +		find $objdir/pack -type f -name "pack-*.idx" |
> +		sed -e "s/^.*\/\(.*\)/\1/g" | sort >packs.old &&
> +
> +		test_line_count = 2 packs.old &&
> +		disjoint="$(head -n 1 packs.old)" &&
> +		non_disjoint="$(tail -n 1 packs.old)" &&
> +
> +		cat >in <<-EOF &&
> +		+$disjoint
> +		$non_disjoint
> +		EOF
> +		git multi-pack-index write --stdin-packs --bitmap <in &&
> +
> +		test_must_be_disjoint "${disjoint%.idx}.pack" &&
> +		test_must_not_be_disjoint "${non_disjoint%.idx}.pack" &&
> +
> +		test_commit 3 &&
> +		git repack -d &&
> +
> +		find $objdir/pack -type f -name "pack-*.idx" |
> +		sed -e "s/^.*\/\(.*\)/\1/g" | sort >packs.new &&
> +
> +		new_disjoint="$(comm -13 packs.old packs.new)" &&
> +		cat >in <<-EOF &&
> +		$disjoint
> +		$non_disjoint
> +		+$new_disjoint
> +		EOF
> +		git multi-pack-index write --stdin-packs --bitmap \
> +			--retain-disjoint <in &&
> +
> +		test_must_be_disjoint "${disjoint%.idx}.pack" &&
> +		test_must_be_disjoint "${new_disjoint%.idx}.pack" &&
> +		test_must_not_be_disjoint "${non_disjoint%.idx}.pack"
> +
> +	)
> +'
> +
> +test_expect_success 'non-disjoint packs are detected via --retain-disjoint' '
> +	test_when_finished "rm -fr repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		packdir=.git/objects/pack &&
> +
> +		test_commit base &&
> +		base="$(echo base | git pack-objects --revs $packdir/pack)" &&
> +
> +		cat >in <<-EOF &&
> +		+pack-$base.idx
> +		EOF
> +		git multi-pack-index write --stdin-packs --bitmap <in &&
> +
> +		test_must_be_disjoint "pack-$base.pack" &&
> +
> +		test_commit other &&
> +		other="$(echo other | git pack-objects --revs $packdir/pack)" &&
> +
> +		cat >in <<-EOF &&
> +		pack-$base.idx
> +		+pack-$other.idx
> +		EOF
> +		test_must_fail git multi-pack-index write --stdin-packs --retain-disjoint --bitmap <in 2>err &&
> +		grep "duplicate object.* among disjoint packs" err &&
> +
> +		test_must_fail git multi-pack-index write --retain-disjoint --bitmap 2>err &&
> +		grep "duplicate object.* among disjoint packs" err
> +	)
> +'
> +
>  test_done
> -- 
> 2.43.0.24.g980b318f98
>
Taylor Blau Nov. 30, 2023, 7:29 p.m. UTC | #2
On Thu, Nov 30, 2023 at 11:18:51AM +0100, Patrick Steinhardt wrote:
> > diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> > index d130e65b28..ac0c7b124b 100644
> > --- a/Documentation/git-multi-pack-index.txt
> > +++ b/Documentation/git-multi-pack-index.txt
> > @@ -54,6 +54,14 @@ write::
> >  		"disjoint". See the "`DISP` chunk and disjoint packs"
> >  		section in linkgit:gitformat-pack[5] for more.
> >
> > +	--retain-disjoint::
> > +		When writing a multi-pack index with a reachability
> > +		bitmap, keep any packs marked as disjoint in the
> > +		existing MIDX (if any) as such in the new MIDX. Existing
> > +		disjoint packs which are removed (e.g., not listed via
> > +		`--stdin-packs`) are ignored. This option works in
> > +		addition to the '+' marker for `--stdin-packs`.
>
> I'm trying to understand when you're expected to pass this flag and when
> you're expected not to pass it. This documentation could also help in
> the documentation here so that the user can make a more informed
> decision.

I think there are multiple reasons that you may or may not want to pass
that flag. Certainly if you're not using disjoint packs (and instead
only care about single-pack verbatim reuse over the MIDX's preferred
packfile), then you don't need to pass it.

But if you are using disjoint packs, you may want to pass it if you are
adding packs to the MIDX which are disjoint, _and_ you want to hold onto
the existing set of disjoint packs.

But if you want to change the set of disjoint packs entirely, you would
want to omit this flag (unless you knew a-priori that you were going to
drop all of the currently marked disjoint packs from the new MIDX you
are writing, e.g. with --stdin-packs).

If you think it would be useful, I could try and distill some of this
down, but I think that there is likely too much detail here for it to be
useful in user-facing documentation.

Thanks,
Taylor
Patrick Steinhardt Dec. 1, 2023, 8:02 a.m. UTC | #3
On Thu, Nov 30, 2023 at 02:29:52PM -0500, Taylor Blau wrote:
> On Thu, Nov 30, 2023 at 11:18:51AM +0100, Patrick Steinhardt wrote:
> > > diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
> > > index d130e65b28..ac0c7b124b 100644
> > > --- a/Documentation/git-multi-pack-index.txt
> > > +++ b/Documentation/git-multi-pack-index.txt
> > > @@ -54,6 +54,14 @@ write::
> > >  		"disjoint". See the "`DISP` chunk and disjoint packs"
> > >  		section in linkgit:gitformat-pack[5] for more.
> > >
> > > +	--retain-disjoint::
> > > +		When writing a multi-pack index with a reachability
> > > +		bitmap, keep any packs marked as disjoint in the
> > > +		existing MIDX (if any) as such in the new MIDX. Existing
> > > +		disjoint packs which are removed (e.g., not listed via
> > > +		`--stdin-packs`) are ignored. This option works in
> > > +		addition to the '+' marker for `--stdin-packs`.
> >
> > I'm trying to understand when you're expected to pass this flag and when
> > you're expected not to pass it. This documentation could also help in
> > the documentation here so that the user can make a more informed
> > decision.
> 
> I think there are multiple reasons that you may or may not want to pass
> that flag. Certainly if you're not using disjoint packs (and instead
> only care about single-pack verbatim reuse over the MIDX's preferred
> packfile), then you don't need to pass it.
> 
> But if you are using disjoint packs, you may want to pass it if you are
> adding packs to the MIDX which are disjoint, _and_ you want to hold onto
> the existing set of disjoint packs.
> 
> But if you want to change the set of disjoint packs entirely, you would
> want to omit this flag (unless you knew a-priori that you were going to
> drop all of the currently marked disjoint packs from the new MIDX you
> are writing, e.g. with --stdin-packs).
> 
> If you think it would be useful, I could try and distill some of this
> down, but I think that there is likely too much detail here for it to be
> useful in user-facing documentation.

Yeah, this indeed feels too detailed to be added here. I was hoping for
a simple "Never do this if"-style rule that points out why it is unwise
under some circumstances, but seems like it's not as simple as that.

Well, so be it. Thanks!

Patrick
diff mbox series

Patch

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index d130e65b28..ac0c7b124b 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -54,6 +54,14 @@  write::
 		"disjoint". See the "`DISP` chunk and disjoint packs"
 		section in linkgit:gitformat-pack[5] for more.
 
+	--retain-disjoint::
+		When writing a multi-pack index with a reachability
+		bitmap, keep any packs marked as disjoint in the
+		existing MIDX (if any) as such in the new MIDX. Existing
+		disjoint packs which are removed (e.g., not listed via
+		`--stdin-packs`) are ignored. This option works in
+		addition to the '+' marker for `--stdin-packs`.
+
 	--refs-snapshot=<path>::
 		With `--bitmap`, optionally specify a file which
 		contains a "refs snapshot" taken prior to repacking.
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 0f1dd4651d..dcfabf2626 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -138,6 +138,9 @@  static int cmd_multi_pack_index_write(int argc, const char **argv,
 			 N_("write multi-pack index containing only given indexes")),
 		OPT_FILENAME(0, "refs-snapshot", &opts.refs_snapshot,
 			     N_("refs snapshot for selecting bitmap commits")),
+		OPT_BIT(0, "retain-disjoint", &opts.flags,
+			N_("retain non-deleted disjoint packs"),
+			MIDX_WRITE_RETAIN_DISJOINT),
 		OPT_END(),
 	};
 
diff --git a/midx.c b/midx.c
index 65ba0c70fe..ce67da9f85 100644
--- a/midx.c
+++ b/midx.c
@@ -721,6 +721,12 @@  static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
 					   &fanout->entries[fanout->nr],
 					   cur_object);
 		fanout->entries[fanout->nr].preferred = 0;
+		/*
+		 * It's OK to set disjoint to 0 here, even with
+		 * `--retain-disjoint`, since we will always see the disjoint
+		 * copy of some object below in get_sorted_entries(), causing us
+		 * to die().
+		 */
 		fanout->entries[fanout->nr].disjoint = 0;
 		fanout->nr++;
 	}
@@ -1362,6 +1368,37 @@  static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
 	return result;
 }
 
+static int midx_retain_existing_disjoint(struct repository *r,
+					 struct multi_pack_index *from,
+					 struct write_midx_context *ctx)
+{
+	struct bitmapped_pack bp;
+	uint32_t i, midx_pos;
+
+	for (i = 0; i < ctx->nr; i++) {
+		struct pack_info *info = &ctx->info[i];
+		/*
+		 * Having to call `midx_locate_pack()` in a loop is
+		 * sub-optimal, since it is O(n*log(n)) in the number
+		 * of packs.
+		 *
+		 * When reusing an existing MIDX, we know that the first
+		 * 'n' packs appear in the same order, so we could avoid
+		 * this when reusing an existing MIDX. But we may be
+		 * instead relying on the order given to us by
+		 * for_each_file_in_pack_dir(), in which case we can't
+		 * make any such guarantees.
+		 */
+		if (!midx_locate_pack(from, info->pack_name, &midx_pos))
+			continue;
+
+		if (nth_bitmapped_pack(r, from, &bp, midx_pos) < 0)
+			return -1;
+		info->disjoint = bp.disjoint;
+	}
+	return 0;
+}
+
 static int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_include,
 			       struct string_list *packs_to_drop,
@@ -1444,6 +1481,18 @@  static int write_midx_internal(const char *object_dir,
 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
+	if (flags & MIDX_WRITE_RETAIN_DISJOINT) {
+		struct multi_pack_index *m = ctx.m;
+		if (!m)
+			m = lookup_multi_pack_index(the_repository, object_dir);
+
+		if (m) {
+			result = midx_retain_existing_disjoint(the_repository, m, &ctx);
+			if (result)
+				goto cleanup;
+		}
+	}
+
 	if ((ctx.m && ctx.nr == ctx.m->num_packs) &&
 	    !(packs_to_include || packs_to_drop)) {
 		struct bitmap_index *bitmap_git;
diff --git a/midx.h b/midx.h
index a6e969c2ea..d7ce52ff7b 100644
--- a/midx.h
+++ b/midx.h
@@ -54,6 +54,7 @@  struct multi_pack_index {
 #define MIDX_WRITE_BITMAP (1 << 2)
 #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
 #define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
+#define MIDX_WRITE_RETAIN_DISJOINT (1 << 5)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 void get_midx_filename(struct strbuf *out, const char *object_dir);
diff --git a/t/lib-disjoint.sh b/t/lib-disjoint.sh
new file mode 100644
index 0000000000..c6c6e74aba
--- /dev/null
+++ b/t/lib-disjoint.sh
@@ -0,0 +1,38 @@ 
+# Helpers for scripts testing disjoint packs; see t5319 for example usage.
+
+objdir=.git/objects
+
+test_disjoint_1 () {
+	local pack="$1"
+	local want="$2"
+
+	test-tool read-midx --bitmap $objdir >out &&
+	grep -A 3 "$pack" out >found &&
+
+	if ! test -s found
+	then
+		echo >&2 "could not find '$pack' in MIDX"
+		return 1
+	fi
+
+	if ! grep -q "disjoint: $want" found
+	then
+		echo >&2 "incorrect disjoint state for pack '$pack'"
+		return 1
+	fi
+	return 0
+}
+
+# test_must_be_disjoint <pack-$XYZ.pack>
+#
+# Ensures that the given pack is marked as disjoint.
+test_must_be_disjoint () {
+	test_disjoint_1 "$1" "yes"
+}
+
+# test_must_not_be_disjoint <pack-$XYZ.pack>
+#
+# Ensures that the given pack is not marked as disjoint.
+test_must_not_be_disjoint () {
+	test_disjoint_1 "$1" "no"
+}
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index fd24e0c952..02cfddf151 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -3,6 +3,7 @@ 
 test_description='multi-pack-indexes'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-chunk.sh
+. "$TEST_DIRECTORY"/lib-disjoint.sh
 
 GIT_TEST_MULTI_PACK_INDEX=0
 objdir=.git/objects
@@ -1215,4 +1216,85 @@  test_expect_success 'non-disjoint packs are detected' '
 	)
 '
 
+test_expect_success 'retain disjoint packs while writing' '
+	test_when_finished "rm -fr repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		for i in 1 2
+		do
+			test_commit "$i" && git repack -d || return 1
+		done &&
+
+		find $objdir/pack -type f -name "pack-*.idx" |
+		sed -e "s/^.*\/\(.*\)/\1/g" | sort >packs.old &&
+
+		test_line_count = 2 packs.old &&
+		disjoint="$(head -n 1 packs.old)" &&
+		non_disjoint="$(tail -n 1 packs.old)" &&
+
+		cat >in <<-EOF &&
+		+$disjoint
+		$non_disjoint
+		EOF
+		git multi-pack-index write --stdin-packs --bitmap <in &&
+
+		test_must_be_disjoint "${disjoint%.idx}.pack" &&
+		test_must_not_be_disjoint "${non_disjoint%.idx}.pack" &&
+
+		test_commit 3 &&
+		git repack -d &&
+
+		find $objdir/pack -type f -name "pack-*.idx" |
+		sed -e "s/^.*\/\(.*\)/\1/g" | sort >packs.new &&
+
+		new_disjoint="$(comm -13 packs.old packs.new)" &&
+		cat >in <<-EOF &&
+		$disjoint
+		$non_disjoint
+		+$new_disjoint
+		EOF
+		git multi-pack-index write --stdin-packs --bitmap \
+			--retain-disjoint <in &&
+
+		test_must_be_disjoint "${disjoint%.idx}.pack" &&
+		test_must_be_disjoint "${new_disjoint%.idx}.pack" &&
+		test_must_not_be_disjoint "${non_disjoint%.idx}.pack"
+
+	)
+'
+
+test_expect_success 'non-disjoint packs are detected via --retain-disjoint' '
+	test_when_finished "rm -fr repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		packdir=.git/objects/pack &&
+
+		test_commit base &&
+		base="$(echo base | git pack-objects --revs $packdir/pack)" &&
+
+		cat >in <<-EOF &&
+		+pack-$base.idx
+		EOF
+		git multi-pack-index write --stdin-packs --bitmap <in &&
+
+		test_must_be_disjoint "pack-$base.pack" &&
+
+		test_commit other &&
+		other="$(echo other | git pack-objects --revs $packdir/pack)" &&
+
+		cat >in <<-EOF &&
+		pack-$base.idx
+		+pack-$other.idx
+		EOF
+		test_must_fail git multi-pack-index write --stdin-packs --retain-disjoint --bitmap <in 2>err &&
+		grep "duplicate object.* among disjoint packs" err &&
+
+		test_must_fail git multi-pack-index write --retain-disjoint --bitmap 2>err &&
+		grep "duplicate object.* among disjoint packs" err
+	)
+'
+
 test_done