diff mbox series

[09/24] repack: implement `--extend-disjoint` mode

Message ID b75869befba26899d88d6c6d413cc756aeadbd80.1701198172.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-objects: multi-pack verbatim reuse | expand

Commit Message

Taylor Blau Nov. 28, 2023, 7:08 p.m. UTC
Now that we can generate packs which are disjoint with respect to the
set of currently-disjoint packs, implement a mode of `git repack` which
extends the set of disjoint packs with any new (non-cruft) pack(s)
generated during the repack.

The idea is mostly straightforward, with a couple of gotcha's. The
straightforward part is to make sure that any new packs are disjoint
with respect to the set of currently disjoint packs which are _not_
being removed from the repository as a result of the repack.

If a pack which is currently marked as disjoint is, on the other hand,
about to be removed from the repository, it is OK (and expected) that
new pack(s) will contain some or all of its objects. Since the pack
originally marked as disjoint will be removed, it will necessarily leave
the disjoint set, making room for new packs with its same objects to
take its place. In other words, the resulting set of disjoint packs will
be disjoint with respect to one another.

The gotchas mostly have to do with making sure that we do not generate a
disjoint pack in the following scenarios:

  - promisor packs
  - cruft packs (which may necessarily need to include an object from a
    disjoint pack in order to freshen it in certain circumstances)
  - all-into-one repacks without '-d'
  - `--filter-to`, which conceptually could work with the new
    `--extend-disjoint` option, but only in limited circumstances

Otherwise, we mark which packs were created as disjoint by using a new
bit in the `generated_pack_data` struct, and then marking those pack(s)
as disjoint accordingly when generating the MIDX. Non-deleted packs
which are marked as disjoint are retained as such by passing the
equivalent of `--retain-disjoint` when calling the MIDX API to update
the MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-repack.txt      |  12 +++
 builtin/repack.c                  |  57 +++++++++---
 t/t7700-repack.sh                 |   4 +-
 t/t7705-repack-extend-disjoint.sh | 142 ++++++++++++++++++++++++++++++
 4 files changed, 203 insertions(+), 12 deletions(-)
 create mode 100755 t/t7705-repack-extend-disjoint.sh

Comments

Patrick Steinhardt Dec. 7, 2023, 1:13 p.m. UTC | #1
On Tue, Nov 28, 2023 at 02:08:18PM -0500, Taylor Blau wrote:
> Now that we can generate packs which are disjoint with respect to the
> set of currently-disjoint packs, implement a mode of `git repack` which
> extends the set of disjoint packs with any new (non-cruft) pack(s)
> generated during the repack.
> 
> The idea is mostly straightforward, with a couple of gotcha's. The
> straightforward part is to make sure that any new packs are disjoint
> with respect to the set of currently disjoint packs which are _not_
> being removed from the repository as a result of the repack.
> 
> If a pack which is currently marked as disjoint is, on the other hand,
> about to be removed from the repository, it is OK (and expected) that
> new pack(s) will contain some or all of its objects. Since the pack
> originally marked as disjoint will be removed, it will necessarily leave
> the disjoint set, making room for new packs with its same objects to
> take its place. In other words, the resulting set of disjoint packs will
> be disjoint with respect to one another.
> 
> The gotchas mostly have to do with making sure that we do not generate a
> disjoint pack in the following scenarios:

Okay, let me verify whether I understand the reasons:

>   - promisor packs

Which is because promisor packs actually don't contain any objects?

>   - cruft packs (which may necessarily need to include an object from a
>     disjoint pack in order to freshen it in certain circumstances)

This one took me a while to figure out. If we'd mark crufts as disjoint,
then it would mean that new packfiles cannot be marked as disjoint if
objects which were previously unreachable do become reachable again.
So we'd be pessimizing packfiles for live objects in favor of others
which aren't.

>   - all-into-one repacks without '-d'

Because here the old packfiles that this would make redundant aren't
deleted and thus the objects are duplicate now.

>   - `--filter-to`, which conceptually could work with the new
>     `--extend-disjoint` option, but only in limited circumstances

We're probably also not properly set up to handle the new alternate
object directory and exclude objects that are part of a potentially
disjoint packfile that exists already. Also, the current MIDX may not
even cover the alternate.

> Otherwise, we mark which packs were created as disjoint by using a new
> bit in the `generated_pack_data` struct, and then marking those pack(s)
> as disjoint accordingly when generating the MIDX. Non-deleted packs
> which are marked as disjoint are retained as such by passing the
> equivalent of `--retain-disjoint` when calling the MIDX API to update
> the MIDX.

Okay. I had a bit of trouble to sift through the various different
flags like "--retain-disjoint", "--extend-disjoint", "--ignore-disjoint"
and so on. But well, they do different things and it's been a few days
since I've reviewed the preceding patches, so this is probably fine.

One thing I wondered: do we need to consider the `-l` flag? When using
an alternate object directory it is totally feasible that the alternate
may be creating new disjoint packages without us knowing, and thus we
may not be able to guarantee the disjoint property anymore.

Patrick
Taylor Blau Dec. 7, 2023, 8:28 p.m. UTC | #2
On Thu, Dec 07, 2023 at 02:13:08PM +0100, Patrick Steinhardt wrote:
> > The gotchas mostly have to do with making sure that we do not generate a
> > disjoint pack in the following scenarios:
>
> Okay, let me verify whether I understand the reasons:
>
> >   - promisor packs
>
> Which is because promisor packs actually don't contain any objects?

Right.

> >   - cruft packs (which may necessarily need to include an object from a
> >     disjoint pack in order to freshen it in certain circumstances)
>
> This one took me a while to figure out. If we'd mark crufts as disjoint,
> then it would mean that new packfiles cannot be marked as disjoint if
> objects which were previously unreachable do become reachable again.
> So we'd be pessimizing packfiles for live objects in favor of others
> which aren't.

Yeah, that's right, too. There are a couple of cases where more than one
cruft pack may contain the same object, one of them being the
flip-flopping between reachable and unreachable as you suggest above.
Another is that you have a non-prunable unreachable object which is
already in a cruft pack. If the object's mtime gets updated (and still
cannot be pruned), we'll end up freshening the object loose, and then
packing it again (with the more recent mtime) into a new cruft pack.

That aside, I actually think that there are ways to mark cruft packs
disjoint. But they're complicated, and moreover, I don't think you'd
ever *want* to mark a cruft pack as disjoint. Cruft packs usually
contain garbage, which is unlikely to be useful to any fetches/clones.

If we did mark them as disjoint, it would mean that we could reuse
verbatim sections of the cruft pack in our output, but we would likely
end up with very few such sections.

> >   - all-into-one repacks without '-d'
>
> Because here the old packfiles that this would make redundant aren't
> deleted and thus the objects are duplicate now.

Yep.

> > Otherwise, we mark which packs were created as disjoint by using a new
> > bit in the `generated_pack_data` struct, and then marking those pack(s)
> > as disjoint accordingly when generating the MIDX. Non-deleted packs
> > which are marked as disjoint are retained as such by passing the
> > equivalent of `--retain-disjoint` when calling the MIDX API to update
> > the MIDX.
>
> Okay. I had a bit of trouble to sift through the various different
> flags like "--retain-disjoint", "--extend-disjoint", "--ignore-disjoint"
> and so on. But well, they do different things and it's been a few days
> since I've reviewed the preceding patches, so this is probably fine.

Yeah, I am definitely open to better naming conventions here? I figured
that:

  - --retain-disjoint was a good name for the MIDX option, since it is
    retaining existing disjoint packs in the new MIDX
  - --extend-disjoint was a good name for the repack option, since it is
    extending the set of disjoint packs
  - --ignore-disjoint was a good name for the pack-objects option, since
    it is ignoring objects in disjoint packs

Writing this out, I think that you could make an argument that
`--exclude-disjoint` is a better name for the last option. So I'm
definitely open to suggestions here, but I don't want to get too bogged
down on command-line option naming (so long as we're all reasonably
happy with the result).

> One thing I wondered: do we need to consider the `-l` flag? When using
> an alternate object directory it is totally feasible that the alternate
> may be creating new disjoint packages without us knowing, and thus we
> may not be able to guarantee the disjoint property anymore.

I don't think so. We'd only care about one direction of this (that
alternates do not create disjoint packs which overlap with ours, instead
of the other way around), but since we don't put non-local packs in the
MIDX, I think we're OK.

I suppose that you might run into trouble if you use the chained MIDX
thing (via its `->next` pointer). I haven't used that feature myself, so
I'd have to play around with it.

Thanks,
Taylor
Patrick Steinhardt Dec. 8, 2023, 8:19 a.m. UTC | #3
On Thu, Dec 07, 2023 at 03:28:18PM -0500, Taylor Blau wrote:
> On Thu, Dec 07, 2023 at 02:13:08PM +0100, Patrick Steinhardt wrote:
> > >   - cruft packs (which may necessarily need to include an object from a
> > >     disjoint pack in order to freshen it in certain circumstances)
> >
> > This one took me a while to figure out. If we'd mark crufts as disjoint,
> > then it would mean that new packfiles cannot be marked as disjoint if
> > objects which were previously unreachable do become reachable again.
> > So we'd be pessimizing packfiles for live objects in favor of others
> > which aren't.
> 
> Yeah, that's right, too. There are a couple of cases where more than one
> cruft pack may contain the same object, one of them being the
> flip-flopping between reachable and unreachable as you suggest above.
> Another is that you have a non-prunable unreachable object which is
> already in a cruft pack. If the object's mtime gets updated (and still
> cannot be pruned), we'll end up freshening the object loose, and then
> packing it again (with the more recent mtime) into a new cruft pack.
> 
> That aside, I actually think that there are ways to mark cruft packs
> disjoint. But they're complicated, and moreover, I don't think you'd
> ever *want* to mark a cruft pack as disjoint. Cruft packs usually
> contain garbage, which is unlikely to be useful to any fetches/clones.
> 
> If we did mark them as disjoint, it would mean that we could reuse
> verbatim sections of the cruft pack in our output, but we would likely
> end up with very few such sections.

Makes sense. It also doesn't feel worth it to introduce additional
complexity for objects that for most of the part wouldn't ever be served
on a fetch anyway.

[snip]
> > Okay. I had a bit of trouble to sift through the various different
> > flags like "--retain-disjoint", "--extend-disjoint", "--ignore-disjoint"
> > and so on. But well, they do different things and it's been a few days
> > since I've reviewed the preceding patches, so this is probably fine.
> 
> Yeah, I am definitely open to better naming conventions here? I figured
> that:
> 
>   - --retain-disjoint was a good name for the MIDX option, since it is
>     retaining existing disjoint packs in the new MIDX
>   - --extend-disjoint was a good name for the repack option, since it is
>     extending the set of disjoint packs
>   - --ignore-disjoint was a good name for the pack-objects option, since
>     it is ignoring objects in disjoint packs
> 
> Writing this out, I think that you could make an argument that
> `--exclude-disjoint` is a better name for the last option. So I'm
> definitely open to suggestions here, but I don't want to get too bogged
> down on command-line option naming (so long as we're all reasonably
> happy with the result).

Yeah, as said, I don't mind it too much. It's a complex area and the
flags all do different things, so it's expected that you may have to do
some research on what exactly they do. That being said, I do like your
proposed `--exclude-disjoint` a lot more than `--ignore-disjoint`.

> > One thing I wondered: do we need to consider the `-l` flag? When using
> > an alternate object directory it is totally feasible that the alternate
> > may be creating new disjoint packages without us knowing, and thus we
> > may not be able to guarantee the disjoint property anymore.
> 
> I don't think so. We'd only care about one direction of this (that
> alternates do not create disjoint packs which overlap with ours, instead
> of the other way around), but since we don't put non-local packs in the
> MIDX, I think we're OK.
> 
> I suppose that you might run into trouble if you use the chained MIDX
> thing (via its `->next` pointer). I haven't used that feature myself, so
> I'd have to play around with it.

We do use this feature at GitLab for forks, where forks connect to a
common alternate object directory to deduplicate objects. As both the
fork repository and the alternate object directory use an MIDX I think
they would be set up exactly like that.

I guess the only really viable solution here is to ignore disjoint packs
in the main repo that connects to the alternate in the case where the
alternate has any disjoint packs itself.

Patrick
Taylor Blau Dec. 8, 2023, 10:48 p.m. UTC | #4
On Fri, Dec 08, 2023 at 09:19:25AM +0100, Patrick Steinhardt wrote:
> > Writing this out, I think that you could make an argument that
> > `--exclude-disjoint` is a better name for the last option. So I'm
> > definitely open to suggestions here, but I don't want to get too bogged
> > down on command-line option naming (so long as we're all reasonably
> > happy with the result).
>
> Yeah, as said, I don't mind it too much. It's a complex area and the
> flags all do different things, so it's expected that you may have to do
> some research on what exactly they do. That being said, I do like your
> proposed `--exclude-disjoint` a lot more than `--ignore-disjoint`.

I think that's fair, I renamed the option to be "--exclude-disjoint"
instead of "--ignore-disjoint" for any subsequent round(s) of this
series.

> > > One thing I wondered: do we need to consider the `-l` flag? When using
> > > an alternate object directory it is totally feasible that the alternate
> > > may be creating new disjoint packages without us knowing, and thus we
> > > may not be able to guarantee the disjoint property anymore.
> >
> > I don't think so. We'd only care about one direction of this (that
> > alternates do not create disjoint packs which overlap with ours, instead
> > of the other way around), but since we don't put non-local packs in the
> > MIDX, I think we're OK.
> >
> > I suppose that you might run into trouble if you use the chained MIDX
> > thing (via its `->next` pointer). I haven't used that feature myself, so
> > I'd have to play around with it.
>
> We do use this feature at GitLab for forks, where forks connect to a
> common alternate object directory to deduplicate objects. As both the
> fork repository and the alternate object directory use an MIDX I think
> they would be set up exactly like that.

Yep, that's right. I wasn't sure whether or not this feature had been
used extensively in production or not (we don't use it at GitHub, since
objects only live in their fork repositories for a short while before
moving to the fork network repository).

> I guess the only really viable solution here is to ignore disjoint packs
> in the main repo that connects to the alternate in the case where the
> alternate has any disjoint packs itself.

I think the behavior you'd get here is that we'd only look for disjoint
packs in the first MIDX in the chain (which is always the local one),
and we'd only recognizes packs from that MIDX as being potentially
disjoint.

If you have the bulk of your repositories in the alternate, then I think
you might want to consider how we combine the two. My sense is that
you'd want to be disjoint with respect to anything downstream of you.

Whether or not this is a feature that you/others need, I definitely
think we should leave it out of this series, since I am (a) fairly
certain that this is possible to do, and (b) already feel like this
series on its own is complicated enough.

Thanks,
Taylor
Patrick Steinhardt Dec. 11, 2023, 8:18 a.m. UTC | #5
On Fri, Dec 08, 2023 at 05:48:28PM -0500, Taylor Blau wrote:
> On Fri, Dec 08, 2023 at 09:19:25AM +0100, Patrick Steinhardt wrote:
> > > > One thing I wondered: do we need to consider the `-l` flag? When using
> > > > an alternate object directory it is totally feasible that the alternate
> > > > may be creating new disjoint packages without us knowing, and thus we
> > > > may not be able to guarantee the disjoint property anymore.
> > >
> > > I don't think so. We'd only care about one direction of this (that
> > > alternates do not create disjoint packs which overlap with ours, instead
> > > of the other way around), but since we don't put non-local packs in the
> > > MIDX, I think we're OK.
> > >
> > > I suppose that you might run into trouble if you use the chained MIDX
> > > thing (via its `->next` pointer). I haven't used that feature myself, so
> > > I'd have to play around with it.
> >
> > We do use this feature at GitLab for forks, where forks connect to a
> > common alternate object directory to deduplicate objects. As both the
> > fork repository and the alternate object directory use an MIDX I think
> > they would be set up exactly like that.
> 
> Yep, that's right. I wasn't sure whether or not this feature had been
> used extensively in production or not (we don't use it at GitHub, since
> objects only live in their fork repositories for a short while before
> moving to the fork network repository).
> 
> > I guess the only really viable solution here is to ignore disjoint packs
> > in the main repo that connects to the alternate in the case where the
> > alternate has any disjoint packs itself.
> 
> I think the behavior you'd get here is that we'd only look for disjoint
> packs in the first MIDX in the chain (which is always the local one),
> and we'd only recognizes packs from that MIDX as being potentially
> disjoint.
> 
> If you have the bulk of your repositories in the alternate, then I think
> you might want to consider how we combine the two. 

Yes, in the general case the bulk of objects is indeed contained in the
alternate.

> My sense is that you'd want to be disjoint with respect to anything
> downstream of you.

Ideally yes, but this is unfortunately not easily achievable in the
general case. It's one of the many painpoints that alternates bring with
them.

Suppose two forks A and B are connected to the same alternate. Both A
and B now gain the same set of objects via whatever means. At this point
these objects can be stored in disjoint packs in each of the repos as
they are not yet deduplicated via the alternate. But if you were to pull
objects from either A or B into the alternate then you cannot ensure
disjointedness at all anymore because you would first have to repack
objects in both A and B.

For two forks this might still seem manageable. But as soon as your fork
network grows larger it's clear that this becomes almost impossible to
do. So ultimately, I don't see an alternative to ignoring disjointedness
bits in either of the two linked-together repos.

> Whether or not this is a feature that you/others need, I definitely
> think we should leave it out of this series, since I am (a) fairly
> certain that this is possible to do, and (b) already feel like this
> series on its own is complicated enough.

I'm perfectly fine if we say that the benefits of your patch series
cannot yet be applied to repositories with alternates. But from my point
of view it's a requirement that this patch series does not silently
break this usecase due to Git starting to generate disjointed packs
where it cannot ensure that the disjointedness property holds.

As I haven't yet read through this whole patch series, the question is
boils down to whether the end result is opt-in or opt-out. If it was
opt-out then I could see the above usecase breaking silently. If it was
opt-in then things should be fine and we can address this ommission in a
follow up patch series. We at GitLab would definitely be interested in
helping out with this given that it directly affects us and that the
demonstrated savings seem very promising.

Patrick
Taylor Blau Dec. 11, 2023, 7:59 p.m. UTC | #6
On Mon, Dec 11, 2023 at 09:18:32AM +0100, Patrick Steinhardt wrote:
> > If you have the bulk of your repositories in the alternate, then I think
> > you might want to consider how we combine the two.
>
> Yes, in the general case the bulk of objects is indeed contained in the
> alternate.

I thought about this for a while this morning, and think that while it
is possible, I'm not sure I can think of a compelling use-case where
you'd want to reuse objects from packs across an alternate boundary.

On the "I think it's possible front":

The challenge is making sure that the set of disjoint packs among each
object store is globally disjoint in one direction along the alternate
path. I think the rule would require you to honor the disjointed-ness of
any packs in alternate(s) you might have when constructing new disjoint
packs.

So if repository fork.git is an alternate of network.git (and both had
long-lived MIDXs), network.git is free to make any set of disjoint packs
it chooses, and fork.git can only create disjoint packs which are
disjoint with respect to (a) the other disjoint packs in fork.git (if
any), and (b) the disjoint packs in network.git (and recursively for any
repositories that network.git is an alternate of in the general case).

Now on the "I can't think of a compelling use-case front":

I think the only reason you'd want to be able to reuse objects from
MIDXs across the alternates boundary is if you have MIDX bitmaps in both
the repository and its alternate. Indeed, the only time that we kick in
pack-reuse in general is when we have a bitmap, so in order to reuse
objects from both the repo and its alternate, you'd have to have a
bitmap in both repositories.

But having a MIDX bitmap means that any packs in the MIDX for which
you're generating a bitmap have to be closed over object reachability.
So unless the repository and its alternate have totally distinct lines
of history (in which case, I'm not sure you would want to share objects
between the two in the first place), any pack you bitmap in the child
repository fundamentally couldn't be disjoint with respect to its
parent.

This is because if it were to be disjoint, it would have to be repacked
with '-l' (or some equivalent '-l'-like flag that only ignores non-local
packs which are marked as disjoint). But if you exclude those objects
and any one (or more) of them is reachable from some object(s) you
didn't exclude, you wouldn't be able to generate a bitmap in the first
place.

It's very possible that there's something about your setup that I'm not
fully grokking, but I don't think in general this is something that we'd
want to do (even if it is theoretically possible).

> > Whether or not this is a feature that you/others need, I definitely
> > think we should leave it out of this series, since I am (a) fairly
> > certain that this is possible to do, and (b) already feel like this
> > series on its own is complicated enough.
>
> I'm perfectly fine if we say that the benefits of your patch series
> cannot yet be applied to repositories with alternates. But from my point
> of view it's a requirement that this patch series does not silently
> break this usecase due to Git starting to generate disjointed packs
> where it cannot ensure that the disjointedness property holds.

I think one thing you could reasonably do is use *only* the non-local
MIDX bitmaps when doing pack reuse.

Currently we'll use the first MIDX we find, which is guaranteed to be
the local one, if it exists. This was the case before this series, and
this series does not change that behavior. Unless you had a pack bitmap
in the alternated repository (which I think is unlikely, since it would
require a full reachability closure, thus eliminating any de-duplication
benefits you'd otherwise get when using alternates), you'd be find
before and after this series.

> As I haven't yet read through this whole patch series, the question is
> boils down to whether the end result is opt-in or opt-out. If it was
> opt-out then I could see the above usecase breaking silently. If it was
> opt-in then things should be fine and we can address this ommission in a
> follow up patch series. We at GitLab would definitely be interested in
> helping out with this given that it directly affects us and that the
> demonstrated savings seem very promising.

The end result is opt-in, you have to change the `pack.allowPackReuse`
configuration from its default value of "true" (or the alternate
spelling taught in this series, "single") to "multi" in order to enable
the new behavior.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index c902512a9e..50ba5e7f9c 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -249,6 +249,18 @@  linkgit:git-multi-pack-index[1]).
 	Write a multi-pack index (see linkgit:git-multi-pack-index[1])
 	containing the non-redundant packs.
 
+--extend-disjoint::
+	Extends the set of disjoint packs. All new non-cruft pack(s)
+	generated are constructed to be disjoint with respect to the set
+	of currently disjoint packs, excluding any packs that will be
+	removed as a result of the repack operation. For more on
+	disjoint packs, see the details in linkgit:gitformat-pack[5],
+	under the section "`DISP` chunk and disjoint packs".
++
+Useful only with the combination of `--write-midx` and
+`--write-bitmap-index`. Incompatible with `--filter-to`. Incompatible
+with `-A`, `-a`, or `--cruft` unless `-d` is given.
+
 CONFIGURATION
 -------------
 
diff --git a/builtin/repack.c b/builtin/repack.c
index edaee4dbec..0601bd16c4 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -58,6 +58,7 @@  struct pack_objects_args {
 	int no_reuse_object;
 	int quiet;
 	int local;
+	int ignore_disjoint;
 	struct list_objects_filter_options filter_options;
 };
 
@@ -293,6 +294,8 @@  static void prepare_pack_objects(struct child_process *cmd,
 		strvec_push(&cmd->args,  "--local");
 	if (args->quiet)
 		strvec_push(&cmd->args,  "--quiet");
+	if (args->ignore_disjoint)
+		strvec_push(&cmd->args,  "--ignore-disjoint");
 	if (delta_base_offset)
 		strvec_push(&cmd->args,  "--delta-base-offset");
 	strvec_push(&cmd->args, out);
@@ -334,9 +337,11 @@  static struct {
 
 struct generated_pack_data {
 	struct tempfile *tempfiles[ARRAY_SIZE(exts)];
+	unsigned disjoint : 1;
 };
 
-static struct generated_pack_data *populate_pack_exts(const char *name)
+static struct generated_pack_data *populate_pack_exts(const char *name,
+						      unsigned disjoint)
 {
 	struct stat statbuf;
 	struct strbuf path = STRBUF_INIT;
@@ -353,6 +358,8 @@  static struct generated_pack_data *populate_pack_exts(const char *name)
 		data->tempfiles[i] = register_tempfile(path.buf);
 	}
 
+	data->disjoint = disjoint;
+
 	strbuf_release(&path);
 	return data;
 }
@@ -379,6 +386,8 @@  static void repack_promisor_objects(const struct pack_objects_args *args,
 	prepare_pack_objects(&cmd, args, packtmp);
 	cmd.in = -1;
 
+	strvec_pushf(&cmd.args, "--no-ignore-disjoint");
+
 	/*
 	 * NEEDSWORK: Giving pack-objects only the OIDs without any ordering
 	 * hints may result in suboptimal deltas in the resulting pack. See if
@@ -421,7 +430,7 @@  static void repack_promisor_objects(const struct pack_objects_args *args,
 					  line.buf);
 		write_promisor_file(promisor_name, NULL, 0);
 
-		item->util = populate_pack_exts(item->string);
+		item->util = populate_pack_exts(item->string, 0);
 
 		free(promisor_name);
 	}
@@ -731,8 +740,13 @@  static void midx_included_packs(struct string_list *include,
 
 	for_each_string_list_item(item, &existing->kept_packs)
 		string_list_insert(include, xstrfmt("%s.idx", item->string));
-	for_each_string_list_item(item, names)
-		string_list_insert(include, xstrfmt("pack-%s.idx", item->string));
+	for_each_string_list_item(item, names) {
+		const char *marker = "";
+		struct generated_pack_data *data = item->util;
+		if (data->disjoint)
+			marker = "+";
+		string_list_insert(include, xstrfmt("%spack-%s.idx", marker, item->string));
+	}
 	if (geometry->split_factor) {
 		struct strbuf buf = STRBUF_INIT;
 		uint32_t i;
@@ -788,7 +802,8 @@  static int write_midx_included_packs(struct string_list *include,
 				     struct pack_geometry *geometry,
 				     struct string_list *names,
 				     const char *refs_snapshot,
-				     int show_progress, int write_bitmaps)
+				     int show_progress, int write_bitmaps,
+				     int exclude_disjoint)
 {
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	struct string_list_item *item;
@@ -852,6 +867,9 @@  static int write_midx_included_packs(struct string_list *include,
 	if (refs_snapshot)
 		strvec_pushf(&cmd.args, "--refs-snapshot=%s", refs_snapshot);
 
+	if (exclude_disjoint)
+		strvec_push(&cmd.args, "--retain-disjoint");
+
 	ret = start_command(&cmd);
 	if (ret)
 		return ret;
@@ -895,7 +913,7 @@  static void remove_redundant_bitmaps(struct string_list *include,
 
 static int finish_pack_objects_cmd(struct child_process *cmd,
 				   struct string_list *names,
-				   int local)
+				   int local, int disjoint)
 {
 	FILE *out;
 	struct strbuf line = STRBUF_INIT;
@@ -913,7 +931,7 @@  static int finish_pack_objects_cmd(struct child_process *cmd,
 		 */
 		if (local) {
 			item = string_list_append(names, line.buf);
-			item->util = populate_pack_exts(line.buf);
+			item->util = populate_pack_exts(line.buf, disjoint);
 		}
 	}
 	fclose(out);
@@ -970,7 +988,7 @@  static int write_filtered_pack(const struct pack_objects_args *args,
 		fprintf(in, "%s%s.pack\n", caret, item->string);
 	fclose(in);
 
-	return finish_pack_objects_cmd(&cmd, names, local);
+	return finish_pack_objects_cmd(&cmd, names, local, 0);
 }
 
 static int existing_cruft_pack_cmp(const void *va, const void *vb)
@@ -1098,7 +1116,7 @@  static int write_cruft_pack(const struct pack_objects_args *args,
 		fprintf(in, "%s.pack\n", item->string);
 	fclose(in);
 
-	return finish_pack_objects_cmd(&cmd, names, local);
+	return finish_pack_objects_cmd(&cmd, names, local, 0);
 }
 
 static const char *find_pack_prefix(const char *packdir, const char *packtmp)
@@ -1190,6 +1208,8 @@  int cmd_repack(int argc, const char **argv, const char *prefix)
 			   N_("pack prefix to store a pack containing pruned objects")),
 		OPT_STRING(0, "filter-to", &filter_to, N_("dir"),
 			   N_("pack prefix to store a pack containing filtered out objects")),
+		OPT_BOOL(0, "extend-disjoint", &po_args.ignore_disjoint,
+			 N_("add new packs to the set of disjoint ones")),
 		OPT_END()
 	};
 
@@ -1255,6 +1275,16 @@  int cmd_repack(int argc, const char **argv, const char *prefix)
 		strbuf_release(&path);
 	}
 
+	if (po_args.ignore_disjoint) {
+		if (filter_to)
+			die(_("options '%s' and '%s' cannot be used together"),
+			    "--filter-to", "--extend-disjoint");
+		if (pack_everything && !delete_redundant)
+			die(_("cannot use '--extend-disjoint' with '%s' but not '-d'"),
+			    pack_everything & LOOSEN_UNREACHABLE ? "-A" :
+			    pack_everything & PACK_CRUFT ? "--cruft" : "-a");
+	}
+
 	packdir = mkpathdup("%s/pack", get_object_directory());
 	packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid());
 	packtmp = mkpathdup("%s/%s", packdir, packtmp_name);
@@ -1308,6 +1338,9 @@  int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (pack_everything & ALL_INTO_ONE) {
 		repack_promisor_objects(&po_args, &names);
 
+		if (delete_redundant)
+			strvec_pushf(&cmd.args, "--no-ignore-disjoint");
+
 		if (has_existing_non_kept_packs(&existing) &&
 		    delete_redundant &&
 		    !(pack_everything & PACK_CRUFT)) {
@@ -1364,7 +1397,7 @@  int cmd_repack(int argc, const char **argv, const char *prefix)
 		fclose(in);
 	}
 
-	ret = finish_pack_objects_cmd(&cmd, &names, 1);
+	ret = finish_pack_objects_cmd(&cmd, &names, 1, po_args.ignore_disjoint);
 	if (ret)
 		goto cleanup;
 
@@ -1387,6 +1420,7 @@  int cmd_repack(int argc, const char **argv, const char *prefix)
 
 		cruft_po_args.local = po_args.local;
 		cruft_po_args.quiet = po_args.quiet;
+		cruft_po_args.ignore_disjoint = 0;
 
 		ret = write_cruft_pack(&cruft_po_args, packtmp, pack_prefix,
 				       cruft_expiration, &names,
@@ -1487,7 +1521,8 @@  int cmd_repack(int argc, const char **argv, const char *prefix)
 
 		ret = write_midx_included_packs(&include, &geometry, &names,
 						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
-						show_progress, write_bitmaps > 0);
+						show_progress, write_bitmaps > 0,
+						po_args.ignore_disjoint);
 
 		if (!ret && write_bitmaps)
 			remove_redundant_bitmaps(&include, packdir);
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index d2975e6c93..277f1ff1d7 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -6,6 +6,7 @@  test_description='git repack works correctly'
 . "${TEST_DIRECTORY}/lib-bitmap.sh"
 . "${TEST_DIRECTORY}/lib-midx.sh"
 . "${TEST_DIRECTORY}/lib-terminal.sh"
+. "${TEST_DIRECTORY}/lib-disjoint.sh"
 
 commit_and_pack () {
 	test_commit "$@" 1>&2 &&
@@ -525,7 +526,8 @@  test_expect_success '--filter works with --max-pack-size' '
 '
 
 objdir=.git/objects
-midx=$objdir/pack/multi-pack-index
+packdir=$objdir/pack
+midx=$packdir/multi-pack-index
 
 test_expect_success 'setup for --write-midx tests' '
 	git init midx &&
diff --git a/t/t7705-repack-extend-disjoint.sh b/t/t7705-repack-extend-disjoint.sh
new file mode 100755
index 0000000000..0c8be1cb3f
--- /dev/null
+++ b/t/t7705-repack-extend-disjoint.sh
@@ -0,0 +1,142 @@ 
+#!/bin/sh
+
+test_description='git repack --extend-disjoint works correctly'
+
+. ./test-lib.sh
+. "${TEST_DIRECTORY}/lib-disjoint.sh"
+
+packdir=.git/objects/pack
+
+GIT_TEST_MULTI=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
+test_expect_success 'repack --extend-disjoint creates new disjoint packs' '
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit A &&
+		test_commit B &&
+
+		A="$(echo A | git pack-objects --revs $packdir/pack)" &&
+		B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
+
+		git prune-packed &&
+
+		cat >in <<-EOF &&
+		pack-$A.idx
+		+pack-$B.idx
+		EOF
+		git multi-pack-index write --bitmap --stdin-packs <in &&
+
+		test_must_not_be_disjoint "pack-$A.pack" &&
+		test_must_be_disjoint "pack-$B.pack" &&
+
+		test_commit C &&
+
+		find $packdir -type f -name "*.idx" | sort >packs.before &&
+		git repack --write-midx --write-bitmap-index --extend-disjoint &&
+		find $packdir -type f -name "*.idx" | sort >packs.after &&
+
+		comm -13 packs.before packs.after >packs.new &&
+
+		test_line_count = 1 packs.new &&
+
+		test_must_not_be_disjoint "pack-$A.pack" &&
+		test_must_be_disjoint "pack-$B.pack" &&
+		test_must_be_disjoint "$(basename $(cat packs.new) .idx).pack"
+	)
+'
+
+test_expect_success 'repack --extend-disjoint combines existing disjoint packs' '
+	(
+		cd repo &&
+
+		test_commit D &&
+
+		git repack -a -d --write-midx --write-bitmap-index --extend-disjoint &&
+
+		find $packdir -type f -name "*.pack" >packs &&
+		test_line_count = 1 packs &&
+
+		test_must_be_disjoint "$(basename $(cat packs))"
+
+	)
+'
+
+test_expect_success 'repack --extend-disjoint with --geometric' '
+	git init disjoint-geometric &&
+	(
+		cd disjoint-geometric &&
+
+		test_commit_bulk 8 &&
+		base="$(basename $(ls $packdir/pack-*.idx))" &&
+		echo "+$base" >>in &&
+
+		test_commit A &&
+		A="$(echo HEAD^.. | git pack-objects --revs $packdir/pack)" &&
+		test_commit B &&
+		B="$(echo HEAD^.. | git pack-objects --revs $packdir/pack)" &&
+
+		git prune-packed &&
+
+		cat >>in <<-EOF &&
+		+pack-$A.idx
+		+pack-$B.idx
+		EOF
+		git multi-pack-index write --bitmap --stdin-packs <in &&
+
+		test_must_be_disjoint "pack-$A.pack" &&
+		test_must_be_disjoint "pack-$B.pack" &&
+		test_must_be_disjoint "${base%.idx}.pack" &&
+
+		test_commit C &&
+
+		find $packdir -type f -name "*.pack" | sort >packs.before &&
+		git repack --geometric=2 -d --write-midx --write-bitmap-index --extend-disjoint &&
+		find $packdir -type f -name "*.pack" | sort >packs.after &&
+
+		comm -12 packs.before packs.after >packs.unchanged &&
+		comm -23 packs.before packs.after >packs.removed &&
+		comm -13 packs.before packs.after >packs.new &&
+
+		cat >expect <<-EOF &&
+		$packdir/${base%.idx}.pack
+		EOF
+		test_cmp expect packs.unchanged &&
+
+		sort >expect <<-EOF &&
+		$packdir/pack-$A.pack
+		$packdir/pack-$B.pack
+		EOF
+		test_cmp expect packs.removed &&
+
+		test_line_count = 1 packs.new &&
+
+		test_must_be_disjoint "$(basename $(cat packs.new))" &&
+		test_must_be_disjoint "${base%.idx}.pack"
+	)
+'
+
+for flag in "-A" "-a" "--cruft"
+do
+	test_expect_success "repack --extend-disjoint incompatible with $flag without -d" '
+		test_must_fail git repack $flag --extend-disjoint \
+			--write-midx --write-bitmap-index 2>actual &&
+		cat >expect <<-EOF &&
+		fatal: cannot use $SQ--extend-disjoint$SQ with $SQ$flag$SQ but not $SQ-d$SQ
+		EOF
+		test_cmp expect actual
+	'
+done
+
+test_expect_success 'repack --extend-disjoint is incompatible with --filter-to' '
+	test_must_fail git repack --extend-disjoint --filter-to=dir 2>actual &&
+
+	cat >expect <<-EOF &&
+	fatal: options $SQ--filter-to$SQ and $SQ--extend-disjoint$SQ cannot be used together
+	EOF
+	test_cmp expect actual
+'
+
+test_done