diff mbox series

[v2,8/8] repack: disable writing bitmaps when doing a local geometric repack

Message ID b74d0a037b07706d07fad7b438fa0cb211726575.1681294715.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series repack: fix geometric repacking with gitalternates | expand

Commit Message

Patrick Steinhardt April 12, 2023, 10:23 a.m. UTC
In order to write a bitmap, we need to have full coverage of all objects
that are about to be packed. In the traditional non-multi-pack-index
world this meant we need to do a full repack of all objects into a
single packfile. But in the new multi-pack-index world we can get away
with writing bitmaps when we have multiple packfiles as long as the
multi-pack-index covers all objects.

This is not always the case though. When writing multi-pack-indices in a
repository that is connected to an alternate object directory we may end
up writing a multi-pack-index that only has partial coverage of objects.
The end result is that writing the bitmap will fail:

    $ git multi-pack-index write --stdin-packs --bitmap <packfiles
    warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing)
    error: could not write multi-pack bitmap

Now there are two different ways to fix this. The first one would be to
amend git-multi-pack-index(1) to disable writing bitmaps when we notice
that we don't have full object coverage. But we ain't really got enough
information there, and seeing that it is a low-level plumbing command it
does not feel like the right place to fix this.

We can easily fix it at a higher level in git-repack(1) though. When all
of the following conditions are true:

    - We are asked to write a multi-pack index with bitmaps.

    - We are asked to only include local objects via `-l`.

    - We are connected to an alternate repository that has packfiles.

Then we will override the user's ask and disable writing bitmaps with a
warning. This is similar to what we do in git-pack-objects(1), where we
also disable writing bitmaps in case we omit an object from the pack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            | 20 ++++++++++++++++++++
 t/t7703-repack-geometric.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

Comments

Taylor Blau April 12, 2023, 10:01 p.m. UTC | #1
On Wed, Apr 12, 2023 at 12:23:01PM +0200, Patrick Steinhardt wrote:
> Now there are two different ways to fix this. The first one would be to
> amend git-multi-pack-index(1) to disable writing bitmaps when we notice
> that we don't have full object coverage. But we ain't really got enough
> information there, and seeing that it is a low-level plumbing command it
> does not feel like the right place to fix this.

I might actually advocate that we either fix this in both places, or fix
it at the lower level only. I think that we would still be able to
trigger this problem by invoking `git multi-pack-index write
--bitmap --stdin-packs` directly.

> ---
>  builtin/repack.c            | 20 ++++++++++++++++++++
>  t/t7703-repack-geometric.sh | 27 +++++++++++++++++++++++++++
>  2 files changed, 47 insertions(+)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index f57869f14a..07d92fdf87 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -881,6 +881,26 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	if (pack_kept_objects < 0)
>  		pack_kept_objects = write_bitmaps > 0 && !write_midx;
>
> +	if (write_midx && write_bitmaps && geometric_factor && po_args.local) {
> +		struct packed_git *p;
> +
> +		for (p = get_all_packs(the_repository); p; p = p->next) {
> +			if (p->pack_local)
> +				continue;
> +
> +			/*
> +			 * When asked to do a local repack, but we have
> +			 * packfiles that are inherited from an alternate, then
> +			 * we cannot guarantee that the multi-pack-index would
> +			 * have full coverage of all objects. We thus disable
> +			 * writing bitmaps in that case.
> +			 */
> +			warning(_("disabling bitmap writing, as some objects are not being packed"));
> +			write_bitmaps = 0;
> +			break;
> +		}
> +	}
> +

In terms of the higher-level fix here, though, I think that you could
reasonably assume that the alternate repository has at least one pack,
and that the combination of "write_midx && write_bitmaps &&
po.args_local" and "has any alternate(s)" is banned (or, at least, emits
the above warning and disables writing bitmaps).

But certainly ensuring that there are indeed packs in at least one of
the alternate(s) doesn't hurt either, so I don't mind this approach at
all.

One thing that I don't quite follow with this logic is why we need to
have geometric_factor set. You could (somewhat unreasonably) write a
MIDX containing a single pack (git repack -[A|a] --write-midx
--write-bitmap-index), or a MIDX containing just the new pack along with
all of the existing (local) packs, (git repack --write-midx
--write-bitmap-index).

So I think we'd want to drop the geometric_factor from the above
conditional. (And in the future, I think we typically refer to whether
or not the "geometry" pointer is NULL or not to indicate whether or not
we are doing a geometric repack, though the diff context doesn't give me
enough to know whether we have even attempted to set up that instance
yet, so this is fine, too).

Thanks,
Taylor
Patrick Steinhardt April 13, 2023, 9:54 a.m. UTC | #2
On Wed, Apr 12, 2023 at 06:01:06PM -0400, Taylor Blau wrote:
> On Wed, Apr 12, 2023 at 12:23:01PM +0200, Patrick Steinhardt wrote:
> > Now there are two different ways to fix this. The first one would be to
> > amend git-multi-pack-index(1) to disable writing bitmaps when we notice
> > that we don't have full object coverage. But we ain't really got enough
> > information there, and seeing that it is a low-level plumbing command it
> > does not feel like the right place to fix this.
> 
> I might actually advocate that we either fix this in both places, or fix
> it at the lower level only. I think that we would still be able to
> trigger this problem by invoking `git multi-pack-index write
> --bitmap --stdin-packs` directly.

The problem I see with implementing the fix is that we're just not in a
good position to judge whether we have full coverage of objects or not.
All we see is a set of packfiles, and those packfiles _could_ have full
coverage, but they may just as well not have full coverage. And whether
they do is not easy to figure out in git-multi-pack-index(1).

So in order to fix this we'd likely have to use heuristics, like whether
or not there are alternates or alternate packfiles. But unconditionally
disabling bitmaps when there are feels overly restrictive to me as it
would break perfectly-valid usecases.

I'm thus still not convinced we should implement it at the lowest level
possible. While it would be nice to deduplicate the logic around this,
I wouldn't want to close doors we don't necessarily have to.

> > ---
> >  builtin/repack.c            | 20 ++++++++++++++++++++
> >  t/t7703-repack-geometric.sh | 27 +++++++++++++++++++++++++++
> >  2 files changed, 47 insertions(+)
> >
> > diff --git a/builtin/repack.c b/builtin/repack.c
> > index f57869f14a..07d92fdf87 100644
> > --- a/builtin/repack.c
> > +++ b/builtin/repack.c
> > @@ -881,6 +881,26 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> >  	if (pack_kept_objects < 0)
> >  		pack_kept_objects = write_bitmaps > 0 && !write_midx;
> >
> > +	if (write_midx && write_bitmaps && geometric_factor && po_args.local) {
> > +		struct packed_git *p;
> > +
> > +		for (p = get_all_packs(the_repository); p; p = p->next) {
> > +			if (p->pack_local)
> > +				continue;
> > +
> > +			/*
> > +			 * When asked to do a local repack, but we have
> > +			 * packfiles that are inherited from an alternate, then
> > +			 * we cannot guarantee that the multi-pack-index would
> > +			 * have full coverage of all objects. We thus disable
> > +			 * writing bitmaps in that case.
> > +			 */
> > +			warning(_("disabling bitmap writing, as some objects are not being packed"));
> > +			write_bitmaps = 0;
> > +			break;
> > +		}
> > +	}
> > +
> 
> In terms of the higher-level fix here, though, I think that you could
> reasonably assume that the alternate repository has at least one pack,
> and that the combination of "write_midx && write_bitmaps &&
> po.args_local" and "has any alternate(s)" is banned (or, at least, emits
> the above warning and disables writing bitmaps).
> 
> But certainly ensuring that there are indeed packs in at least one of
> the alternate(s) doesn't hurt either, so I don't mind this approach at
> all.

It's an edge case for sure. I don't quite mind which way we go either.
For now I'll just keep the current way of doing things, but am happy to
change it.

> One thing that I don't quite follow with this logic is why we need to
> have geometric_factor set. You could (somewhat unreasonably) write a
> MIDX containing a single pack (git repack -[A|a] --write-midx
> --write-bitmap-index), or a MIDX containing just the new pack along with
> all of the existing (local) packs, (git repack --write-midx
> --write-bitmap-index).
> 
> So I think we'd want to drop the geometric_factor from the above
> conditional. (And in the future, I think we typically refer to whether
> or not the "geometry" pointer is NULL or not to indicate whether or not
> we are doing a geometric repack, though the diff context doesn't give me
> enough to know whether we have even attempted to set up that instance
> yet, so this is fine, too).

Mh. Yeah, I think you're right. I'll change it.

Patrick
Patrick Steinhardt April 13, 2023, 10:14 a.m. UTC | #3
On Thu, Apr 13, 2023 at 11:54:55AM +0200, Patrick Steinhardt wrote:
> On Wed, Apr 12, 2023 at 06:01:06PM -0400, Taylor Blau wrote:
> > On Wed, Apr 12, 2023 at 12:23:01PM +0200, Patrick Steinhardt wrote:
[snip]
> > One thing that I don't quite follow with this logic is why we need to
> > have geometric_factor set. You could (somewhat unreasonably) write a
> > MIDX containing a single pack (git repack -[A|a] --write-midx
> > --write-bitmap-index), or a MIDX containing just the new pack along with
> > all of the existing (local) packs, (git repack --write-midx
> > --write-bitmap-index).
> > 
> > So I think we'd want to drop the geometric_factor from the above
> > conditional. (And in the future, I think we typically refer to whether
> > or not the "geometry" pointer is NULL or not to indicate whether or not
> > we are doing a geometric repack, though the diff context doesn't give me
> > enough to know whether we have even attempted to set up that instance
> > yet, so this is fine, too).
> 
> Mh. Yeah, I think you're right. I'll change it.

Actually, do we even have to care about the `write_midx` case? Right now
we have:

```
if (write_bitmaps && !(pack_everything & ALL_INTO_ONE) && !write_midx)
    die(_(incremental_bitmap_conflict_error));
```

The intent is to die when not repacking all objects into a single pack
(potentially with a cruft pack), but to allow this when writing a
multi-pack-index because they can have a bitmap that spans across
multiple packs.

Now, whether or not we write a multi-pack-index, as soon as the user
passes `-l` we cannot guarantee that we have all objects available
locally either in a single packfile nor in multiple packfiles when the
repository is connected to an alternate object directory. So in the
spirit of the preexisting check, couldn't we do the following:

```
if (write_bitmaps && po_args.local && has_alternates(repo))
    die(_("Repacking local objects is incompatible with bitmap indexes.");
```

So in words, we die when the user asks us to write a bitmap index
for a repack that is supposed to only include local objects when there
are objects that could have been inherited from an alternate object
directory.

I'm not sure whether we are okay with retroactively tightening checks
though. I'd argue it's likely fine, because it wouldn't have worked
before this check either. And I'd rather fail with explicit reasons that
are user-actionable rather than implicitly somewhere deep down in the
callstack.

Patrick
diff mbox series

Patch

diff --git a/builtin/repack.c b/builtin/repack.c
index f57869f14a..07d92fdf87 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -881,6 +881,26 @@  int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps > 0 && !write_midx;
 
+	if (write_midx && write_bitmaps && geometric_factor && po_args.local) {
+		struct packed_git *p;
+
+		for (p = get_all_packs(the_repository); p; p = p->next) {
+			if (p->pack_local)
+				continue;
+
+			/*
+			 * When asked to do a local repack, but we have
+			 * packfiles that are inherited from an alternate, then
+			 * we cannot guarantee that the multi-pack-index would
+			 * have full coverage of all objects. We thus disable
+			 * writing bitmaps in that case.
+			 */
+			warning(_("disabling bitmap writing, as some objects are not being packed"));
+			write_bitmaps = 0;
+			break;
+		}
+	}
+
 	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE) && !write_midx)
 		die(_(incremental_bitmap_conflict_error));
 
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 96c8d4cdfa..0aaec9f853 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -420,4 +420,31 @@  test_expect_success '--geometric -l with non-intact geometric sequence across OD
 	test_cmp expected-objects actual-objects.sorted
 '
 
+test_expect_success '--geometric -l disables writing bitmaps with non-local packfiles' '
+	test_when_finished "rm -fr shared member" &&
+
+	git init shared &&
+	test_commit_bulk -C shared --start=1 1 &&
+
+	git clone --shared shared member &&
+	test_commit_bulk -C member --start=2 1 &&
+
+	# When performing a geometric repack with `-l` while connecting to an
+	# alternate object database that has a packfile we do not have full
+	# coverage of objects. As a result, we expect that writing the bitmap
+	# will be disabled.
+	git -C member repack -l --geometric=2 --write-midx --write-bitmap-index 2>err &&
+	cat >expect <<-EOF &&
+	warning: disabling bitmap writing, as some objects are not being packed
+	EOF
+	test_cmp expect err &&
+	test ! -f member/.git/objects/pack/multi-pack-index-*.bitmap &&
+
+	# On the other hand, when we repack without `-l`, we should see that
+	# the bitmap gets created.
+	git -C member repack --geometric=2 --write-midx --write-bitmap-index 2>err &&
+	test_must_be_empty err &&
+	test -f member/.git/objects/pack/multi-pack-index-*.bitmap
+'
+
 test_done