[v2,08/24] midx: respect 'core.multiPackIndex' when writing

Message ID	dfd1daacc5b12d470bb6deec3448cf7dbde2bf0f.1624314293.git.me@ttaylorr.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Date: Mon, 21 Jun 2021 18:25:18 -0400 From: Taylor Blau <me@ttaylorr.com> To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com, gitster@pobox.com, jonathantanmy@google.com Subject: [PATCH v2 08/24] midx: respect 'core.multiPackIndex' when writing Message-ID: <dfd1daacc5b12d470bb6deec3448cf7dbde2bf0f.1624314293.git.me@ttaylorr.com> References: <cover.1617991824.git.me@ttaylorr.com> <cover.1624314293.git.me@ttaylorr.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <cover.1624314293.git.me@ttaylorr.com> Precedence: bulk
Series	multi-pack reachability bitmaps \| expand [v2,00/24] multi-pack reachability bitmaps [v2,01/24] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps [v2,02/24] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps [v2,03/24] pack-bitmap-write.c: free existing bitmaps [v2,04/24] Documentation: build 'technical/bitmap-format' by default [v2,05/24] Documentation: describe MIDX-based bitmaps [v2,06/24] midx: make a number of functions non-static [v2,07/24] midx: clear auxiliary .rev after replacing the MIDX [v2,08/24] midx: respect 'core.multiPackIndex' when writing [v2,09/24] midx: infer preferred pack when not given one [v2,10/24] pack-bitmap.c: introduce 'bitmap_num_objects()' [v2,11/24] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' [v2,12/24] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' [v2,13/24] pack-bitmap: read multi-pack bitmaps [v2,14/24] pack-bitmap: write multi-pack bitmaps [v2,15/24] t5310: move some tests to lib-bitmap.sh [v2,16/24] t/helper/test-read-midx.c: add --checksum mode [v2,17/24] t5326: test multi-pack bitmap behavior [v2,18/24] t0410: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP [v2,19/24] t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP [v2,20/24] t5319: don't write MIDX bitmaps in t5319 [v2,21/24] t7700: update to work with MIDX bitmap test knob [v2,22/24] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' [v2,23/24] p5310: extract full and partial bitmap tests [v2,24/24] p5326: perf tests for MIDX bitmaps

Taylor Blau June 21, 2021, 10:25 p.m. UTC

When writing a new multi-pack index, write_midx_internal() attempts to
load any existing one to fill in some pieces of information. But it uses
load_multi_pack_index(), which ignores the configuration
"core.multiPackIndex", which indicates whether or not Git is allowed to
read an existing multi-pack-index.

Replace this with a routine that does respect that setting, to avoid
reading multi-pack-index files when told not to.

This avoids a problem that would arise in subsequent patches due to the
combination of 'git repack' reopening the object store in-process and
the multi-pack index code not checking whether a pack already exists in
the object store when calling add_pack_to_midx().

This would ultimately lead to a cycle being created along the
'packed_git' struct's '->next' pointer. That is obviously bad, but it
has hard-to-debug downstream effects like saying a bitmap can't be
loaded for a pack because one already exists (for the same pack).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Ævar Arnfjörð Bjarmason June 24, 2021, 11:43 p.m. UTC | #1

On Mon, Jun 21 2021, Taylor Blau wrote:

> When writing a new multi-pack index, write_midx_internal() attempts to
> load any existing one to fill in some pieces of information. But it uses
> load_multi_pack_index(), which ignores the configuration
> "core.multiPackIndex", which indicates whether or not Git is allowed to
> read an existing multi-pack-index.
>
> Replace this with a routine that does respect that setting, to avoid
> reading multi-pack-index files when told not to.
>
> This avoids a problem that would arise in subsequent patches due to the
> combination of 'git repack' reopening the object store in-process and
> the multi-pack index code not checking whether a pack already exists in
> the object store when calling add_pack_to_midx().
>
> This would ultimately lead to a cycle being created along the
> 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> has hard-to-debug downstream effects like saying a bitmap can't be
> loaded for a pack because one already exists (for the same pack).
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  midx.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index 40eb7974ba..759007d5a8 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -908,8 +908,18 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  
>  	if (m)
>  		ctx.m = m;
> -	else
> -		ctx.m = load_multi_pack_index(object_dir, 1);
> +	else {

Style nit: leaves the initial "if" braceless now that the "else" gained
braces.

Jeff King July 21, 2021, 10:23 a.m. UTC | #2

On Mon, Jun 21, 2021 at 06:25:18PM -0400, Taylor Blau wrote:

> When writing a new multi-pack index, write_midx_internal() attempts to
> load any existing one to fill in some pieces of information. But it uses
> load_multi_pack_index(), which ignores the configuration
> "core.multiPackIndex", which indicates whether or not Git is allowed to
> read an existing multi-pack-index.
> 
> Replace this with a routine that does respect that setting, to avoid
> reading multi-pack-index files when told not to.
> 
> This avoids a problem that would arise in subsequent patches due to the
> combination of 'git repack' reopening the object store in-process and
> the multi-pack index code not checking whether a pack already exists in
> the object store when calling add_pack_to_midx().
> 
> This would ultimately lead to a cycle being created along the
> 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> has hard-to-debug downstream effects like saying a bitmap can't be
> loaded for a pack because one already exists (for the same pack).

I'm not sure I completely understand the bug that this causes.

But another question: does this impact how

  git -c core.multipackindex=false multi-pack-index write

behaves? I.e., do we still write, but just avoid reading the existing
midx? That itself seems like a more sensible behavior (e.g., trying to
recover from a broken midx state).

-Peff

Taylor Blau July 21, 2021, 7:22 p.m. UTC | #3

On Wed, Jul 21, 2021 at 06:23:23AM -0400, Jeff King wrote:
> On Mon, Jun 21, 2021 at 06:25:18PM -0400, Taylor Blau wrote:
>
> > When writing a new multi-pack index, write_midx_internal() attempts to
> > load any existing one to fill in some pieces of information. But it uses
> > load_multi_pack_index(), which ignores the configuration
> > "core.multiPackIndex", which indicates whether or not Git is allowed to
> > read an existing multi-pack-index.
> >
> > Replace this with a routine that does respect that setting, to avoid
> > reading multi-pack-index files when told not to.
> >
> > This avoids a problem that would arise in subsequent patches due to the
> > combination of 'git repack' reopening the object store in-process and
> > the multi-pack index code not checking whether a pack already exists in
> > the object store when calling add_pack_to_midx().
> >
> > This would ultimately lead to a cycle being created along the
> > 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> > has hard-to-debug downstream effects like saying a bitmap can't be
> > loaded for a pack because one already exists (for the same pack).
>
> I'm not sure I completely understand the bug that this causes.

Off-hand, I can't quite remember either. But it is important; I do have
a distinct memory of dropping this patch and then watching a 'git repack
--write-midx' (that option will be introduced in a later series) fail
horribly.

If I remember correctly, the bug has to do with loading a MIDX twice in
the same process. When we call add_packed_git() from within
prepare_midx_pack(), we load the pack without caring whether or not it's
already loaded. So loading a MIDX twice in the same process will fail.

So really I think that this is papering over that bug: we're just
removing one of the times that we happened to load a MIDX from during
the writing phase.

What I do remember is that this bug was a huge pain to figure out ;).
I'm happy to look further if you aren't satisfied with my vague
explanation here (and I wouldn't blame you).

> But another question: does this impact how
>
>   git -c core.multipackindex=false multi-pack-index write
>
> behaves? I.e., do we still write, but just avoid reading the existing
> midx? That itself seems like a more sensible behavior (e.g., trying to
> recover from a broken midx state).

Yes. Before this patch, that invocation would still load and use any
existing MIDX to write a new one. Now we don't, because (unlike
load_multi_pack_index()) prepare_multi_pack_index_one() does check
core.multiPackIndex before returning anything.

> -Peff

Thanks,
Taylor

Jeff King July 23, 2021, 8:29 a.m. UTC | #4

On Wed, Jul 21, 2021 at 03:22:34PM -0400, Taylor Blau wrote:

> > > This avoids a problem that would arise in subsequent patches due to the
> > > combination of 'git repack' reopening the object store in-process and
> > > the multi-pack index code not checking whether a pack already exists in
> > > the object store when calling add_pack_to_midx().
> > >
> > > This would ultimately lead to a cycle being created along the
> > > 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> > > has hard-to-debug downstream effects like saying a bitmap can't be
> > > loaded for a pack because one already exists (for the same pack).
> >
> > I'm not sure I completely understand the bug that this causes.
> 
> Off-hand, I can't quite remember either. But it is important; I do have
> a distinct memory of dropping this patch and then watching a 'git repack
> --write-midx' (that option will be introduced in a later series) fail
> horribly.
> 
> If I remember correctly, the bug has to do with loading a MIDX twice in
> the same process. When we call add_packed_git() from within
> prepare_midx_pack(), we load the pack without caring whether or not it's
> already loaded. So loading a MIDX twice in the same process will fail.
> 
> So really I think that this is papering over that bug: we're just
> removing one of the times that we happened to load a MIDX from during
> the writing phase.

Hmm, after staring at this for a bit, I've unconfused and re-confused
myself several times.

Here are some interesting bits:

  - calling load_multi_pack_index() directly creates a new midx object.
    None of its m->packs[] array will be filled in. Nor is it reachable
    as r->objects->multi_pack_index.

  - in using that midx, we end up calling prepare_midx_pack() for
    various packs, which creates a new packed_git struct and adds it to
    r->objects->packed_git (via install_packed_git()).

So that's a bit weird already, because we have packed_git structs in
r->objects that came from a midx that isn't r->objects->multi_pack_index.
And then if we later call prepare_multi_pack_index(), for example as
part of a pack reprepare, then we'd end up with duplicates.

Whereas normally, when a direct load_multi_pack_index() was not called,
our only midx would be r->objects->multi_pack_index, and so we'd avoid
re-loading it.

That seems wrong and wasteful, but I don't see how it results in a
circular linked list. And it seems like it would already be the case for
this write path, independent of your series. Either way, the solution is
probably for prepare_midx_pack() to check for duplicates (which we can
do pretty cheaply these days due to the hashmap; see prepare_pack).

But I'm worried there is something else going on. Your commit message
mentions add_pack_to_midx(). That's something we call as part of
write_midx_internal(), and it does create other packed_git structs. But
it never calls install_packed_git() on them; they just live in the
write_midx_context. So I'm not sure how they'd interfere with things.

And then there's one final oddity. Your patch assigns to ctx.m from
r->objects->multi_pack_index. But later in write_midx_internal(), we
call close_midx(). In the original, it's in the middle of the function,
but one of your patches puts it at the end of the function. But that
means we are closing r->objects->multi_pack_index.

Looking at close_midx(), it does not actually zero the struct. So we'd
still have r->objects->multi_pack_index->data pointed to memory which
has been unmapped. That seems like an accident waiting to happen. I
guess it doesn't usually cause problems because we'd typically write a
midx near the end of the process, and then not look up other objects?

So I'm concerned this is introducing a subtle bug that will bite us
later. And we should figure out what the actual thing it's fixing is, so
we can understand if there is a better way to fix it (e.g., by removing
duplicates in prepare_midx_pack(), or if it is some interaction with the
writing code).

I guess a good thing to try would be dropping this patch and seeing if
the tests break. ;)

-Peff

Taylor Blau July 26, 2021, 6:59 p.m. UTC | #5

On Fri, Jul 23, 2021 at 04:29:37AM -0400, Jeff King wrote:
> On Wed, Jul 21, 2021 at 03:22:34PM -0400, Taylor Blau wrote:
>
> > > > This avoids a problem that would arise in subsequent patches due to the
> > > > combination of 'git repack' reopening the object store in-process and
> > > > the multi-pack index code not checking whether a pack already exists in
> > > > the object store when calling add_pack_to_midx().
> > > >
> > > > This would ultimately lead to a cycle being created along the
> > > > 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> > > > has hard-to-debug downstream effects like saying a bitmap can't be
> > > > loaded for a pack because one already exists (for the same pack).
> > >
> > > I'm not sure I completely understand the bug that this causes.
> >
> > Off-hand, I can't quite remember either. But it is important; I do have
> > a distinct memory of dropping this patch and then watching a 'git repack
> > --write-midx' (that option will be introduced in a later series) fail
> > horribly.
> >
> > If I remember correctly, the bug has to do with loading a MIDX twice in
> > the same process. When we call add_packed_git() from within
> > prepare_midx_pack(), we load the pack without caring whether or not it's
> > already loaded. So loading a MIDX twice in the same process will fail.
> >
> > So really I think that this is papering over that bug: we're just
> > removing one of the times that we happened to load a MIDX from during
> > the writing phase.
>
> Hmm, after staring at this for a bit, I've unconfused and re-confused
> myself several times.
>
> Here are some interesting bits:
>
>   - calling load_multi_pack_index() directly creates a new midx object.
>     None of its m->packs[] array will be filled in. Nor is it reachable
>     as r->objects->multi_pack_index.
>
>   - in using that midx, we end up calling prepare_midx_pack() for
>     various packs, which creates a new packed_git struct and adds it to
>     r->objects->packed_git (via install_packed_git()).
>
> So that's a bit weird already, because we have packed_git structs in
> r->objects that came from a midx that isn't r->objects->multi_pack_index.
> And then if we later call prepare_multi_pack_index(), for example as
> part of a pack reprepare, then we'd end up with duplicates.

Ah, this jogged my memory: this is a relic from when we generated MIDX
bitmaps in-process with the rest of the `repack` code. And when we did
that, we did have to call `reprepare_packed_git()` after writing the new
packs but before moving them into place.

So that's where the `reprepare_packed_git()` came from, but we don't
have any of that code anymore, since we now generate MIDX bitmaps by
invoking:

    git multi-pack-index write --bitmap --stdin-packs --refs-snapshot

as a sub-process of `git repack`; no need for any reprepare which is
what was triggering this bug.

To be sure, I reverted this patch out of GitHub's fork, and reran the
tests both in normal mode (just `make test`) and then once more with the
`GIT_TEST_MULTI_PACK_INDEX{,_WRITE_BITMAP}` environment variables set.
Unsurprisingly, it passed both times.

I'm happy to keep digging further, but I think that I'm 99% satisfied
here. Digging further involves resurrecting a much older version of this
series (and others adjacent to it), and there are probably other bugs
lurking that would be annoying to tease out.

In any case, let's drop this patch from the series. It's disappointing
that we can't run:

    git -c core.multiPackIndex= multi-pack-index write

anymore, but I guess that's no worse than the state we were in before
this patch, so I'm content to let it live on.

Thanks,
Taylor

Taylor Blau July 26, 2021, 10:14 p.m. UTC | #6

On Mon, Jul 26, 2021 at 02:59:02PM -0400, Taylor Blau wrote:
> On Fri, Jul 23, 2021 at 04:29:37AM -0400, Jeff King wrote:
> > On Wed, Jul 21, 2021 at 03:22:34PM -0400, Taylor Blau wrote:
> >
> > > > > This avoids a problem that would arise in subsequent patches due to the
> > > > > combination of 'git repack' reopening the object store in-process and
> > > > > the multi-pack index code not checking whether a pack already exists in
> > > > > the object store when calling add_pack_to_midx().
> > > > >
> > > > > This would ultimately lead to a cycle being created along the
> > > > > 'packed_git' struct's '->next' pointer. That is obviously bad, but it
> > > > > has hard-to-debug downstream effects like saying a bitmap can't be
> > > > > loaded for a pack because one already exists (for the same pack).
> > > >
> > > > I'm not sure I completely understand the bug that this causes.
> > >
> > > Off-hand, I can't quite remember either. But it is important; I do have
> > > a distinct memory of dropping this patch and then watching a 'git repack
> > > --write-midx' (that option will be introduced in a later series) fail
> > > horribly.
> > >
> > > If I remember correctly, the bug has to do with loading a MIDX twice in
> > > the same process. When we call add_packed_git() from within
> > > prepare_midx_pack(), we load the pack without caring whether or not it's
> > > already loaded. So loading a MIDX twice in the same process will fail.
> > >
> > > So really I think that this is papering over that bug: we're just
> > > removing one of the times that we happened to load a MIDX from during
> > > the writing phase.
> >
> > Hmm, after staring at this for a bit, I've unconfused and re-confused
> > myself several times.
> >
> > Here are some interesting bits:
> >
> >   - calling load_multi_pack_index() directly creates a new midx object.
> >     None of its m->packs[] array will be filled in. Nor is it reachable
> >     as r->objects->multi_pack_index.
> >
> >   - in using that midx, we end up calling prepare_midx_pack() for
> >     various packs, which creates a new packed_git struct and adds it to
> >     r->objects->packed_git (via install_packed_git()).
> >
> > So that's a bit weird already, because we have packed_git structs in
> > r->objects that came from a midx that isn't r->objects->multi_pack_index.
> > And then if we later call prepare_multi_pack_index(), for example as
> > part of a pack reprepare, then we'd end up with duplicates.
>
> Ah, this jogged my memory: this is a relic from when we generated MIDX
> bitmaps in-process with the rest of the `repack` code. And when we did
> that, we did have to call `reprepare_packed_git()` after writing the new
> packs but before moving them into place.

Actually, I take that back. You were right from the start: the way the
code is written we *can* end up calling both:

  - load_multi_pack_index, from write_midx_internal(), which sets up a
    MIDX, but does not update r->objects->multi_pack_index to point at
    it.

  - ...and prepare_multi_pack_index_one (via prepare_bitmap_git ->
    open_bitmap -> open_midx_bitmap -> get_multi_pack_index ->
    prepare_packed_git) which *also* creates a new MIDX, *and*
    updates the_repository->objects->multi_pack_index to point at it.

(The latter codepath is from the check in write_midx_internal() to see
if we already have a MIDX bitmap when the MIDX we are trying to write
already exists on disk.)

So in this scenario, we have two copies of the same MIDX open, and the
repository's single pack is opened in one of the MIDXs, but not both.
One copy of the pack is pointed at via r->objects->packed_git. Then when
we fall back to open_pack_bitmap(), we call get_all_packs(), which calls
prepare_midx_pack(), which installs the second MIDX's copy of the same
pack into the r->objects->packed_git, and we have a cycle.

I think there are a few ways to fix this bug. The most obvious is to
make install_packed_git() check for the existence of the pack in the
hashmap of installed packs before (re-)installing it. But that could be
quadratic if the hashmap has too many collisions (and the lookup tends
towards being linear in the number of keys rather than constant).

But I think that a more straightforward way would be to open the MIDX we
use when generating the MIDX with prepare_multi_pack_index_one() instead
of load_multi_pack_index() so that the resulting MIDX is pointed at by
r->objects->multi_pack_index. That would prevent the latter call from
deep within the callstack of prepare_bitmap_git() from opening another
copy and then (mistakenly) re-installing the same pack twice.

Thoughts?

Thanks,
Taylor

Jeff King July 27, 2021, 5:17 p.m. UTC | #7

On Mon, Jul 26, 2021 at 02:59:02PM -0400, Taylor Blau wrote:

> > Hmm, after staring at this for a bit, I've unconfused and re-confused
> > myself several times.
> >
> > Here are some interesting bits:
> >
> >   - calling load_multi_pack_index() directly creates a new midx object.
> >     None of its m->packs[] array will be filled in. Nor is it reachable
> >     as r->objects->multi_pack_index.
> >
> >   - in using that midx, we end up calling prepare_midx_pack() for
> >     various packs, which creates a new packed_git struct and adds it to
> >     r->objects->packed_git (via install_packed_git()).
> >
> > So that's a bit weird already, because we have packed_git structs in
> > r->objects that came from a midx that isn't r->objects->multi_pack_index.
> > And then if we later call prepare_multi_pack_index(), for example as
> > part of a pack reprepare, then we'd end up with duplicates.
> 
> Ah, this jogged my memory: this is a relic from when we generated MIDX
> bitmaps in-process with the rest of the `repack` code. And when we did
> that, we did have to call `reprepare_packed_git()` after writing the new
> packs but before moving them into place.
> 
> So that's where the `reprepare_packed_git()` came from, but we don't
> have any of that code anymore, since we now generate MIDX bitmaps by
> invoking:
> 
>     git multi-pack-index write --bitmap --stdin-packs --refs-snapshot
> 
> as a sub-process of `git repack`; no need for any reprepare which is
> what was triggering this bug.

OK, that makes sense, especially given the "close_midx() leaves the
pointer bogus" stuff discussed elsewhere.

> To be sure, I reverted this patch out of GitHub's fork, and reran the
> tests both in normal mode (just `make test`) and then once more with the
> `GIT_TEST_MULTI_PACK_INDEX{,_WRITE_BITMAP}` environment variables set.
> Unsurprisingly, it passed both times.
> 
> I'm happy to keep digging further, but I think that I'm 99% satisfied
> here. Digging further involves resurrecting a much older version of this
> series (and others adjacent to it), and there are probably other bugs
> lurking that would be annoying to tease out.
> 
> In any case, let's drop this patch from the series. It's disappointing
> that we can't run:
> 
>     git -c core.multiPackIndex= multi-pack-index write
> 
> anymore, but I guess that's no worse than the state we were in before
> this patch, so I'm content to let it live on.

Great. If we can drop it, I think that is the best path forward. I think
that may simplify things for the writing patch, too, then. It should not
matter if we move close_midx() anymore, because we will not be closing
the main r->objects->multi_pack_index struct.

I do suspect we could be skipping the load _and_ close of the midx
entirely in write_midx_internal(), and just using whatever the caller
has passed in (and arguably just having most callers pass in the regular
midx struct if they want us to reuse parts of it). That might be a
cleanup we can leave for later, but it might be necessary to touch these
bits anyway (if there is still some kind of close_midx() ordering gotcha
in the function).

-Peff

Jeff King July 27, 2021, 5:29 p.m. UTC | #8

On Mon, Jul 26, 2021 at 06:14:09PM -0400, Taylor Blau wrote:

> > Ah, this jogged my memory: this is a relic from when we generated MIDX
> > bitmaps in-process with the rest of the `repack` code. And when we did
> > that, we did have to call `reprepare_packed_git()` after writing the new
> > packs but before moving them into place.
> 
> Actually, I take that back. You were right from the start: the way the
> code is written we *can* end up calling both:
> 
>   - load_multi_pack_index, from write_midx_internal(), which sets up a
>     MIDX, but does not update r->objects->multi_pack_index to point at
>     it.
> 
>   - ...and prepare_multi_pack_index_one (via prepare_bitmap_git ->
>     open_bitmap -> open_midx_bitmap -> get_multi_pack_index ->
>     prepare_packed_git) which *also* creates a new MIDX, *and*
>     updates the_repository->objects->multi_pack_index to point at it.
> 
> (The latter codepath is from the check in write_midx_internal() to see
> if we already have a MIDX bitmap when the MIDX we are trying to write
> already exists on disk.)
> 
> So in this scenario, we have two copies of the same MIDX open, and the
> repository's single pack is opened in one of the MIDXs, but not both.
> One copy of the pack is pointed at via r->objects->packed_git. Then when
> we fall back to open_pack_bitmap(), we call get_all_packs(), which calls
> prepare_midx_pack(), which installs the second MIDX's copy of the same
> pack into the r->objects->packed_git, and we have a cycle.

Right, I understand how that ends up with duplicate structs for each
pack. But how do we get a cycle out of that?

> I think there are a few ways to fix this bug. The most obvious is to
> make install_packed_git() check for the existence of the pack in the
> hashmap of installed packs before (re-)installing it. But that could be
> quadratic if the hashmap has too many collisions (and the lookup tends
> towards being linear in the number of keys rather than constant).

I think it may be worth doing that anyway. You can assume the hashmap
will behave reasonably.

But it would mean that the "multi_pack_index" flag in packed_git does
not specify _which_ midx is pointing to it. At the very least, it would
need to become a ref-count (so when one midx goes away, it does not lose
its "I am part of a midx" flag).  And possibly it would need to actually
know the complete list of midx structs it's associated with (I haven't
looked at all of the uses of that flag).

That makes things sufficiently tricky that I would prefer not to
untangle it as part of this series.

> But I think that a more straightforward way would be to open the MIDX we
> use when generating the MIDX with prepare_multi_pack_index_one() instead
> of load_multi_pack_index() so that the resulting MIDX is pointed at by
> r->objects->multi_pack_index. That would prevent the latter call from
> deep within the callstack of prepare_bitmap_git() from opening another
> copy and then (mistakenly) re-installing the same pack twice.

But now the internal midx writing code can never call close_midx() on
that, because it does not own it to close. Can we simply drop the
close_midx() call there?

This would all make much more sense to me if write_midx_internal()
simply took a conceptually read-only midx as a parameter, and the caller
passed in the appropriate one (probably even using
prepare_multi_pack_index_one() to get it).

-Peff

Taylor Blau July 27, 2021, 5:36 p.m. UTC | #9

On Tue, Jul 27, 2021 at 01:29:49PM -0400, Jeff King wrote:
> On Mon, Jul 26, 2021 at 06:14:09PM -0400, Taylor Blau wrote:
> > So in this scenario, we have two copies of the same MIDX open, and the
> > repository's single pack is opened in one of the MIDXs, but not both.
> > One copy of the pack is pointed at via r->objects->packed_git. Then when
> > we fall back to open_pack_bitmap(), we call get_all_packs(), which calls
> > prepare_midx_pack(), which installs the second MIDX's copy of the same
> > pack into the r->objects->packed_git, and we have a cycle.
>
> Right, I understand how that ends up with duplicate structs for each
> pack. But how do we get a cycle out of that?

Sorry, it isn't a true cycle where p->next == p.

> But now the internal midx writing code can never call close_midx() on
> that, because it does not own it to close. Can we simply drop the
> close_midx() call there?
>
> This would all make much more sense to me if write_midx_internal()
> simply took a conceptually read-only midx as a parameter, and the caller
> passed in the appropriate one (probably even using
> prepare_multi_pack_index_one() to get it).

No, we can't drop the close_midx() call there because we must close the
MIDX file on Windows before moving a new one into place. My feeling is
we should always be working on the r->objects->multi_pack_index pointer,
and calling close_object_store() there instead of close_midx().

Does that seem like a reasonable approach to you?

Thanks,
Taylor

Jeff King July 27, 2021, 5:42 p.m. UTC | #10

On Tue, Jul 27, 2021 at 01:36:34PM -0400, Taylor Blau wrote:

> > But now the internal midx writing code can never call close_midx() on
> > that, because it does not own it to close. Can we simply drop the
> > close_midx() call there?
> >
> > This would all make much more sense to me if write_midx_internal()
> > simply took a conceptually read-only midx as a parameter, and the caller
> > passed in the appropriate one (probably even using
> > prepare_multi_pack_index_one() to get it).
> 
> No, we can't drop the close_midx() call there because we must close the
> MIDX file on Windows before moving a new one into place. My feeling is
> we should always be working on the r->objects->multi_pack_index pointer,
> and calling close_object_store() there instead of close_midx().
> 
> Does that seem like a reasonable approach to you?

Yes, though I'd have said that it is the responsibility of the caller
(who knows we are operating with r->objects->multi_pack_index) to do
that closing. But maybe it's not possible if the rename-into-place
happens at too low a level.

BTW, yet another weirdness: close_object_store() will call close_midx()
on the outermost midx struct, ignoring o->multi_pack_index->next
entirely. So that's a leak, but also means we may not be closing the
midx we're interested in (since write_midx_internal() takes an
object-dir parameter, and we could be pointing to some other
object-dir's midx).

-Peff

Taylor Blau July 27, 2021, 5:47 p.m. UTC | #11

On Tue, Jul 27, 2021 at 01:42:35PM -0400, Jeff King wrote:
> On Tue, Jul 27, 2021 at 01:36:34PM -0400, Taylor Blau wrote:
>
> > > But now the internal midx writing code can never call close_midx() on
> > > that, because it does not own it to close. Can we simply drop the
> > > close_midx() call there?
> > >
> > > This would all make much more sense to me if write_midx_internal()
> > > simply took a conceptually read-only midx as a parameter, and the caller
> > > passed in the appropriate one (probably even using
> > > prepare_multi_pack_index_one() to get it).
> >
> > No, we can't drop the close_midx() call there because we must close the
> > MIDX file on Windows before moving a new one into place. My feeling is
> > we should always be working on the r->objects->multi_pack_index pointer,
> > and calling close_object_store() there instead of close_midx().
> >
> > Does that seem like a reasonable approach to you?
>
> Yes, though I'd have said that it is the responsibility of the caller
> (who knows we are operating with r->objects->multi_pack_index) to do
> that closing. But maybe it's not possible if the rename-into-place
> happens at too low a level.

Right; write_midx_internal() needs to access the MIDX right up until the
point that we write the new one into place, so the only place to close
it is in write_midx_internal().

> BTW, yet another weirdness: close_object_store() will call close_midx()
> on the outermost midx struct, ignoring o->multi_pack_index->next
> entirely. So that's a leak, but also means we may not be closing the
> midx we're interested in (since write_midx_internal() takes an
> object-dir parameter, and we could be pointing to some other
> object-dir's midx).

Yuck, this is a mess. I'm tempted to say that we should be closing the
MIDX that we're operating on inside of write_midx_internal() so we can
write, but then declaring the whole object store to be bunk and calling
close_object_store() before leaving the function. Of course, one of
those steps should be closing the inner-most MIDX before closing the
next one and so on.

> -Peff

Thanks,
Taylor

Jeff King July 27, 2021, 5:55 p.m. UTC | #12

On Tue, Jul 27, 2021 at 01:47:40PM -0400, Taylor Blau wrote:

> > BTW, yet another weirdness: close_object_store() will call close_midx()
> > on the outermost midx struct, ignoring o->multi_pack_index->next
> > entirely. So that's a leak, but also means we may not be closing the
> > midx we're interested in (since write_midx_internal() takes an
> > object-dir parameter, and we could be pointing to some other
> > object-dir's midx).
> 
> Yuck, this is a mess. I'm tempted to say that we should be closing the
> MIDX that we're operating on inside of write_midx_internal() so we can
> write, but then declaring the whole object store to be bunk and calling
> close_object_store() before leaving the function. Of course, one of
> those steps should be closing the inner-most MIDX before closing the
> next one and so on.

That gets even weirder when you look at other callers of
write_midx_internal(). For instance, expire_midx_packs() is calling
load_multi_pack_index() directly, and then passing it to
write_midx_internal().

So closing the whole object store there is likewise weird.

I actually think having write_midx_internal() open up a new midx is
reasonable-ish. It's just that:

  - it's weird when it stuffs duplicate packs into the
    r->objects->packed_git list. But AFAICT that's not actually hurting
    anything?

  - we do need to make sure that the midx is closed (not just our copy,
    but any other open copies that happen to be in the same process) in
    order for things to work on Windows.

So I guess because of the second point, the internal midx write probably
needs to be calling close_object_store(). But because other callers use
load_multi_pack_index(), it probably needs to be closing the one that is
passed in, too! But of course not double-closing it if it did come from
the regular object store. One easy easy way to avoid that is to just
open a separate one.

I have some spicy takes on how midx's should have been designed, but I
think it's probably not productive to rant about it at this point. ;)

-Peff

Taylor Blau July 27, 2021, 8:05 p.m. UTC | #13

On Tue, Jul 27, 2021 at 01:55:52PM -0400, Jeff King wrote:
> On Tue, Jul 27, 2021 at 01:47:40PM -0400, Taylor Blau wrote:
>
> > > BTW, yet another weirdness: close_object_store() will call close_midx()
> > > on the outermost midx struct, ignoring o->multi_pack_index->next
> > > entirely. So that's a leak, but also means we may not be closing the
> > > midx we're interested in (since write_midx_internal() takes an
> > > object-dir parameter, and we could be pointing to some other
> > > object-dir's midx).
> >
> > Yuck, this is a mess. I'm tempted to say that we should be closing the
> > MIDX that we're operating on inside of write_midx_internal() so we can
> > write, but then declaring the whole object store to be bunk and calling
> > close_object_store() before leaving the function. Of course, one of
> > those steps should be closing the inner-most MIDX before closing the
> > next one and so on.
>
> That gets even weirder when you look at other callers of
> write_midx_internal(). For instance, expire_midx_packs() is calling
> load_multi_pack_index() directly, and then passing it to
> write_midx_internal().
>
> So closing the whole object store there is likewise weird.
>
> I actually think having write_midx_internal() open up a new midx is
> reasonable-ish. It's just that:
>
>   - it's weird when it stuffs duplicate packs into the
>     r->objects->packed_git list. But AFAICT that's not actually hurting
>     anything?

It is hurting us when we try to write a MIDX bitmap, because we try to
see if one already exists. And to do that, we call prepare_bitmap_git(),
which tries to call open_pack_bitmap_1 on *each* pack in the packed_git
list. Critically, prepare_bitmap_git() errors out if it is called with a
bitmap_git that has a non-NULL `->pack` pointer.

So they aren't a cycle in the sense that `p->next == p`, but it does
cause problems for us nonetheless.

I stepped away from my computer for an hour or so and thought about
this, and I think that the solution is two-fold:

  - We should be more careful about freeing up the ->next pointers of a
    MIDX, and releasing the memory we allocated to hold each MIDX struct
    in the first place.

  - We should always be operating on the repository's
    r->objects->multi_pack_index, or any other MIDX that can be reached
    via walking the `->next` pointers. If we do that consistently, then
    we'll only have at most one instance of a MIDX struct corresponding
    to each MIDX file on disk.

In the reroll that I'll send shortly, those are:

  - https://github.com/ttaylorr/git/commit/61a617715f3827401522c7b08b50bb6866f2a4e9
  - https://github.com/ttaylorr/git/commit/fd15ecf47c57ce4ff0d31621c2c9f61ff7a74939

respectively. It resolves my issue locally which was that I was
previously unable to run:

    GIT_TEST_MULTI_PACK_INDEX=1 \
    GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1 \
      t7700-repack.sh

without t7700.13 failing on me (it previously complained about a
"duplicate" .bitmap file, which is a side-effect of placing duplicates
in the packed_git list, not a true duplicate .bitmap on disk).

I'm waiting on a CI run [1], but I'm relatively happy with the result. I
think we could do something similar to the MIDX code like we did to the
commit-graph code in [2], but I'm reasonably happy with where we are
now.

Thanks,
Taylor

[1]: https://github.com/ttaylorr/git/actions/runs/1072513087
[2]: https://lore.kernel.org/git/cover.1580424766.git.me@ttaylorr.com/

Jeff King July 28, 2021, 5:46 p.m. UTC | #14

On Tue, Jul 27, 2021 at 04:05:39PM -0400, Taylor Blau wrote:

> > I actually think having write_midx_internal() open up a new midx is
> > reasonable-ish. It's just that:
> >
> >   - it's weird when it stuffs duplicate packs into the
> >     r->objects->packed_git list. But AFAICT that's not actually hurting
> >     anything?
> 
> It is hurting us when we try to write a MIDX bitmap, because we try to
> see if one already exists. And to do that, we call prepare_bitmap_git(),
> which tries to call open_pack_bitmap_1 on *each* pack in the packed_git
> list. Critically, prepare_bitmap_git() errors out if it is called with a
> bitmap_git that has a non-NULL `->pack` pointer.

It doesn't error out. It does produce a warning(), though, if it ignores
a bitmap (and that warning is doubly confusing because it is ignoring
bitmap X because it has already loaded and will use that exact same X!).

This causes t7700.13 to fail because it is being picky about stderr
being empty.

So the overall behavior is correct, but I agree it's sufficiently ugly
that we should make sure it doesn't happen.

  Side note: IMHO the "check all packs to see if there are any other
  bitmaps to warn about" behavior is kind of pointless, and we should
  consider just returning as soon as we have one. This is already
  somewhat the case after your midx-bitmap patches, as we will not even
  bother to look for a pack bitmap after finding a midx bitmap. That is
  a good thing, because it means you can keep pack bitmaps around for
  flexibility. But let's leave any changes to the pack-only behavior out
  of this series for simplicity.

> I stepped away from my computer for an hour or so and thought about
> this, and I think that the solution is two-fold:
> 
>   - We should be more careful about freeing up the ->next pointers of a
>     MIDX, and releasing the memory we allocated to hold each MIDX struct
>     in the first place.

Yeah. This is a bug already before your series. I suspect nobody noticed
because it's very rare for us to call close_midx() at all, and it only
matters if there's an alternate odb with a midx. (The common call to
close_midx() is in these write paths, but it is always using a single
midx file).

>   - We should always be operating on the repository's
>     r->objects->multi_pack_index, or any other MIDX that can be reached
>     via walking the `->next` pointers. If we do that consistently, then
>     we'll only have at most one instance of a MIDX struct corresponding
>     to each MIDX file on disk.

Certainly that makes sense to me in terms of the Windows "must close the
current midx before writing" behavior. We have to realize that we're
operating in the current repo.

But we do allow an "--object-dir" option to "multi-pack-index write",
and I don't see any other code explicitly requiring that it be part of
the current repository. What I'm wondering is whether this would be
breaking:

  cd $REPO/..
  git multi-pack-index --object-dir $REPO/.git/objects write

or:

  cd /some/other/repo
  git multi-pack-index --object-dir $REPO/.git/objects write

The latter does seem to work, but the former segfaults (usually -- if
there's already a midx it is OK).

If it is broken now, this may be a good time to explicitly forbid it.
It does seem to make the --object-dir mostly pointless, though it would
still work for operating on a midx in one of your alternates. I'm not
sure I understand the original point of that option, and if the current
behavior is sufficient.

If it turns out that we can't forbid writing midx's besides the ones in
r->objects, it may be sufficient to just make any assumptions
conditional. I.e., _if_ it's one of the ones mentioned by r->objects,
then close it, but otherwise leave it open. But if we can get away with
restricting ourselves as you described, I think the result will be much
simpler, and we should prefer that.

-Peff

Taylor Blau July 29, 2021, 7:44 p.m. UTC | #15

On Wed, Jul 28, 2021 at 01:46:21PM -0400, Jeff King wrote:
> On Tue, Jul 27, 2021 at 04:05:39PM -0400, Taylor Blau wrote:
>
> > > I actually think having write_midx_internal() open up a new midx is
> > > reasonable-ish. It's just that:
> > >
> > >   - it's weird when it stuffs duplicate packs into the
> > >     r->objects->packed_git list. But AFAICT that's not actually hurting
> > >     anything?
> >
> > It is hurting us when we try to write a MIDX bitmap, because we try to
> > see if one already exists. And to do that, we call prepare_bitmap_git(),
> > which tries to call open_pack_bitmap_1 on *each* pack in the packed_git
> > list. Critically, prepare_bitmap_git() errors out if it is called with a
> > bitmap_git that has a non-NULL `->pack` pointer.
>
> It doesn't error out. It does produce a warning(), though, if it ignores
> a bitmap (and that warning is doubly confusing because it is ignoring
> bitmap X because it has already loaded and will use that exact same X!).
>
> This causes t7700.13 to fail because it is being picky about stderr
> being empty.

Right, sorry for suggesting that the error was more severe than it
actually is.

> So the overall behavior is correct, but I agree it's sufficiently ugly
> that we should make sure it doesn't happen.

100% agreed. I think the most unfortunate thing is the state of
r->objects->packed_git, since it's utterly bizarre to have the same pack
opened twice and have both of those copies in the list. That is
definitely worth preventing.

>   Side note: IMHO the "check all packs to see if there are any other
>   bitmaps to warn about" behavior is kind of pointless, and we should
>   consider just returning as soon as we have one. This is already
>   somewhat the case after your midx-bitmap patches, as we will not even
>   bother to look for a pack bitmap after finding a midx bitmap. That is
>   a good thing, because it means you can keep pack bitmaps around for
>   flexibility. But let's leave any changes to the pack-only behavior out
>   of this series for simplicity.

I agree. I'd be in favor of something like

diff --git a/pack-bitmap.c b/pack-bitmap.c
index f599646e19..5450ffb04c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -378,13 +378,6 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 		return -1;
 	}

-	if (bitmap_git->pack || bitmap_git->midx) {
-		/* ignore extra bitmap file; we can only handle one */
-		warning("ignoring extra bitmap file: %s", packfile->pack_name);
-		close(fd);
-		return -1;
-	}
-
 	bitmap_git->pack = packfile;
 	bitmap_git->map_size = xsize_t(st.st_size);
 	bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
@@ -465,16 +458,15 @@ static int open_pack_bitmap(struct repository *r,
 			    struct bitmap_index *bitmap_git)
 {
 	struct packed_git *p;
-	int ret = -1;

 	assert(!bitmap_git->map);

 	for (p = get_all_packs(r); p; p = p->next) {
-		if (open_pack_bitmap_1(bitmap_git, p) == 0)
-			ret = 0;
+		if (!open_pack_bitmap_1(bitmap_git, p))
+			return 0;
 	}

-	return ret;
+	return -1;
 }

 static int open_midx_bitmap(struct repository *r,

...but agree that we should wait until after the dust has settled on
this already-complex series.

> >   - We should always be operating on the repository's
> >     r->objects->multi_pack_index, or any other MIDX that can be reached
> >     via walking the `->next` pointers. If we do that consistently, then
> >     we'll only have at most one instance of a MIDX struct corresponding
> >     to each MIDX file on disk.
>
> Certainly that makes sense to me in terms of the Windows "must close the
> current midx before writing" behavior. We have to realize that we're
> operating in the current repo.
>
> But we do allow an "--object-dir" option to "multi-pack-index write",
> and I don't see any other code explicitly requiring that it be part of
> the current repository. What I'm wondering is whether this would be
> breaking:
>
>   cd $REPO/..
>   git multi-pack-index --object-dir $REPO/.git/objects write
>
> or:
>
>   cd /some/other/repo
>   git multi-pack-index --object-dir $REPO/.git/objects write
>
> The latter does seem to work, but the former segfaults (usually -- if
> there's already a midx it is OK).

The former should work, but doesn't, because (as you pointed out to me
in our regular weekly discussion off-list) that the "multi-pack-index"
entry in git.c's commands array has the RUN_SETUP_GENTLY option, and
probably should have RUN_SETUP so that we complain with die() instead of
BUG.

And the latter will continue to work, but only if in your scenario that
$REPO is an alternate of /some/other/repo.

I wrote a little bit more in [1] about this behavior, but the upshot is
that we used to technically support passing *any* directory to
`--object-dir`, including directories that didn't belong to an
alternated repository.

And that will cease to work after the patch that [1] is in response to
is applied. But for the reasons that I explain there, I think that is a
sufficient outcome, because the behavior is kind of bizarre to begin
with.

[1]: https://lore.kernel.org/git/YQMB32fvSiH9julg@nand.local/

Thanks,
Taylor

Jeff King Aug. 12, 2021, 7:59 p.m. UTC | #16

On Thu, Jul 29, 2021 at 03:44:34PM -0400, Taylor Blau wrote:

> >   Side note: IMHO the "check all packs to see if there are any other
> >   bitmaps to warn about" behavior is kind of pointless, and we should
> >   consider just returning as soon as we have one. This is already
> >   somewhat the case after your midx-bitmap patches, as we will not even
> >   bother to look for a pack bitmap after finding a midx bitmap. That is
> >   a good thing, because it means you can keep pack bitmaps around for
> >   flexibility. But let's leave any changes to the pack-only behavior out
> >   of this series for simplicity.
> 
> I agree. I'd be in favor of something like
> [...patch...]

Yep, that looks good. I'd be quite happy if you sent that once the dust
is settled.

> > But we do allow an "--object-dir" option to "multi-pack-index write",
> > and I don't see any other code explicitly requiring that it be part of
> > the current repository. What I'm wondering is whether this would be
> > breaking:
> >
> >   cd $REPO/..
> >   git multi-pack-index --object-dir $REPO/.git/objects write
> >
> > or:
> >
> >   cd /some/other/repo
> >   git multi-pack-index --object-dir $REPO/.git/objects write
> >
> > The latter does seem to work, but the former segfaults (usually -- if
> > there's already a midx it is OK).
> 
> The former should work, but doesn't, because (as you pointed out to me
> in our regular weekly discussion off-list) that the "multi-pack-index"
> entry in git.c's commands array has the RUN_SETUP_GENTLY option, and
> probably should have RUN_SETUP so that we complain with die() instead of
> BUG.
> 
> And the latter will continue to work, but only if in your scenario that
> $REPO is an alternate of /some/other/repo.
> 
> I wrote a little bit more in [1] about this behavior, but the upshot is
> that we used to technically support passing *any* directory to
> `--object-dir`, including directories that didn't belong to an
> alternated repository.
> 
> And that will cease to work after the patch that [1] is in response to
> is applied. But for the reasons that I explain there, I think that is a
> sufficient outcome, because the behavior is kind of bizarre to begin
> with.

Yeah, I think I am comfortable with the change at this point. The only
case that will be broken is one that is quite ridiculous, and I am
surprised worked in the first place. Thanks for talking it through.

-Peff

[v2,08/24] midx: respect 'core.multiPackIndex' when writing

Commit Message

Comments

Patch