Message ID | c753bc379b005ecaf131f8f1ae9c5b80b2712759.1716482279.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | midx-write: miscellaneous clean-ups for incremental MIDXs | expand |
On Thu, May 23, 2024 at 12:38:03PM -0400, Taylor Blau wrote: > When passing a preferred pack to the MIDX write machinery, we ensure > that the given preferred pack is non-empty since 5d3cd09a808 (midx: > reject empty `--preferred-pack`'s, 2021-08-31). > > However packs are only loaded (via `write_midx_internal()`, though a > subsequent patch will refactor this code out to its own function) when > the `MIDX_WRITE_REV_INDEX` flag is set. > > So if a caller runs: > > $ git multi-pack-index write --preferred-pack=... > > with both (a) an existing MIDX, and (b) specifies a pack from that MIDX > as the preferred one, without passing `--bitmap`, then the check added > in 5d3cd09a808 will result in a segfault. The check you're talking about is the following one, right? if (ctx.preferred_pack_idx > -1) { struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p; if (!preferred->num_objects) { error(_("cannot select preferred pack %s with no objects"), preferred->pack_name); result = 1; goto cleanup; } } And the segfault is because the index wasn't populated, and thus `ctx.info[ctx.preferred_pack_idx].p == NULL`? > Note that packs loaded from disk which don't appear in an existing MIDX > do not trigger this issue, as those packs are loaded unconditionally. We > conditionally load packs from a MIDX since we tolerate MIDXs whose > packs do not resolve (i.e., via the MIDX write after removing > unreferenced packs via 'git multi-pack-index expire'). > > In practice, this isn't possible to trigger when running `git > multi-pack-index write` from via `git repack`, as the latter always s/from via/via/ > passes `--stdin-packs`, which prevents us from loading an existing MIDX, > as it forces all packs to be read from disk. > > But a future commit in this series will change that behavior to > unconditionally load an existing MIDX, even with `--stdin-packs`, making > this behavior trigger-able from 'repack' much more easily. > > Prevent this from being an issue by removing the segfault altogether by Removing segfaults is always good :) > calling `prepare_midx_pack()` on packs loaded from an existing MIDX when > either the `MIDX_WRITE_REV_INDEX` flag is set *or* we specified a > `--preferred-pack`. > > Signed-off-by: Taylor Blau <me@ttaylorr.com> > --- > midx-write.c | 8 +++++++- > t/t5319-multi-pack-index.sh | 23 +++++++++++++++++++++++ > 2 files changed, 30 insertions(+), 1 deletion(-) > > diff --git a/midx-write.c b/midx-write.c > index 9d096d5a28..03e95ae821 100644 > --- a/midx-write.c > +++ b/midx-write.c > @@ -930,11 +930,17 @@ static int write_midx_internal(const char *object_dir, > for (i = 0; i < ctx.m->num_packs; i++) { > ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc); > > - if (flags & MIDX_WRITE_REV_INDEX) { > + if (flags & MIDX_WRITE_REV_INDEX || > + preferred_pack_name) { > /* > * If generating a reverse index, need to have > * packed_git's loaded to compare their > * mtimes and object count. > + * > + * If a preferred pack is specified, > + * need to have packed_git's loaded to > + * ensure the chosen preferred pack has > + * a non-zero object count. > */ > if (prepare_midx_pack(the_repository, ctx.m, i)) { > error(_("could not load pack")); We now end up loading all packs, but in practice it should be sufficient to only load the preferred pack, right? Is there a particular reason why we now load all packs? Patrick
On Wed, May 29, 2024 at 09:47:52AM +0200, Patrick Steinhardt wrote: > On Thu, May 23, 2024 at 12:38:03PM -0400, Taylor Blau wrote: > > When passing a preferred pack to the MIDX write machinery, we ensure > > that the given preferred pack is non-empty since 5d3cd09a808 (midx: > > reject empty `--preferred-pack`'s, 2021-08-31). > > > > However packs are only loaded (via `write_midx_internal()`, though a > > subsequent patch will refactor this code out to its own function) when > > the `MIDX_WRITE_REV_INDEX` flag is set. > > > > So if a caller runs: > > > > $ git multi-pack-index write --preferred-pack=... > > > > with both (a) an existing MIDX, and (b) specifies a pack from that MIDX > > as the preferred one, without passing `--bitmap`, then the check added > > in 5d3cd09a808 will result in a segfault. > > The check you're talking about is the following one, right? > > if (ctx.preferred_pack_idx > -1) { > struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p; > if (!preferred->num_objects) { > error(_("cannot select preferred pack %s with no objects"), > preferred->pack_name); > result = 1; > goto cleanup; > } > } > > And the segfault is because the index wasn't populated, and thus > `ctx.info[ctx.preferred_pack_idx].p == NULL`? Exactly. > > Note that packs loaded from disk which don't appear in an existing MIDX > > do not trigger this issue, as those packs are loaded unconditionally. We > > conditionally load packs from a MIDX since we tolerate MIDXs whose > > packs do not resolve (i.e., via the MIDX write after removing > > unreferenced packs via 'git multi-pack-index expire'). > > > > In practice, this isn't possible to trigger when running `git > > multi-pack-index write` from via `git repack`, as the latter always > > s/from via/via/ Ah, oops, thanks. > > passes `--stdin-packs`, which prevents us from loading an existing MIDX, > > as it forces all packs to be read from disk. > > > > But a future commit in this series will change that behavior to > > unconditionally load an existing MIDX, even with `--stdin-packs`, making > > this behavior trigger-able from 'repack' much more easily. > > > > Prevent this from being an issue by removing the segfault altogether by > > Removing segfaults is always good :) > > > calling `prepare_midx_pack()` on packs loaded from an existing MIDX when > > either the `MIDX_WRITE_REV_INDEX` flag is set *or* we specified a > > `--preferred-pack`. > > > > Signed-off-by: Taylor Blau <me@ttaylorr.com> > > --- > > midx-write.c | 8 +++++++- > > t/t5319-multi-pack-index.sh | 23 +++++++++++++++++++++++ > > 2 files changed, 30 insertions(+), 1 deletion(-) > > > > diff --git a/midx-write.c b/midx-write.c > > index 9d096d5a28..03e95ae821 100644 > > --- a/midx-write.c > > +++ b/midx-write.c > > @@ -930,11 +930,17 @@ static int write_midx_internal(const char *object_dir, > > for (i = 0; i < ctx.m->num_packs; i++) { > > ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc); > > > > - if (flags & MIDX_WRITE_REV_INDEX) { > > + if (flags & MIDX_WRITE_REV_INDEX || > > + preferred_pack_name) { > > /* > > * If generating a reverse index, need to have > > * packed_git's loaded to compare their > > * mtimes and object count. > > + * > > + * If a preferred pack is specified, > > + * need to have packed_git's loaded to > > + * ensure the chosen preferred pack has > > + * a non-zero object count. > > */ > > if (prepare_midx_pack(the_repository, ctx.m, i)) { > > error(_("could not load pack")); > > We now end up loading all packs, but in practice it should be sufficient > to only load the preferred pack, right? Is there a particular reason why > we now load all packs? I had originally considered that, but discarded the idea because it seemed like it might be fragile to depend on those two points in the code having the exact same notion of what the preferred pack is. Since it's cheap enough to load all of these packs in unconditionally anyways, that's the route that I ended up taking here. Thanks, Taylor
diff --git a/midx-write.c b/midx-write.c index 9d096d5a28..03e95ae821 100644 --- a/midx-write.c +++ b/midx-write.c @@ -930,11 +930,17 @@ static int write_midx_internal(const char *object_dir, for (i = 0; i < ctx.m->num_packs; i++) { ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc); - if (flags & MIDX_WRITE_REV_INDEX) { + if (flags & MIDX_WRITE_REV_INDEX || + preferred_pack_name) { /* * If generating a reverse index, need to have * packed_git's loaded to compare their * mtimes and object count. + * + * If a preferred pack is specified, + * need to have packed_git's loaded to + * ensure the chosen preferred pack has + * a non-zero object count. */ if (prepare_midx_pack(the_repository, ctx.m, i)) { error(_("could not load pack")); diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index dd09134db0..10d2a6bf92 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -350,6 +350,29 @@ test_expect_success 'preferred packs must be non-empty' ' ) ' +test_expect_success 'preferred pack from existing MIDX without bitmaps' ' + git init preferred-without-bitmaps && + ( + cd preferred-without-bitmaps && + + test_commit one && + pack="$(git pack-objects --all $objdir/pack/pack </dev/null)" && + + git multi-pack-index write && + + # make another pack so that the subsequent MIDX write + # has something to do + test_commit two && + git repack -d && + + # write a new MIDX without bitmaps reusing the singular + # pack from the existing MIDX as the preferred pack in + # the new MIDX + git multi-pack-index write --preferred-pack=pack-$pack.pack + ) + +' + test_expect_success 'verify multi-pack-index success' ' git multi-pack-index verify --object-dir=$objdir '
When passing a preferred pack to the MIDX write machinery, we ensure that the given preferred pack is non-empty since 5d3cd09a808 (midx: reject empty `--preferred-pack`'s, 2021-08-31). However packs are only loaded (via `write_midx_internal()`, though a subsequent patch will refactor this code out to its own function) when the `MIDX_WRITE_REV_INDEX` flag is set. So if a caller runs: $ git multi-pack-index write --preferred-pack=... with both (a) an existing MIDX, and (b) specifies a pack from that MIDX as the preferred one, without passing `--bitmap`, then the check added in 5d3cd09a808 will result in a segfault. Note that packs loaded from disk which don't appear in an existing MIDX do not trigger this issue, as those packs are loaded unconditionally. We conditionally load packs from a MIDX since we tolerate MIDXs whose packs do not resolve (i.e., via the MIDX write after removing unreferenced packs via 'git multi-pack-index expire'). In practice, this isn't possible to trigger when running `git multi-pack-index write` from via `git repack`, as the latter always passes `--stdin-packs`, which prevents us from loading an existing MIDX, as it forces all packs to be read from disk. But a future commit in this series will change that behavior to unconditionally load an existing MIDX, even with `--stdin-packs`, making this behavior trigger-able from 'repack' much more easily. Prevent this from being an issue by removing the segfault altogether by calling `prepare_midx_pack()` on packs loaded from an existing MIDX when either the `MIDX_WRITE_REV_INDEX` flag is set *or* we specified a `--preferred-pack`. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx-write.c | 8 +++++++- t/t5319-multi-pack-index.sh | 23 +++++++++++++++++++++++ 2 files changed, 30 insertions(+), 1 deletion(-)