diff mbox series

[1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps

Message ID c753bc379b005ecaf131f8f1ae9c5b80b2712759.1716482279.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series midx-write: miscellaneous clean-ups for incremental MIDXs | expand

Commit Message

Taylor Blau May 23, 2024, 4:38 p.m. UTC
When passing a preferred pack to the MIDX write machinery, we ensure
that the given preferred pack is non-empty since 5d3cd09a808 (midx:
reject empty `--preferred-pack`'s, 2021-08-31).

However packs are only loaded (via `write_midx_internal()`, though a
subsequent patch will refactor this code out to its own function) when
the `MIDX_WRITE_REV_INDEX` flag is set.

So if a caller runs:

    $ git multi-pack-index write --preferred-pack=...

with both (a) an existing MIDX, and (b) specifies a pack from that MIDX
as the preferred one, without passing `--bitmap`, then the check added
in 5d3cd09a808 will result in a segfault.

Note that packs loaded from disk which don't appear in an existing MIDX
do not trigger this issue, as those packs are loaded unconditionally. We
conditionally load packs from a MIDX since we tolerate MIDXs whose
packs do not resolve (i.e., via the MIDX write after removing
unreferenced packs via 'git multi-pack-index expire').

In practice, this isn't possible to trigger when running `git
multi-pack-index write` from via `git repack`, as the latter always
passes `--stdin-packs`, which prevents us from loading an existing MIDX,
as it forces all packs to be read from disk.

But a future commit in this series will change that behavior to
unconditionally load an existing MIDX, even with `--stdin-packs`, making
this behavior trigger-able from 'repack' much more easily.

Prevent this from being an issue by removing the segfault altogether by
calling `prepare_midx_pack()` on packs loaded from an existing MIDX when
either the `MIDX_WRITE_REV_INDEX` flag is set *or* we specified a
`--preferred-pack`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c                |  8 +++++++-
 t/t5319-multi-pack-index.sh | 23 +++++++++++++++++++++++
 2 files changed, 30 insertions(+), 1 deletion(-)

Comments

Patrick Steinhardt May 29, 2024, 7:47 a.m. UTC | #1
On Thu, May 23, 2024 at 12:38:03PM -0400, Taylor Blau wrote:
> When passing a preferred pack to the MIDX write machinery, we ensure
> that the given preferred pack is non-empty since 5d3cd09a808 (midx:
> reject empty `--preferred-pack`'s, 2021-08-31).
> 
> However packs are only loaded (via `write_midx_internal()`, though a
> subsequent patch will refactor this code out to its own function) when
> the `MIDX_WRITE_REV_INDEX` flag is set.
> 
> So if a caller runs:
> 
>     $ git multi-pack-index write --preferred-pack=...
> 
> with both (a) an existing MIDX, and (b) specifies a pack from that MIDX
> as the preferred one, without passing `--bitmap`, then the check added
> in 5d3cd09a808 will result in a segfault.

The check you're talking about is the following one, right?

    if (ctx.preferred_pack_idx > -1) {
            struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
            if (!preferred->num_objects) {
                    error(_("cannot select preferred pack %s with no objects"),
                          preferred->pack_name);
                    result = 1;
                    goto cleanup;
            }
    }

And the segfault is because the index wasn't populated, and thus
`ctx.info[ctx.preferred_pack_idx].p == NULL`?

> Note that packs loaded from disk which don't appear in an existing MIDX
> do not trigger this issue, as those packs are loaded unconditionally. We
> conditionally load packs from a MIDX since we tolerate MIDXs whose
> packs do not resolve (i.e., via the MIDX write after removing
> unreferenced packs via 'git multi-pack-index expire').
> 
> In practice, this isn't possible to trigger when running `git
> multi-pack-index write` from via `git repack`, as the latter always

s/from via/via/

> passes `--stdin-packs`, which prevents us from loading an existing MIDX,
> as it forces all packs to be read from disk.
> 
> But a future commit in this series will change that behavior to
> unconditionally load an existing MIDX, even with `--stdin-packs`, making
> this behavior trigger-able from 'repack' much more easily.
> 
> Prevent this from being an issue by removing the segfault altogether by

Removing segfaults is always good :)

> calling `prepare_midx_pack()` on packs loaded from an existing MIDX when
> either the `MIDX_WRITE_REV_INDEX` flag is set *or* we specified a
> `--preferred-pack`.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  midx-write.c                |  8 +++++++-
>  t/t5319-multi-pack-index.sh | 23 +++++++++++++++++++++++
>  2 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/midx-write.c b/midx-write.c
> index 9d096d5a28..03e95ae821 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -930,11 +930,17 @@ static int write_midx_internal(const char *object_dir,
>  		for (i = 0; i < ctx.m->num_packs; i++) {
>  			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
>  
> -			if (flags & MIDX_WRITE_REV_INDEX) {
> +			if (flags & MIDX_WRITE_REV_INDEX ||
> +			    preferred_pack_name) {
>  				/*
>  				 * If generating a reverse index, need to have
>  				 * packed_git's loaded to compare their
>  				 * mtimes and object count.
> +				 *
> +				 * If a preferred pack is specified,
> +				 * need to have packed_git's loaded to
> +				 * ensure the chosen preferred pack has
> +				 * a non-zero object count.
>  				 */
>  				if (prepare_midx_pack(the_repository, ctx.m, i)) {
>  					error(_("could not load pack"));

We now end up loading all packs, but in practice it should be sufficient
to only load the preferred pack, right? Is there a particular reason why
we now load all packs?

Patrick
Taylor Blau May 29, 2024, 10:29 p.m. UTC | #2
On Wed, May 29, 2024 at 09:47:52AM +0200, Patrick Steinhardt wrote:
> On Thu, May 23, 2024 at 12:38:03PM -0400, Taylor Blau wrote:
> > When passing a preferred pack to the MIDX write machinery, we ensure
> > that the given preferred pack is non-empty since 5d3cd09a808 (midx:
> > reject empty `--preferred-pack`'s, 2021-08-31).
> >
> > However packs are only loaded (via `write_midx_internal()`, though a
> > subsequent patch will refactor this code out to its own function) when
> > the `MIDX_WRITE_REV_INDEX` flag is set.
> >
> > So if a caller runs:
> >
> >     $ git multi-pack-index write --preferred-pack=...
> >
> > with both (a) an existing MIDX, and (b) specifies a pack from that MIDX
> > as the preferred one, without passing `--bitmap`, then the check added
> > in 5d3cd09a808 will result in a segfault.
>
> The check you're talking about is the following one, right?
>
>     if (ctx.preferred_pack_idx > -1) {
>             struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
>             if (!preferred->num_objects) {
>                     error(_("cannot select preferred pack %s with no objects"),
>                           preferred->pack_name);
>                     result = 1;
>                     goto cleanup;
>             }
>     }
>
> And the segfault is because the index wasn't populated, and thus
> `ctx.info[ctx.preferred_pack_idx].p == NULL`?

Exactly.

> > Note that packs loaded from disk which don't appear in an existing MIDX
> > do not trigger this issue, as those packs are loaded unconditionally. We
> > conditionally load packs from a MIDX since we tolerate MIDXs whose
> > packs do not resolve (i.e., via the MIDX write after removing
> > unreferenced packs via 'git multi-pack-index expire').
> >
> > In practice, this isn't possible to trigger when running `git
> > multi-pack-index write` from via `git repack`, as the latter always
>
> s/from via/via/

Ah, oops, thanks.

> > passes `--stdin-packs`, which prevents us from loading an existing MIDX,
> > as it forces all packs to be read from disk.
> >
> > But a future commit in this series will change that behavior to
> > unconditionally load an existing MIDX, even with `--stdin-packs`, making
> > this behavior trigger-able from 'repack' much more easily.
> >
> > Prevent this from being an issue by removing the segfault altogether by
>
> Removing segfaults is always good :)
>
> > calling `prepare_midx_pack()` on packs loaded from an existing MIDX when
> > either the `MIDX_WRITE_REV_INDEX` flag is set *or* we specified a
> > `--preferred-pack`.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  midx-write.c                |  8 +++++++-
> >  t/t5319-multi-pack-index.sh | 23 +++++++++++++++++++++++
> >  2 files changed, 30 insertions(+), 1 deletion(-)
> >
> > diff --git a/midx-write.c b/midx-write.c
> > index 9d096d5a28..03e95ae821 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -930,11 +930,17 @@ static int write_midx_internal(const char *object_dir,
> >  		for (i = 0; i < ctx.m->num_packs; i++) {
> >  			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
> >
> > -			if (flags & MIDX_WRITE_REV_INDEX) {
> > +			if (flags & MIDX_WRITE_REV_INDEX ||
> > +			    preferred_pack_name) {
> >  				/*
> >  				 * If generating a reverse index, need to have
> >  				 * packed_git's loaded to compare their
> >  				 * mtimes and object count.
> > +				 *
> > +				 * If a preferred pack is specified,
> > +				 * need to have packed_git's loaded to
> > +				 * ensure the chosen preferred pack has
> > +				 * a non-zero object count.
> >  				 */
> >  				if (prepare_midx_pack(the_repository, ctx.m, i)) {
> >  					error(_("could not load pack"));
>
> We now end up loading all packs, but in practice it should be sufficient
> to only load the preferred pack, right? Is there a particular reason why
> we now load all packs?

I had originally considered that, but discarded the idea because it
seemed like it might be fragile to depend on those two points in the
code having the exact same notion of what the preferred pack is.

Since it's cheap enough to load all of these packs in unconditionally
anyways, that's the route that I ended up taking here.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/midx-write.c b/midx-write.c
index 9d096d5a28..03e95ae821 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -930,11 +930,17 @@  static int write_midx_internal(const char *object_dir,
 		for (i = 0; i < ctx.m->num_packs; i++) {
 			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
 
-			if (flags & MIDX_WRITE_REV_INDEX) {
+			if (flags & MIDX_WRITE_REV_INDEX ||
+			    preferred_pack_name) {
 				/*
 				 * If generating a reverse index, need to have
 				 * packed_git's loaded to compare their
 				 * mtimes and object count.
+				 *
+				 * If a preferred pack is specified,
+				 * need to have packed_git's loaded to
+				 * ensure the chosen preferred pack has
+				 * a non-zero object count.
 				 */
 				if (prepare_midx_pack(the_repository, ctx.m, i)) {
 					error(_("could not load pack"));
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index dd09134db0..10d2a6bf92 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -350,6 +350,29 @@  test_expect_success 'preferred packs must be non-empty' '
 	)
 '
 
+test_expect_success 'preferred pack from existing MIDX without bitmaps' '
+	git init preferred-without-bitmaps &&
+	(
+		cd preferred-without-bitmaps &&
+
+		test_commit one &&
+		pack="$(git pack-objects --all $objdir/pack/pack </dev/null)" &&
+
+		git multi-pack-index write &&
+
+		# make another pack so that the subsequent MIDX write
+		# has something to do
+		test_commit two &&
+		git repack -d &&
+
+		# write a new MIDX without bitmaps reusing the singular
+		# pack from the existing MIDX as the preferred pack in
+		# the new MIDX
+		git multi-pack-index write --preferred-pack=pack-$pack.pack
+	)
+
+'
+
 test_expect_success 'verify multi-pack-index success' '
 	git multi-pack-index verify --object-dir=$objdir
 '