diff mbox series

[2/5] pack-bitmap: tag bitmapped packs with their corresponding MIDX

Message ID 1838bbcf7fe6daa58a7db78b81a2d08138fe176e.1724793201.git.me@ttaylorr.com (mailing list archive)
State Accepted
Commit 41cd4b478f7ee0a9049601afdcd7872bdbeae519
Headers show
Series pack-objects: brown-paper-bag fixes for multi-pack reuse | expand

Commit Message

Taylor Blau Aug. 27, 2024, 9:13 p.m. UTC
The next commit will need to use the bitmap's MIDX (if one exists) to
translate bit positions into pack-relative positions in the source pack.

Ordinarily, we'd use the "midx" field of the bitmap_index struct. But
since that struct is defined within pack-bitmap.c, and our caller is in
a separate compilation unit, we do not have access to the MIDX field.

Instead, add a "from_midx" field to the bitmapped_pack structure so that
we can use that piece of data from outside of pack-bitmap.c. The caller
that uses this new piece of information will be added in the following
commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c        | 1 +
 pack-bitmap.c | 1 +
 pack-bitmap.h | 1 +
 3 files changed, 3 insertions(+)

Comments

Junio C Hamano Aug. 28, 2024, 12:14 a.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> The next commit will need to use the bitmap's MIDX (if one exists) to
> translate bit positions into pack-relative positions in the source pack.
>
> Ordinarily, we'd use the "midx" field of the bitmap_index struct. But
> since that struct is defined within pack-bitmap.c, and our caller is in
> a separate compilation unit, we do not have access to the MIDX field.
>
> Instead, add a "from_midx" field to the bitmapped_pack structure so that
> we can use that piece of data from outside of pack-bitmap.c. The caller
> that uses this new piece of information will be added in the following
> commit.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  midx.c        | 1 +
>  pack-bitmap.c | 1 +
>  pack-bitmap.h | 1 +
>  3 files changed, 3 insertions(+)
>
> diff --git a/midx.c b/midx.c
> index ca98bfd7c6..67e0d64004 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -496,6 +496,7 @@ int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
>  				 MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * local_pack_int_id +
>  				 sizeof(uint32_t));
>  	bp->pack_int_id = pack_int_id;
> +	bp->from_midx = m;

Do multi_pack_index objects live as long as bitmapped_pack objects
that point at them live?  If m later goes away without letting the
bitmapped_pack know about it, the borrowed pointer in from_midx
would become dangling, which is not what we want to see.

"None of these objects are released or relocated while we are
running pack-objects, so once the .from_midx member is assigned
here, it will always be pointing at a valid multi_pack_index object"
is a satisfactory answer, I guess.
Taylor Blau Aug. 29, 2024, 6:58 p.m. UTC | #2
On Tue, Aug 27, 2024 at 05:14:14PM -0700, Junio C Hamano wrote:
> > diff --git a/midx.c b/midx.c
> > index ca98bfd7c6..67e0d64004 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -496,6 +496,7 @@ int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
> >  				 MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * local_pack_int_id +
> >  				 sizeof(uint32_t));
> >  	bp->pack_int_id = pack_int_id;
> > +	bp->from_midx = m;
>
> Do multi_pack_index objects live as long as bitmapped_pack objects
> that point at them live?  If m later goes away without letting the
> bitmapped_pack know about it, the borrowed pointer in from_midx
> would become dangling, which is not what we want to see.
>
> "None of these objects are released or relocated while we are
> running pack-objects, so once the .from_midx member is assigned
> here, it will always be pointing at a valid multi_pack_index object"
> is a satisfactory answer, I guess.

Good question, and good answer ;-).

This is only relevant in a read-only path where we're generating a new
pack from existing packs and not altering those pack or rewriting /
deleting the MIDX attached to them. So I think we're OK here and don't
have any lifetime/scope issues.

Thanks,
Taylor
Jeff King Sept. 5, 2024, 9 a.m. UTC | #3
On Thu, Aug 29, 2024 at 02:58:19PM -0400, Taylor Blau wrote:

> On Tue, Aug 27, 2024 at 05:14:14PM -0700, Junio C Hamano wrote:
> > > diff --git a/midx.c b/midx.c
> > > index ca98bfd7c6..67e0d64004 100644
> > > --- a/midx.c
> > > +++ b/midx.c
> > > @@ -496,6 +496,7 @@ int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
> > >  				 MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * local_pack_int_id +
> > >  				 sizeof(uint32_t));
> > >  	bp->pack_int_id = pack_int_id;
> > > +	bp->from_midx = m;
> >
> > Do multi_pack_index objects live as long as bitmapped_pack objects
> > that point at them live?  If m later goes away without letting the
> > bitmapped_pack know about it, the borrowed pointer in from_midx
> > would become dangling, which is not what we want to see.
> >
> > "None of these objects are released or relocated while we are
> > running pack-objects, so once the .from_midx member is assigned
> > here, it will always be pointing at a valid multi_pack_index object"
> > is a satisfactory answer, I guess.
> 
> Good question, and good answer ;-).
> 
> This is only relevant in a read-only path where we're generating a new
> pack from existing packs and not altering those pack or rewriting /
> deleting the MIDX attached to them. So I think we're OK here and don't
> have any lifetime/scope issues.

Do we ever close/reopen the midx? For example, if a simultaneous process
wrote a new one and we triggered reprepare_packed_git()?

I think the answer is "no"; we might read a new midx, but we never ditch
the old one (just like we do for packed_git structs). Which I suspect is
needed even before this patch, since various other parts of the bitmap
code (and probably others) rely on the struct staying in place across as
we read many objects.

-Peff
Taylor Blau Sept. 17, 2024, 9:58 a.m. UTC | #4
On Thu, Sep 05, 2024 at 05:00:24AM -0400, Jeff King wrote:
> On Thu, Aug 29, 2024 at 02:58:19PM -0400, Taylor Blau wrote:
>
> > On Tue, Aug 27, 2024 at 05:14:14PM -0700, Junio C Hamano wrote:
> > > > diff --git a/midx.c b/midx.c
> > > > index ca98bfd7c6..67e0d64004 100644
> > > > --- a/midx.c
> > > > +++ b/midx.c
> > > > @@ -496,6 +496,7 @@ int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
> > > >  				 MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * local_pack_int_id +
> > > >  				 sizeof(uint32_t));
> > > >  	bp->pack_int_id = pack_int_id;
> > > > +	bp->from_midx = m;
> > >
> > > Do multi_pack_index objects live as long as bitmapped_pack objects
> > > that point at them live?  If m later goes away without letting the
> > > bitmapped_pack know about it, the borrowed pointer in from_midx
> > > would become dangling, which is not what we want to see.
> > >
> > > "None of these objects are released or relocated while we are
> > > running pack-objects, so once the .from_midx member is assigned
> > > here, it will always be pointing at a valid multi_pack_index object"
> > > is a satisfactory answer, I guess.
> >
> > Good question, and good answer ;-).
> >
> > This is only relevant in a read-only path where we're generating a new
> > pack from existing packs and not altering those pack or rewriting /
> > deleting the MIDX attached to them. So I think we're OK here and don't
> > have any lifetime/scope issues.
>
> Do we ever close/reopen the midx? For example, if a simultaneous process
> wrote a new one and we triggered reprepare_packed_git()?
>
> I think the answer is "no"; we might read a new midx, but we never ditch
> the old one (just like we do for packed_git structs). Which I suspect is
> needed even before this patch, since various other parts of the bitmap
> code (and probably others) rely on the struct staying in place across as
> we read many objects.

I think that we *could* close and reopen the MIDX via a call to
close_object_store() (which dispatches a call to close_midx(), before
NULL-ing out o->multi_pack_index) and then calling prepare_packed_git()
or similar.

But I'm not aware of any such codepaths that deal with pack reuse and
calling nth_bitmapped_pack that would do such a thing, so I think that
we're OK here.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/midx.c b/midx.c
index ca98bfd7c6..67e0d64004 100644
--- a/midx.c
+++ b/midx.c
@@ -496,6 +496,7 @@  int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 				 MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * local_pack_int_id +
 				 sizeof(uint32_t));
 	bp->pack_int_id = pack_int_id;
+	bp->from_midx = m;
 
 	return 0;
 }
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2e657a2aa4..218d7ac2eb 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2322,6 +2322,7 @@  void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 		packs[packs_nr].pack_int_id = pack_int_id;
 		packs[packs_nr].bitmap_nr = pack->num_objects;
 		packs[packs_nr].bitmap_pos = 0;
+		packs[packs_nr].from_midx = bitmap_git->midx;
 
 		objects_nr = packs[packs_nr++].bitmap_nr;
 	}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index ff0fd815b8..d7f4b8b8e9 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -60,6 +60,7 @@  struct bitmapped_pack {
 	uint32_t bitmap_pos;
 	uint32_t bitmap_nr;
 
+	struct multi_pack_index *from_midx; /* MIDX only */
 	uint32_t pack_int_id; /* MIDX only */
 };