diff mbox series

[v2,13/19] midx: teach `midx_preferred_pack()` about incremental MIDXs

Message ID 38b642d40423f2e14dd47d676de54f264b9d6f22.1721250704.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series midx: incremental multi-pack indexes, part one | expand

Commit Message

Taylor Blau July 17, 2024, 9:12 p.m. UTC
The function `midx_preferred_pack()` is used to determine the identity
of the preferred pack, which is the identity of a unique pack within
the MIDX which is used as a tie-breaker when selecting from which pack
to represent an object that appears in multiple packs within the MIDX.

Historically we have said that the MIDX's preferred pack has the unique
property that all objects from that pack are represented in the MIDX.
But that isn't quite true: a more precise statement would be that all
objects from that pack *which appear in the MIDX* are selected from that
pack.

This helps us extend the concept of preferred packs across a MIDX chain,
where some object(s) in the preferred pack may appear in other packs
in an earlier MIDX layer, in which case those object(s) will not appear
in a subsequent MIDX layer from either the preferred pack or any other
pack.

Extend the concept of preferred packs by using the pack which represents
the object at the first position in MIDX pseudo-pack order belonging to
the current MIDX layer (i.e., at position 'm->num_objects_in_base').

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Jeff King Aug. 1, 2024, 10:25 a.m. UTC | #1
On Wed, Jul 17, 2024 at 05:12:35PM -0400, Taylor Blau wrote:

> The function `midx_preferred_pack()` is used to determine the identity
> of the preferred pack, which is the identity of a unique pack within
> the MIDX which is used as a tie-breaker when selecting from which pack
> to represent an object that appears in multiple packs within the MIDX.
> 
> Historically we have said that the MIDX's preferred pack has the unique
> property that all objects from that pack are represented in the MIDX.
> But that isn't quite true: a more precise statement would be that all
> objects from that pack *which appear in the MIDX* are selected from that
> pack.
> 
> This helps us extend the concept of preferred packs across a MIDX chain,
> where some object(s) in the preferred pack may appear in other packs
> in an earlier MIDX layer, in which case those object(s) will not appear
> in a subsequent MIDX layer from either the preferred pack or any other
> pack.

OK, that matches my intuition for how the preferred concept should
exist. I'm not quite clear on how that will affect the code, though.

> diff --git a/midx.c b/midx.c
> index 0fa8febb9d..d2dbea41e4 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -500,13 +500,16 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
>  int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
>  {
>  	if (m->preferred_pack_idx == -1) {
> +		uint32_t midx_pos;
>  		if (load_midx_revindex(m) < 0) {
>  			m->preferred_pack_idx = -2;
>  			return -1;
>  		}
>  
> -		m->preferred_pack_idx =
> -			nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
> +		midx_pos = pack_pos_to_midx(m, m->num_objects_in_base);
> +
> +		m->preferred_pack_idx = nth_midxed_pack_int_id(m, midx_pos);
> +

OK, so rather than looking for the pack of object 0, we're looking for
the first one in _this_ layer, since the position is global within the
midx. That makes some sense, but is pack_pos_to_midx() ready for that?
It looks like it just looks at m->revindex_data. Are we going to be
generating a revindex for the whole chain? I'd think that each layer
would have its own revindex (and any trickiness would happen at the
generation stage, making sure we don't insert objects that are already
mentioned in earlier layers).

-Peff
Taylor Blau Aug. 1, 2024, 8:05 p.m. UTC | #2
On Thu, Aug 01, 2024 at 06:25:17AM -0400, Jeff King wrote:
> > diff --git a/midx.c b/midx.c
> > index 0fa8febb9d..d2dbea41e4 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -500,13 +500,16 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
> >  int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
> >  {
> >  	if (m->preferred_pack_idx == -1) {
> > +		uint32_t midx_pos;
> >  		if (load_midx_revindex(m) < 0) {
> >  			m->preferred_pack_idx = -2;
> >  			return -1;
> >  		}
> >
> > -		m->preferred_pack_idx =
> > -			nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
> > +		midx_pos = pack_pos_to_midx(m, m->num_objects_in_base);
> > +
> > +		m->preferred_pack_idx = nth_midxed_pack_int_id(m, midx_pos);
> > +
>
> OK, so rather than looking for the pack of object 0, we're looking for
> the first one in _this_ layer, since the position is global within the
> midx. That makes some sense, but is pack_pos_to_midx() ready for that?
> It looks like it just looks at m->revindex_data. Are we going to be
> generating a revindex for the whole chain? I'd think that each layer
> would have its own revindex (and any trickiness would happen at the
> generation stage, making sure we don't insert objects that are already
> mentioned in earlier layers).

pack_pos_to_midx() is kind of ready, and kind of not.

The way that the pseudo-pack order is constructed within the
midx-write.c code, we will write reverse indexes (within each MIDX layer
itself as a separate chunk) that contain data for each object within
that layer in the expected reverse index format.

But we don't bother writing any reverse indexes for MIDXs which are
incremental at this point in the multi-series plan, since we just bail
if the BITMAP flag is set (saying that it is unsupported at this point).

Arguably we could have just left this hunk / patch out of the series as
a whole. It's this kind of stuff that's really at the boundary between
adjacent "phases" that I think is awkward no matter which way you slice
it.

> -Peff

Thanks,
Taylor
diff mbox series

Patch

diff --git a/midx.c b/midx.c
index 0fa8febb9d..d2dbea41e4 100644
--- a/midx.c
+++ b/midx.c
@@ -500,13 +500,16 @@  int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name)
 int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
 {
 	if (m->preferred_pack_idx == -1) {
+		uint32_t midx_pos;
 		if (load_midx_revindex(m) < 0) {
 			m->preferred_pack_idx = -2;
 			return -1;
 		}
 
-		m->preferred_pack_idx =
-			nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0));
+		midx_pos = pack_pos_to_midx(m, m->num_objects_in_base);
+
+		m->preferred_pack_idx = nth_midxed_pack_int_id(m, midx_pos);
+
 	} else if (m->preferred_pack_idx == -2)
 		return -1; /* no revindex */