diff mbox series

[v2,10/19] midx: teach `fill_midx_entry()` about incremental MIDXs

Message ID 2b335c45ae7832b886ef9adccc2530e4ca53d267.1721250704.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series midx: incremental multi-pack indexes, part one | expand

Commit Message

Taylor Blau July 17, 2024, 9:12 p.m. UTC
In a similar fashion as previous commits, teach the `fill_midx_entry()`
function to work in a incremental MIDX-aware fashion.

This function, unlike others which accept an index into either the
lexical order of objects or packs, takes in an object_id, and attempts
to fill a caller-provided 'struct pack_entry' with the remaining pieces
of information about that object from the MIDX.

The function uses `bsearch_midx()` which fills out the frame-local 'pos'
variable, recording the given object_id's lexical position within the
MIDX chain, if found (if no matching object ID was found, we'll return
immediately without filling out the `pack_entry` structure).

Once given that position, we jump back through the `->base_midx` pointer
to ensure that our `m` points at the MIDX layer which contains the given
object_id (and not an ancestor or descendant of it in the chain). Note
that we can drop the bounds check "if (pos >= m->num_objects)" because
`midx_for_object()` performs this check for us.

After that point, we only need to make two special considerations within
this function:

  - First, the pack_int_id returned to us by `nth_midxed_pack_int_id()`
    is a position in the concatenated lexical order of packs, so we must
    ensure that we subtract `m->num_packs_in_base` before accessing the
    MIDX-local `packs` array.

  - Second, we must avoid translating the `pos` back to a MIDX-local
    index, since we use it as an argument to `nth_midxed_offset()` which
    expects a position relative to the concatenated lexical order of
    objects.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Jeff King Aug. 1, 2024, 10:12 a.m. UTC | #1
On Wed, Jul 17, 2024 at 05:12:25PM -0400, Taylor Blau wrote:

> After that point, we only need to make two special considerations within
> this function:
> 
>   - First, the pack_int_id returned to us by `nth_midxed_pack_int_id()`
>     is a position in the concatenated lexical order of packs, so we must
>     ensure that we subtract `m->num_packs_in_base` before accessing the
>     MIDX-local `packs` array.
> 
>   - Second, we must avoid translating the `pos` back to a MIDX-local
>     index, since we use it as an argument to `nth_midxed_offset()` which
>     expects a position relative to the concatenated lexical order of
>     objects.

OK. I think this is correct, but this would be another place where we
could use an nth_midxed_offset_one() function if we had one.

My thinking was that we'd avoid walking back over the midx chain again.
But I guess we don't actually do that, because our midx_for_object()
will have overwritten our "m" variable, as well. So inside
nth_midxed_offset_one() we'll immediately realize that the global
position is inside the midx we passed in. A little extra arithmetic, but
there's no pointer chasing.

-Peff
Taylor Blau Aug. 1, 2024, 8:01 p.m. UTC | #2
On Thu, Aug 01, 2024 at 06:12:15AM -0400, Jeff King wrote:
> On Wed, Jul 17, 2024 at 05:12:25PM -0400, Taylor Blau wrote:
>
> > After that point, we only need to make two special considerations within
> > this function:
> >
> >   - First, the pack_int_id returned to us by `nth_midxed_pack_int_id()`
> >     is a position in the concatenated lexical order of packs, so we must
> >     ensure that we subtract `m->num_packs_in_base` before accessing the
> >     MIDX-local `packs` array.
> >
> >   - Second, we must avoid translating the `pos` back to a MIDX-local
> >     index, since we use it as an argument to `nth_midxed_offset()` which
> >     expects a position relative to the concatenated lexical order of
> >     objects.
>
> OK. I think this is correct, but this would be another place where we
> could use an nth_midxed_offset_one() function if we had one.

Yeah.

> My thinking was that we'd avoid walking back over the midx chain again.
> But I guess we don't actually do that, because our midx_for_object()
> will have overwritten our "m" variable, as well. So inside
> nth_midxed_offset_one() we'll immediately realize that the global
> position is inside the midx we passed in. A little extra arithmetic, but
> there's no pointer chasing.

Yup, exactly.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/midx.c b/midx.c
index f87df6bede..dea78474a2 100644
--- a/midx.c
+++ b/midx.c
@@ -406,14 +406,12 @@  int fill_midx_entry(struct repository *r,
 	if (!bsearch_midx(oid, m, &pos))
 		return 0;
 
-	if (pos >= m->num_objects)
-		return 0;
-
+	midx_for_object(&m, pos);
 	pack_int_id = nth_midxed_pack_int_id(m, pos);
 
 	if (prepare_midx_pack(r, m, pack_int_id))
 		return 0;
-	p = m->packs[pack_int_id];
+	p = m->packs[pack_int_id - m->num_packs_in_base];
 
 	/*
 	* We are about to tell the caller where they can locate the