diff mbox series

[v2,05/19] midx: teach `nth_midxed_object_oid()` about incremental MIDXs

Message ID ec57ff434900f2b95e31fbdf854b5ebbf46b5c78.1721250704.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series midx: incremental multi-pack indexes, part one | expand

Commit Message

Taylor Blau July 17, 2024, 9:12 p.m. UTC
The function `nth_midxed_object_oid()` returns the object ID for a given
object position in the MIDX lexicographic order.

Teach this function to instead operate over the concatenated
lexicographic order defined in an earlier step so that it is able to be
used with incremental MIDXs.

To do this, we need to both (a) adjust the bounds check for the given
'n', as well as record the MIDX-local position after chasing the
`->base_midx` pointer to find the MIDX which contains that object.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Jeff King Aug. 1, 2024, 9:38 a.m. UTC | #1
On Wed, Jul 17, 2024 at 05:12:10PM -0400, Taylor Blau wrote:

> The function `nth_midxed_object_oid()` returns the object ID for a given
> object position in the MIDX lexicographic order.
> 
> Teach this function to instead operate over the concatenated
> lexicographic order defined in an earlier step so that it is able to be
> used with incremental MIDXs.
> 
> To do this, we need to both (a) adjust the bounds check for the given
> 'n', as well as record the MIDX-local position after chasing the
> `->base_midx` pointer to find the MIDX which contains that object.

Yep, this makes sense. The hard thing about reviewing this, I think, is
that each individual step like this is going to make sense, but I'll
have very little clue what spots (if any) were missed.

To some degree I think the proof will be in the pudding. If you missed
any helpers, then the end result is going to crash and burn quite badly
when used with a chained midx, and we'd see it in the test suite. And
the nice thing is that most of this is abstracted inside these helpers,
so we know the set of tricky places is generally limited to the helpers,
and not arbitrary bits of midx code.

-Peff
Taylor Blau Aug. 1, 2024, 7:03 p.m. UTC | #2
On Thu, Aug 01, 2024 at 05:38:05AM -0400, Jeff King wrote:
> To some degree I think the proof will be in the pudding. If you missed
> any helpers, then the end result is going to crash and burn quite badly
> when used with a chained midx, and we'd see it in the test suite. And
> the nice thing is that most of this is abstracted inside these helpers,
> so we know the set of tricky places is generally limited to the helpers,
> and not arbitrary bits of midx code.

I think back in the old days, I might have considered experimenting with
GitHub's fork of Git to build up this feature before sharing it with the
list.

But experience has taught me that it's far better to share early and
often. Doing so gets you the benefit of having many eyes on the feature,
not just from individuals at GitHub.

Selfishly, it also reduces the pain of having to change some on-disk
format that has already been widely rolled out within GitHub's
infrastructure and similar, but the former motivation is much more
compelling.

In terms of "the pudding" here, I think that marking this feature as
experimental / incomplete is a good way for us to push this forward and
build up some real-world experience with brave users via a tagged
version of Git. Then we can refine it until we are confident it has
graduated the "experimental" phase.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/midx.c b/midx.c
index da5e0bb940..d470a88755 100644
--- a/midx.c
+++ b/midx.c
@@ -337,9 +337,11 @@  struct object_id *nth_midxed_object_oid(struct object_id *oid,
 					struct multi_pack_index *m,
 					uint32_t n)
 {
-	if (n >= m->num_objects)
+	if (n >= m->num_objects + m->num_objects_in_base)
 		return NULL;
 
+	n = midx_for_object(&m, n);
+
 	oidread(oid, m->chunk_oid_lookup + st_mult(m->hash_len, n),
 		the_repository->hash_algo);
 	return oid;