Message ID | 40cff5beb50cdfbd13ae7f6017152f2628b25814.1627420428.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | multi-pack reachability bitmaps | expand |
On Tue, Jul 27, 2021 at 05:19:46PM -0400, Taylor Blau wrote: > @@ -914,10 +915,14 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * > die_errno(_("unable to create leading directories of %s"), > midx_name); > > - if (m) > - ctx.m = m; > - else > - ctx.m = load_multi_pack_index(object_dir, 1); > + for (cur = get_multi_pack_index(the_repository); cur; cur = cur->next) { > + if (!strcmp(object_dir, cur->object_dir)) { > + ctx.m = cur; > + break; > + } > + } > + if (!ctx.m) > + ctx.m = get_local_multi_pack_index(the_repository); Oops, the `if (!ctx.m)` part of this diff is just plain wrong. I think that I had in my mind that some callers don't pass object_dir, and so that we should fall-back to the local MIDX in that case. And so I probably meant to write `if (!object_dir && !ctx.m)` instead. But, all of the callers *do* pass the result of get_object_directory(), so we don't need to do anything of the sort. On a related note, though, a side-effect of this change is that this makes it no longer possible to do git multi-pack-index write --object-dir=/not/an/alternate.git/objects since get_local_multi_pack_index() will only populate the MIDXs in alternate object stores. We never enforced that `--object-dir` must point to an alternate, but the documentation uses `<alt>` to describe the argument to this flag, and accepting arbitrary non-alternate paths seems like a footgun to me. So I'm OK with "breaking" that behavior, as long as nobody complains loudly. Obviously it makes the fix easier to write, but I'd argue that the behavior we're losing is worth getting rid of anyway. Thanks, Taylor
On Tue, Jul 27, 2021 at 05:19:46PM -0400, Taylor Blau wrote: > Opening multiple instance of the same MIDX can lead to problems like two > separate packed_git structures which represent the same pack being added > to the repository's object store. > [...] Thanks, I think this approach fixes all of the potential problems from our earlier discussion. You already noted the "!ctx->m" thing in a follow-up. But also... > Likewise, replace the call to `close_midx()` with > `close_object_store()`, since we're about to replace the MIDX with a new > one and should invalidate the object store's memory of any MIDX that > might have existed beforehand. Yes, I agree we need to do this, but I don't see the change in the patch. Did something get lost in the rebasing/squashing process? I think we'd need something like this: diff --git a/midx.c b/midx.c index 6dfafe7a8c..bfb6afea2e 100644 --- a/midx.c +++ b/midx.c @@ -1123,8 +1123,7 @@ static int write_midx_internal(const char *object_dir, hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR); f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk)); - if (ctx.m) - close_midx(ctx.m); + close_object_store(the_repository->objects); if (ctx.nr - dropped_packs == 0) { error(_("no pack files to index.")); though I'm not sure: - if this should be unconditional or dependent on ctx.m (I think the latter, because if we are renaming over any open midx, we would have filled in ctx.m earlier). - if this should go below the "no pack files to index" check (i.e., is there any point in closing if we know we will not write?). In fact, its purpose might be more obvious right before finalize_hashfile(), but I am OK either way on that. -Peff
On Thu, Aug 12, 2021 at 04:15:32PM -0400, Jeff King wrote: > I think we'd need something like this: > > diff --git a/midx.c b/midx.c > index 6dfafe7a8c..bfb6afea2e 100644 > --- a/midx.c > +++ b/midx.c > @@ -1123,8 +1123,7 @@ static int write_midx_internal(const char *object_dir, > hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR); > f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk)); > > - if (ctx.m) > - close_midx(ctx.m); > + close_object_store(the_repository->objects); > > if (ctx.nr - dropped_packs == 0) { > error(_("no pack files to index.")); > > though I'm not sure: > > - if this should be unconditional or dependent on ctx.m (I think the > latter, because if we are renaming over any open midx, we would have > filled in ctx.m earlier). > > - if this should go below the "no pack files to index" check (i.e., is > there any point in closing if we know we will not write?). In fact, > its purpose might be more obvious right before finalize_hashfile(), > but I am OK either way on that. Ah, this close_midx() actually gets moved and made unconditional later in the series. But it still needs to be close_object_store() instead. Also, my mention of finalize_hashfile() is wrong. It's commit_lock_file() that does the actual rename, and indeed, that's where you moved it to in the end, which is good. -Peff > > -Peff
On Thu, Aug 12, 2021 at 04:22:29PM -0400, Jeff King wrote: > On Thu, Aug 12, 2021 at 04:15:32PM -0400, Jeff King wrote: > > > I think we'd need something like this: > > > > diff --git a/midx.c b/midx.c > > index 6dfafe7a8c..bfb6afea2e 100644 > > --- a/midx.c > > +++ b/midx.c > > @@ -1123,8 +1123,7 @@ static int write_midx_internal(const char *object_dir, > > hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR); > > f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk)); > > > > - if (ctx.m) > > - close_midx(ctx.m); > > + close_object_store(the_repository->objects); > > > > if (ctx.nr - dropped_packs == 0) { > > error(_("no pack files to index.")); > > > > though I'm not sure: > > > > - if this should be unconditional or dependent on ctx.m (I think the > > latter, because if we are renaming over any open midx, we would have > > filled in ctx.m earlier). > > > > - if this should go below the "no pack files to index" check (i.e., is > > there any point in closing if we know we will not write?). In fact, > > its purpose might be more obvious right before finalize_hashfile(), > > but I am OK either way on that. > > Ah, this close_midx() actually gets moved and made unconditional later > in the series. But it still needs to be close_object_store() instead. Exactly; this first patch should read: if (ctx.m) close_object_store(the_repository->objects); and then the latter patch (15/25) we drop the conditional and move our call down until after the MIDX bitmap is written, but before we call commit_lock_file(). Thanks, Taylor
diff --git a/midx.c b/midx.c index 18e1949613..d67d7f383d 100644 --- a/midx.c +++ b/midx.c @@ -893,7 +893,7 @@ static int midx_checksum_valid(struct multi_pack_index *m) return hashfile_checksum_valid(m->data, m->data_len); } -static int write_midx_internal(const char *object_dir, struct multi_pack_index *m, +static int write_midx_internal(const char *object_dir, struct string_list *packs_to_drop, const char *preferred_pack_name, unsigned flags) @@ -904,6 +904,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * struct hashfile *f = NULL; struct lock_file lk; struct write_midx_context ctx = { 0 }; + struct multi_pack_index *cur; int pack_name_concat_len = 0; int dropped_packs = 0; int result = 0; @@ -914,10 +915,14 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index * die_errno(_("unable to create leading directories of %s"), midx_name); - if (m) - ctx.m = m; - else - ctx.m = load_multi_pack_index(object_dir, 1); + for (cur = get_multi_pack_index(the_repository); cur; cur = cur->next) { + if (!strcmp(object_dir, cur->object_dir)) { + ctx.m = cur; + break; + } + } + if (!ctx.m) + ctx.m = get_local_multi_pack_index(the_repository); if (ctx.m && !midx_checksum_valid(ctx.m)) { warning(_("ignoring existing multi-pack-index; checksum mismatch")); @@ -1182,8 +1187,7 @@ int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags) { - return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name, - flags); + return write_midx_internal(object_dir, NULL, preferred_pack_name, flags); } struct clear_midx_data { @@ -1460,8 +1464,10 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla free(count); - if (packs_to_drop.nr) - result = write_midx_internal(object_dir, m, &packs_to_drop, NULL, flags); + if (packs_to_drop.nr) { + result = write_midx_internal(object_dir, &packs_to_drop, NULL, flags); + m = NULL; + } string_list_clear(&packs_to_drop, 0); return result; @@ -1650,7 +1656,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, goto cleanup; } - result = write_midx_internal(object_dir, m, NULL, NULL, flags); + result = write_midx_internal(object_dir, NULL, NULL, flags); m = NULL; cleanup:
Opening multiple instance of the same MIDX can lead to problems like two separate packed_git structures which represent the same pack being added to the repository's object store. The above scenario can happen because prepare_midx_pack() checks if `m->packs[pack_int_id]` is NULL in order to determine if a pack has been opened and installed in the repository before. But a caller can construct two copies of the same MIDX by calling get_multi_pack_index() and load_multi_pack_index() since the former manipulates the object store directly but the latter is a lower-level routine which allocates a new MIDX for each call. So if prepare_midx_pack() is called on multiple MIDXs with the same pack_int_id, then that pack will be installed twice in the object store's packed_git pointer. This can lead to problems in, for e.g., the pack-bitmap code, which does something like the following (in pack-bitmap.c:open_pack_bitmap()): struct bitmap_index *bitmap_git = ...; for (p = get_all_packs(r); p; p = p->next) { if (open_pack_bitmap_1(bitmap_git, p) == 0) ret = 0; } which is a problem if two copies of the same pack exist in the packed_git list because pack-bitmap.c:open_pack_bitmap_1() contains a conditional like the following: if (bitmap_git->pack || bitmap_git->midx) { /* ignore extra bitmap file; we can only handle one */ warning("ignoring extra bitmap file: %s", packfile->pack_name); close(fd); return -1; } Avoid this scenario by not letting write_midx_internal() open a MIDX that isn't also pointed at by the object store. So long as this is the case, other routines should prefer to open MIDXs with get_multi_pack_index() or reprepare_packed_git() instead of creating instances on their own. Because get_multi_pack_index() returns `r->object_store->multi_pack_index` if it is non-NULL, we'll only have one instance of a MIDX open at one time, avoiding these problems. To encourage this, drop the `struct multi_pack_index *` parameter from `write_midx_internal()`, and rely instead on the `object_dir` to find (or initialize) the correct MIDX instance. Likewise, replace the call to `close_midx()` with `close_object_store()`, since we're about to replace the MIDX with a new one and should invalidate the object store's memory of any MIDX that might have existed beforehand. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-)