Message ID | pull.1235.git.1653064572170.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | repack: respect --keep-pack with geometric repack | expand |
Hi Victoria, On Fri, May 20, 2022 at 04:36:12PM +0000, Victoria Dye via GitGitGadget wrote: > From: Victoria Dye <vdye@github.com> > > Update 'repack' to ignore packs named on the command line with the > '--keep-pack' option. Specifically, modify 'init_pack_geometry()' to treat > command line-kept packs the same way it treats packs with an on-disk '.keep' > file (that is, skip the pack and do not include it in the 'geometry' > structure). > > Without this handling, a '--keep-pack' pack would be included in the > 'geometry' structure. If the pack is *before* the geometry split line (with > at least one other pack and/or loose objects present), 'repack' assumes the > pack's contents are "rolled up" into another pack via 'pack-objects'. > However, because the internally-invoked 'pack-objects' properly excludes > '--keep-pack' objects, any new pack it creates will not contain the kept > objects. Finally, 'repack' deletes the '--keep-pack' as "redundant" (since > it assumes 'pack-objects' created a new pack with its contents), resulting > in possible object loss and repository corruption. Nicely found and explained. Having discussed this fix with you already off-list, this approach (to treat kept packs as excluded from the list of packs in the `geometry` structure regardless of whether they are kept on disk or in-core) makes sense to me. I left a couple of small notes on the patch below, but since I have some patches that deal with a separate issue in the `git repack --geometric` code coming, do you want to combine forces (and I can send a lightly-reworked version of this patch as a part of my series)? > @@ -332,17 +332,34 @@ static int geometry_cmp(const void *va, const void *vb) > return 0; > } > > -static void init_pack_geometry(struct pack_geometry **geometry_p) > +static void init_pack_geometry(struct pack_geometry **geometry_p, > + struct string_list *existing_kept_packs) > { > struct packed_git *p; > struct pack_geometry *geometry; > + struct strbuf buf = STRBUF_INIT; > > *geometry_p = xcalloc(1, sizeof(struct pack_geometry)); > geometry = *geometry_p; > > + string_list_sort(existing_kept_packs); Would it be worth sorting this as early as in collect_pack_filenames()? For our purposes in this patch, this works as-is, but it may be defensive to try and minimize the time that list has unsorted contents. > for (p = get_all_packs(the_repository); p; p = p->next) { > - if (!pack_kept_objects && p->pack_keep) > - continue; > + if (!pack_kept_objects) { > + if (p->pack_keep) > + continue; (You mentioned this to me off-list, but I'll repeat it here since it wasn't obvious to me on first read): this check for `p->pack_keep` isn't strictly necessary, since any packs that have their `pack_keep` bit set will appear in the `existing_kept_packs` list. But it does give us a fast path to avoid having to check that list, so it's worth checking that bit to avoid a slightly more expensive check where possible. > + /* > + * The pack may be kept via the --keep-pack option; > + * check 'existing_kept_packs' to determine whether to > + * ignore it. > + */ > + strbuf_reset(&buf); > + strbuf_addstr(&buf, pack_basename(p)); > + strbuf_strip_suffix(&buf, ".pack"); > + > + if (string_list_has_string(existing_kept_packs, buf.buf)) > + continue; It's too bad that we have to do this check at all, and can't rely on the `pack_keep_in_core` in the same way as we check `p->pack_keep`. But lifting that restriction is a more invasive change, so I'm happy to rely on the contents of existing_kept_packs here in the meantime. > + } > > ALLOC_GROW(geometry->pack, > geometry->pack_nr + 1, > @@ -353,6 +370,7 @@ static void init_pack_geometry(struct pack_geometry **geometry_p) > } > > QSORT(geometry->pack, geometry->pack_nr, geometry_cmp); > + strbuf_release(&buf); > } > > static void split_pack_geometry(struct pack_geometry *geometry, int factor) > @@ -714,17 +732,20 @@ int cmd_repack(int argc, const char **argv, const char *prefix) > strbuf_release(&path); > } > > + packdir = mkpathdup("%s/pack", get_object_directory()); > + packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid()); > + packtmp = mkpathdup("%s/%s", packdir, packtmp_name); > + > + collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs, > + &keep_pack_list); > + Makes sense; we have to initialize existing_kept_packs before arranging the list of packs for the `--geometric` split. And presumably `collect_pack_filenames()` relies on `packdir`, `packtmp_name`, and `packtmp` being setup ahead of time, too. > if (geometric_factor) { > if (pack_everything) > die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a"); > - init_pack_geometry(&geometry); > + init_pack_geometry(&geometry, &existing_kept_packs); > split_pack_geometry(geometry, geometric_factor); > } > > - packdir = mkpathdup("%s/pack", get_object_directory()); > - packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid()); > - packtmp = mkpathdup("%s/%s", packdir, packtmp_name); > - > sigchain_push_common(remove_pack_on_signal); > > prepare_pack_objects(&cmd, &po_args); > @@ -764,9 +785,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix) > if (use_delta_islands) > strvec_push(&cmd.args, "--delta-islands"); > > - collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs, > - &keep_pack_list); > - > if (pack_everything & ALL_INTO_ONE) { > repack_promisor_objects(&po_args, &names); > > diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh > index bdbbcbf1eca..f5ac23413d5 100755 > --- a/t/t7703-repack-geometric.sh > +++ b/t/t7703-repack-geometric.sh > @@ -180,6 +180,40 @@ test_expect_success '--geometric ignores kept packs' ' > ) > ' > > +test_expect_success '--geometric ignores --keep-pack packs' ' > + git init geometric && > + test_when_finished "rm -fr geometric" && > + ( > + cd geometric && > + > + # Create two equal-sized packs > + test_commit kept && # 3 objects > + test_commit pack && # 3 objects > + > + KEPT=$(git pack-objects --revs $objdir/pack/pack <<-EOF > + refs/tags/kept > + EOF > + ) && > + PACK=$(git pack-objects --revs $objdir/pack/pack <<-EOF > + refs/tags/pack > + ^refs/tags/kept > + EOF > + ) && Nit; we don't care about the name of $PACK, so it would probably be fine to avoid storing the `PACK` variable. We could write these packs with just `git repack -d` after each `test_commit` (which would avoid us having to call `prune-packed`). Does it matter which one is kept? I don't think so, since AFAICT the critical bit is that we mark one of the packs being rolled up as a `--keep-pack`. > + # Prune loose objects that are now packed into PACK and KEEP > + git prune-packed && > + > + git repack --geometric 2 -dm --keep-pack=pack-$KEPT.pack >out && > + > + # Packs should not have changed (only one non-kept pack, no > + # loose objects), but midx should now exist. > + test_i18ngrep "Nothing new to pack" out && Nit; test_i18ngrep here should just be "grep". > + test_path_is_file $midx && > + test_path_is_file $objdir/pack/pack-$KEPT.pack && > + git fsck > + ) Thanks, Taylor
Taylor Blau wrote: > Hi Victoria, > > On Fri, May 20, 2022 at 04:36:12PM +0000, Victoria Dye via GitGitGadget wrote: >> From: Victoria Dye <vdye@github.com> >> >> Update 'repack' to ignore packs named on the command line with the >> '--keep-pack' option. Specifically, modify 'init_pack_geometry()' to treat >> command line-kept packs the same way it treats packs with an on-disk '.keep' >> file (that is, skip the pack and do not include it in the 'geometry' >> structure). >> >> Without this handling, a '--keep-pack' pack would be included in the >> 'geometry' structure. If the pack is *before* the geometry split line (with >> at least one other pack and/or loose objects present), 'repack' assumes the >> pack's contents are "rolled up" into another pack via 'pack-objects'. >> However, because the internally-invoked 'pack-objects' properly excludes >> '--keep-pack' objects, any new pack it creates will not contain the kept >> objects. Finally, 'repack' deletes the '--keep-pack' as "redundant" (since >> it assumes 'pack-objects' created a new pack with its contents), resulting >> in possible object loss and repository corruption. > > Nicely found and explained. Having discussed this fix with you already > off-list, this approach (to treat kept packs as excluded from the list > of packs in the `geometry` structure regardless of whether they are kept > on disk or in-core) makes sense to me. > > I left a couple of small notes on the patch below, but since I have some > patches that deal with a separate issue in the `git repack --geometric` > code coming, do you want to combine forces (and I can send a > lightly-reworked version of this patch as a part of my series)? > Works for me! I'm happy with all the suggested changes you noted below (moving the 'string_list_sort' and cleaning up the test), so feel free to include them in your series. Thanks! >> @@ -332,17 +332,34 @@ static int geometry_cmp(const void *va, const void *vb) >> return 0; >> } >> >> -static void init_pack_geometry(struct pack_geometry **geometry_p) >> +static void init_pack_geometry(struct pack_geometry **geometry_p, >> + struct string_list *existing_kept_packs) >> { >> struct packed_git *p; >> struct pack_geometry *geometry; >> + struct strbuf buf = STRBUF_INIT; >> >> *geometry_p = xcalloc(1, sizeof(struct pack_geometry)); >> geometry = *geometry_p; >> >> + string_list_sort(existing_kept_packs); > > Would it be worth sorting this as early as in collect_pack_filenames()? > For our purposes in this patch, this works as-is, but it may be > defensive to try and minimize the time that list has unsorted contents. > I went back and forth on this, eventually settling on this to keep the 'string_list_sort' as close as possible to where the sorted list is needed. I'm still pretty indifferent, though, so moving it to the end of 'collect_pack_filenames()' is fine with me. >> for (p = get_all_packs(the_repository); p; p = p->next) { >> - if (!pack_kept_objects && p->pack_keep) >> - continue; >> + if (!pack_kept_objects) { >> + if (p->pack_keep) >> + continue; > > (You mentioned this to me off-list, but I'll repeat it here since it > wasn't obvious to me on first read): this check for `p->pack_keep` isn't > strictly necessary, since any packs that have their `pack_keep` bit set > will appear in the `existing_kept_packs` list. > > But it does give us a fast path to avoid having to check that list, so > it's worth checking that bit to avoid a slightly more expensive check > where possible. > >> + /* >> + * The pack may be kept via the --keep-pack option; >> + * check 'existing_kept_packs' to determine whether to >> + * ignore it. >> + */ >> + strbuf_reset(&buf); >> + strbuf_addstr(&buf, pack_basename(p)); >> + strbuf_strip_suffix(&buf, ".pack"); >> + >> + if (string_list_has_string(existing_kept_packs, buf.buf)) >> + continue; > > It's too bad that we have to do this check at all, and can't rely on the > `pack_keep_in_core` in the same way as we check `p->pack_keep`. But > lifting that restriction is a more invasive change, so I'm happy to > rely on the contents of existing_kept_packs here in the meantime. > >> + } >> >> ALLOC_GROW(geometry->pack, >> geometry->pack_nr + 1, >> @@ -353,6 +370,7 @@ static void init_pack_geometry(struct pack_geometry **geometry_p) >> } >> >> QSORT(geometry->pack, geometry->pack_nr, geometry_cmp); >> + strbuf_release(&buf); >> } >> >> static void split_pack_geometry(struct pack_geometry *geometry, int factor) >> @@ -714,17 +732,20 @@ int cmd_repack(int argc, const char **argv, const char *prefix) >> strbuf_release(&path); >> } >> >> + packdir = mkpathdup("%s/pack", get_object_directory()); >> + packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid()); >> + packtmp = mkpathdup("%s/%s", packdir, packtmp_name); >> + >> + collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs, >> + &keep_pack_list); >> + > > Makes sense; we have to initialize existing_kept_packs before arranging > the list of packs for the `--geometric` split. And presumably > `collect_pack_filenames()` relies on `packdir`, `packtmp_name`, and > `packtmp` being setup ahead of time, too. > >> if (geometric_factor) { >> if (pack_everything) >> die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a"); >> - init_pack_geometry(&geometry); >> + init_pack_geometry(&geometry, &existing_kept_packs); >> split_pack_geometry(geometry, geometric_factor); >> } >> >> - packdir = mkpathdup("%s/pack", get_object_directory()); >> - packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid()); >> - packtmp = mkpathdup("%s/%s", packdir, packtmp_name); >> - >> sigchain_push_common(remove_pack_on_signal); >> >> prepare_pack_objects(&cmd, &po_args); >> @@ -764,9 +785,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix) >> if (use_delta_islands) >> strvec_push(&cmd.args, "--delta-islands"); >> >> - collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs, >> - &keep_pack_list); >> - >> if (pack_everything & ALL_INTO_ONE) { >> repack_promisor_objects(&po_args, &names); >> >> diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh >> index bdbbcbf1eca..f5ac23413d5 100755 >> --- a/t/t7703-repack-geometric.sh >> +++ b/t/t7703-repack-geometric.sh >> @@ -180,6 +180,40 @@ test_expect_success '--geometric ignores kept packs' ' >> ) >> ' >> >> +test_expect_success '--geometric ignores --keep-pack packs' ' >> + git init geometric && >> + test_when_finished "rm -fr geometric" && >> + ( >> + cd geometric && >> + >> + # Create two equal-sized packs >> + test_commit kept && # 3 objects >> + test_commit pack && # 3 objects >> + >> + KEPT=$(git pack-objects --revs $objdir/pack/pack <<-EOF >> + refs/tags/kept >> + EOF >> + ) && >> + PACK=$(git pack-objects --revs $objdir/pack/pack <<-EOF >> + refs/tags/pack >> + ^refs/tags/kept >> + EOF >> + ) && > > Nit; we don't care about the name of $PACK, so it would probably be fine > to avoid storing the `PACK` variable. We could write these packs with > just `git repack -d` after each `test_commit` (which would avoid us > having to call `prune-packed`). > Makes sense. > Does it matter which one is kept? I don't think so, since AFAICT the > critical bit is that we mark one of the packs being rolled up as a > `--keep-pack`. > Correct, the two packs in this test are just two same-sized (or, more generally, non-geometrically progressing) packs with non-overlapping content. >> + # Prune loose objects that are now packed into PACK and KEEP >> + git prune-packed && >> + >> + git repack --geometric 2 -dm --keep-pack=pack-$KEPT.pack >out && >> + >> + # Packs should not have changed (only one non-kept pack, no >> + # loose objects), but midx should now exist. >> + test_i18ngrep "Nothing new to pack" out && > > Nit; test_i18ngrep here should just be "grep". > Thanks for pointing this out - I've been a bit unsure of the difference for a while, but this pushed me to figure out the difference and I found the note in 'test-lib-functions.sh' clarifying that 'test_i18ngrep' is deprecated. >> + test_path_is_file $midx && >> + test_path_is_file $objdir/pack/pack-$KEPT.pack && >> + git fsck >> + ) > > > Thanks, > Taylor
On Fri, May 20, 2022 at 10:27:21AM -0700, Victoria Dye wrote: > > I left a couple of small notes on the patch below, but since I have some > > patches that deal with a separate issue in the `git repack --geometric` > > code coming, do you want to combine forces (and I can send a > > lightly-reworked version of this patch as a part of my series)? > > Works for me! I'm happy with all the suggested changes you noted below > (moving the 'string_list_sort' and cleaning up the test), so feel free to > include them in your series. > > Thanks! No problem; I (re-)sent this patch as 1/3 in my series which should be available via the archive at: https://lore.kernel.org/git/cover.1653073280.git.me@ttaylorr.com/T/#t It looks like we're on the same page with respect to my suggestions, but feel free to take a look at the reworked version of this patch (and the new ones on top, too) and let me know what you think. > >> @@ -332,17 +332,34 @@ static int geometry_cmp(const void *va, const void *vb) > >> return 0; > >> } > >> > >> -static void init_pack_geometry(struct pack_geometry **geometry_p) > >> +static void init_pack_geometry(struct pack_geometry **geometry_p, > >> + struct string_list *existing_kept_packs) > >> { > >> struct packed_git *p; > >> struct pack_geometry *geometry; > >> + struct strbuf buf = STRBUF_INIT; > >> > >> *geometry_p = xcalloc(1, sizeof(struct pack_geometry)); > >> geometry = *geometry_p; > >> > >> + string_list_sort(existing_kept_packs); > > > > Would it be worth sorting this as early as in collect_pack_filenames()? > > For our purposes in this patch, this works as-is, but it may be > > defensive to try and minimize the time that list has unsorted contents. > > I went back and forth on this, eventually settling on this to keep the > 'string_list_sort' as close as possible to where the sorted list is needed. > I'm still pretty indifferent, though, so moving it to the end of > 'collect_pack_filenames()' is fine with me. I'm fine with it either way. I opted to sorting the list in collect_pack_filenames() since I think it's slightly more fool-proof, but I also don't have strong feelings about its placement. Thanks, Taylor
Taylor Blau <me@ttaylorr.com> writes: >> *geometry_p = xcalloc(1, sizeof(struct pack_geometry)); >> geometry = *geometry_p; >> >> + string_list_sort(existing_kept_packs); > > Would it be worth sorting this as early as in collect_pack_filenames()? > For our purposes in this patch, this works as-is, but it may be > defensive to try and minimize the time that list has unsorted contents. It does not matter too much but after reading the latest one before coming back to this original thread, I actually thought that sorting it too early was questionable. Nobody other than this code cares about the list being sorted, and if it turns out that the look-up in the loop need to be optimized, it is plausible that we do not need this list sorted when we populate a hashmap out of its contents, for example. >> for (p = get_all_packs(the_repository); p; p = p->next) { >> - if (!pack_kept_objects && p->pack_keep) >> - continue; >> + if (!pack_kept_objects) { >> + if (p->pack_keep) >> + continue; > > (You mentioned this to me off-list, but I'll repeat it here since it > wasn't obvious to me on first read): this check for `p->pack_keep` isn't > strictly necessary, since any packs that have their `pack_keep` bit set > will appear in the `existing_kept_packs` list. > > But it does give us a fast path to avoid having to check that list, so > it's worth checking that bit to avoid a slightly more expensive check > where possible. That invites a related but different question. Doesn't p->pack_keep and string_list_has_string(existing_kept_packs, name_of_pack(p)) mirror each other? Can a pack appear on the existing_kept_packs list without having .pack_keep bit set and under what condition? It is answered by the comment below, I presume? >> + /* >> + * The pack may be kept via the --keep-pack option; >> + * check 'existing_kept_packs' to determine whether to >> + * ignore it. >> + */ OK. So there are two classes of packs we want to exclude from the geometry repacking. Those that already have .pack_keep bit set, and those that are _are_ newly making into kept packs that do not yet have .pack_keep bit set. Makes sense.
Junio C Hamano <gitster@pobox.com> writes: >>> + /* >>> + * The pack may be kept via the --keep-pack option; >>> + * check 'existing_kept_packs' to determine whether to >>> + * ignore it. >>> + */ > > OK. So there are two classes of packs we want to exclude from the > geometry repacking. Those that already have .pack_keep bit set, and > those that are _are_ newly making into kept packs that do not yet > have .pack_keep bit set. Makes sense. And with another topic in-flight combined, we have the third class that we would want to exclude here, i.e. the ones that are "cruft". (Noticed while rebuilding 'seen').
On Fri, May 20, 2022 at 03:12:32PM -0700, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > > >>> + /* > >>> + * The pack may be kept via the --keep-pack option; > >>> + * check 'existing_kept_packs' to determine whether to > >>> + * ignore it. > >>> + */ > > > > OK. So there are two classes of packs we want to exclude from the > > geometry repacking. Those that already have .pack_keep bit set, and > > those that are _are_ newly making into kept packs that do not yet > > have .pack_keep bit set. Makes sense. > > And with another topic in-flight combined, we have the third class > that we would want to exclude here, i.e. the ones that are "cruft". Indeed. tb/cruft-packs handles this by: if (p->is_cruft) continue; since we would never want to roll up the objects in a cruft pack during geometric repacking. There is likely a small conflict you had to resolve, but the resolution on 'seen' looks as-expected to me. Thanks, Taylor
On Fri, May 20, 2022 at 01:41:49PM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > >> *geometry_p = xcalloc(1, sizeof(struct pack_geometry)); > >> geometry = *geometry_p; > >> > >> + string_list_sort(existing_kept_packs); > > > > Would it be worth sorting this as early as in collect_pack_filenames()? > > For our purposes in this patch, this works as-is, but it may be > > defensive to try and minimize the time that list has unsorted contents. > > It does not matter too much but after reading the latest one before > coming back to this original thread, I actually thought that sorting > it too early was questionable. Nobody other than this code cares > about the list being sorted, and if it turns out that the look-up in > the loop need to be optimized, it is plausible that we do not need > this list sorted when we populate a hashmap out of its contents, for > example. My thinking was simply that sorting it as often as possible (by doing so immediately after appending all of our newly written packs into it) was the most defensive approach that we could take. At least in the sense of another caller which _does_ depend on `names` sorted, and makes the same mistake of forgetting to call `string_list_sort()` itself. I suspect that this is mostly academic anyway, and that either would be fine, so if others feel strongly I'm happy to send an updated version (though TBH I would be just as happy with what we already have in the series I posted here). > >> for (p = get_all_packs(the_repository); p; p = p->next) { > >> - if (!pack_kept_objects && p->pack_keep) > >> - continue; > >> + if (!pack_kept_objects) { > >> + if (p->pack_keep) > >> + continue; > > > > (You mentioned this to me off-list, but I'll repeat it here since it > > wasn't obvious to me on first read): this check for `p->pack_keep` isn't > > strictly necessary, since any packs that have their `pack_keep` bit set > > will appear in the `existing_kept_packs` list. > > > > But it does give us a fast path to avoid having to check that list, so > > it's worth checking that bit to avoid a slightly more expensive check > > where possible. > > That invites a related but different question. Doesn't p->pack_keep > and string_list_has_string(existing_kept_packs, name_of_pack(p)) > mirror each other? Can a pack appear on the existing_kept_packs > list without having .pack_keep bit set and under what condition? Unfortunately not. `repack` is much more keen to manipulate the results of calling readdir() over $GIT_DIR/objects/pack than it is to, say, call `get_all_packs()` and mark `p->pack_keep_in_core`. In a future where `repack` does not interact with the packdir at such a low level, I don't think we would have a use for a function like `collect_pack_filenames()` at all. But that is a much larger change than this one ;). > It is answered by the comment below, I presume? > > >> + /* > >> + * The pack may be kept via the --keep-pack option; > >> + * check 'existing_kept_packs' to determine whether to > >> + * ignore it. > >> + */ > > OK. So there are two classes of packs we want to exclude from the > geometry repacking. Those that already have .pack_keep bit set, and > those that are _are_ newly making into kept packs that do not yet > have .pack_keep bit set. Makes sense. Yep (and also "yep" to your note below about adding cruft packs into the mix). Thanks, Taylor
diff --git a/builtin/repack.c b/builtin/repack.c index d1a563d5b65..0b636676056 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -332,17 +332,34 @@ static int geometry_cmp(const void *va, const void *vb) return 0; } -static void init_pack_geometry(struct pack_geometry **geometry_p) +static void init_pack_geometry(struct pack_geometry **geometry_p, + struct string_list *existing_kept_packs) { struct packed_git *p; struct pack_geometry *geometry; + struct strbuf buf = STRBUF_INIT; *geometry_p = xcalloc(1, sizeof(struct pack_geometry)); geometry = *geometry_p; + string_list_sort(existing_kept_packs); for (p = get_all_packs(the_repository); p; p = p->next) { - if (!pack_kept_objects && p->pack_keep) - continue; + if (!pack_kept_objects) { + if (p->pack_keep) + continue; + + /* + * The pack may be kept via the --keep-pack option; + * check 'existing_kept_packs' to determine whether to + * ignore it. + */ + strbuf_reset(&buf); + strbuf_addstr(&buf, pack_basename(p)); + strbuf_strip_suffix(&buf, ".pack"); + + if (string_list_has_string(existing_kept_packs, buf.buf)) + continue; + } ALLOC_GROW(geometry->pack, geometry->pack_nr + 1, @@ -353,6 +370,7 @@ static void init_pack_geometry(struct pack_geometry **geometry_p) } QSORT(geometry->pack, geometry->pack_nr, geometry_cmp); + strbuf_release(&buf); } static void split_pack_geometry(struct pack_geometry *geometry, int factor) @@ -714,17 +732,20 @@ int cmd_repack(int argc, const char **argv, const char *prefix) strbuf_release(&path); } + packdir = mkpathdup("%s/pack", get_object_directory()); + packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid()); + packtmp = mkpathdup("%s/%s", packdir, packtmp_name); + + collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs, + &keep_pack_list); + if (geometric_factor) { if (pack_everything) die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a"); - init_pack_geometry(&geometry); + init_pack_geometry(&geometry, &existing_kept_packs); split_pack_geometry(geometry, geometric_factor); } - packdir = mkpathdup("%s/pack", get_object_directory()); - packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid()); - packtmp = mkpathdup("%s/%s", packdir, packtmp_name); - sigchain_push_common(remove_pack_on_signal); prepare_pack_objects(&cmd, &po_args); @@ -764,9 +785,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix) if (use_delta_islands) strvec_push(&cmd.args, "--delta-islands"); - collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs, - &keep_pack_list); - if (pack_everything & ALL_INTO_ONE) { repack_promisor_objects(&po_args, &names); diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh index bdbbcbf1eca..f5ac23413d5 100755 --- a/t/t7703-repack-geometric.sh +++ b/t/t7703-repack-geometric.sh @@ -180,6 +180,40 @@ test_expect_success '--geometric ignores kept packs' ' ) ' +test_expect_success '--geometric ignores --keep-pack packs' ' + git init geometric && + test_when_finished "rm -fr geometric" && + ( + cd geometric && + + # Create two equal-sized packs + test_commit kept && # 3 objects + test_commit pack && # 3 objects + + KEPT=$(git pack-objects --revs $objdir/pack/pack <<-EOF + refs/tags/kept + EOF + ) && + PACK=$(git pack-objects --revs $objdir/pack/pack <<-EOF + refs/tags/pack + ^refs/tags/kept + EOF + ) && + + # Prune loose objects that are now packed into PACK and KEEP + git prune-packed && + + git repack --geometric 2 -dm --keep-pack=pack-$KEPT.pack >out && + + # Packs should not have changed (only one non-kept pack, no + # loose objects), but midx should now exist. + test_i18ngrep "Nothing new to pack" out && + test_path_is_file $midx && + test_path_is_file $objdir/pack/pack-$KEPT.pack && + git fsck + ) +' + test_expect_success '--geometric chooses largest MIDX preferred pack' ' git init geometric && test_when_finished "rm -fr geometric" &&