Message ID | 313537ef68892b15e772eaad8937a4a8c7ebbe61.1693946195.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | repack: refactor pack snapshot-ing logic | expand |
On Tue, Sep 05, 2023 at 04:36:43PM -0400, Taylor Blau wrote: > At the end of a repack (when given `-d`), Git attempts to remove any > packs which have been made "redundant" as a result of the repacking > operation. For example, an all-into-one (`-A` or `-a`) repack makes > every pre-existing pack which is not marked as kept redundant. Geometric > repacks (with `--geometric=<n>`) make any packs which were rolled up > redundant, and so on. > > But before deleting the set of packs we think are redundant, we first > check to see whether or not we just wrote a pack which is identical to > any one of the packs we were going to delete. When this is the case, Git > must avoid deleting that pack, since it matches a pack we just wrote > (so deleting it may cause the repository to become corrupt). > > Right now we only process the list of non-kept packs in a single pass. > But a future change will split the existing non-kept packs further into > two lists: one for cruft packs, and another for non-cruft packs. > > Factor out this routine to prepare for calling it twice on two separate > lists in a future patch. There are really two "factor outs" here: we pull the code from cmd_repack() into a helper, and then the helper is also just a thin wrapper around its "_1" variant. That latter part isn't needed yet, but I can guess from your description that we'll eventually have the main function dispatch to the "_1" helper for lists. The main caller in cmd_repack() could do that double-dispatch, but then this code: > + if (delete_redundant && pack_everything & ALL_INTO_ONE) > + mark_packs_for_deletion(&existing, &names); would know more about how "existing_packs" work than it needs to. So this seems like a good split (and the two-liner above is making cmd_repack() much more readable than the big loop it had in the pre-image). -Peff
On Thu, Sep 07, 2023 at 03:59:31AM -0400, Jeff King wrote: > There are really two "factor outs" here: we pull the code from > cmd_repack() into a helper, and then the helper is also just a thin > wrapper around its "_1" variant. That latter part isn't needed yet, but > I can guess from your description that we'll eventually have the main > function dispatch to the "_1" helper for lists. Yeah... the "_1" variant looks ugly in isolation in this patch, but I think makes things cleaner in subsequent patches. Thanks, Taylor
diff --git a/builtin/repack.c b/builtin/repack.c index c3ab89912e..708556836e 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -105,6 +105,36 @@ struct existing_packs { .non_kept_packs = STRING_LIST_INIT_DUP, \ } +static void mark_packs_for_deletion_1(struct string_list *names, + struct string_list *list) +{ + struct string_list_item *item; + const int hexsz = the_hash_algo->hexsz; + + for_each_string_list_item(item, list) { + char *sha1; + size_t len = strlen(item->string); + if (len < hexsz) + continue; + sha1 = item->string + len - hexsz; + /* + * Mark this pack for deletion, which ensures that this + * pack won't be included in a MIDX (if `--write-midx` + * was given) and that we will actually delete this pack + * (if `-d` was given). + */ + if (!string_list_has_string(names, sha1)) + item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK); + } +} + +static void mark_packs_for_deletion(struct existing_packs *existing, + struct string_list *names) + +{ + mark_packs_for_deletion_1(names, &existing->non_kept_packs); +} + static void existing_packs_release(struct existing_packs *existing) { string_list_clear(&existing->kept_packs, 0); @@ -1141,24 +1171,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) } /* End of pack replacement. */ - if (delete_redundant && pack_everything & ALL_INTO_ONE) { - const int hexsz = the_hash_algo->hexsz; - for_each_string_list_item(item, &existing.non_kept_packs) { - char *sha1; - size_t len = strlen(item->string); - if (len < hexsz) - continue; - sha1 = item->string + len - hexsz; - /* - * Mark this pack for deletion, which ensures that this - * pack won't be included in a MIDX (if `--write-midx` - * was given) and that we will actually delete this pack - * (if `-d` was given). - */ - if (!string_list_has_string(&names, sha1)) - item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK); - } - } + if (delete_redundant && pack_everything & ALL_INTO_ONE) + mark_packs_for_deletion(&existing, &names); if (write_midx) { struct string_list include = STRING_LIST_INIT_NODUP;
At the end of a repack (when given `-d`), Git attempts to remove any packs which have been made "redundant" as a result of the repacking operation. For example, an all-into-one (`-A` or `-a`) repack makes every pre-existing pack which is not marked as kept redundant. Geometric repacks (with `--geometric=<n>`) make any packs which were rolled up redundant, and so on. But before deleting the set of packs we think are redundant, we first check to see whether or not we just wrote a pack which is identical to any one of the packs we were going to delete. When this is the case, Git must avoid deleting that pack, since it matches a pack we just wrote (so deleting it may cause the repository to become corrupt). Right now we only process the list of non-kept packs in a single pass. But a future change will split the existing non-kept packs further into two lists: one for cruft packs, and another for non-cruft packs. Factor out this routine to prepare for calling it twice on two separate lists in a future patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- builtin/repack.c | 50 +++++++++++++++++++++++++++++++----------------- 1 file changed, 32 insertions(+), 18 deletions(-)