mbox series

[v2,0/8] repack: refactor pack snapshot-ing logic

Message ID cover.1694632644.git.me@ttaylorr.com (mailing list archive)
Headers show
Series repack: refactor pack snapshot-ing logic | expand

Message

Taylor Blau Sept. 13, 2023, 7:17 p.m. UTC
Here is a small reroll of my series to clean up some of the internals of
'git repack' used to track the set of existing packs.

Much is unchanged from the last round, save for some additional clean-up
on how we handle the '->util' field for each pack's string_list_item in
response to very helpful review from those CC'd.

As usual, a range-diff is available below for convenience. Thanks in
advance for your review!

Taylor Blau (8):
  builtin/repack.c: extract structure to store existing packs
  builtin/repack.c: extract marking packs for deletion
  builtin/repack.c: extract redundant pack cleanup for --geometric
  builtin/repack.c: extract redundant pack cleanup for existing packs
  builtin/repack.c: extract `has_existing_non_kept_packs()`
  builtin/repack.c: store existing cruft packs separately
  builtin/repack.c: avoid directly inspecting "util"
  builtin/repack.c: extract common cruft pack loop

 builtin/repack.c | 293 +++++++++++++++++++++++++++++------------------
 1 file changed, 180 insertions(+), 113 deletions(-)

Range-diff against v1:
1:  5b48b7e3cc = 1:  2e26beff22 builtin/repack.c: extract structure to store existing packs
2:  313537ef68 = 2:  62d916169d builtin/repack.c: extract marking packs for deletion
3:  5c25ef87c1 = 3:  7ed45804ea builtin/repack.c: extract redundant pack cleanup for --geometric
4:  7bb543fef8 = 4:  82057de4cf builtin/repack.c: extract redundant pack cleanup for existing packs
5:  e2cf87bb94 = 5:  f4f7b4c08f builtin/repack.c: extract `has_existing_non_kept_packs()`
6:  414a558883 = 6:  d68a88dbd5 builtin/repack.c: store existing cruft packs separately
7:  559b487e2a ! 7:  481a29599b builtin/repack.c: drop `DELETE_PACK` macro
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    builtin/repack.c: drop `DELETE_PACK` macro
    +    builtin/repack.c: avoid directly inspecting "util"
     
    +    The `->util` field corresponding to each string_list_item is used to
    +    track the existence of some pack at the beginning of a repack operation
    +    was originally intended to be used as a bitfield.
    +
    +    This bitfield tracked:
    +
    +      - (1 << 0): whether or not the pack should be deleted
    +      - (1 << 1): whether or not the pack is cruft
    +
    +    The previous commit removed the use of the second bit, but a future
    +    patch (from a different series than this one) will introduce a new use
    +    of it.
    +
    +    So we could stop treating the util pointer as a bitfield and instead
    +    start treating it as if it were a boolean. But this would require some
    +    backtracking when that later patch is applied.
    +
    +    Instead, let's avoid touching the ->util field directly, and instead
    +    introduce convenience functions like:
    +
    +      - pack_mark_for_deletion()
    +      - pack_is_marked_for_deletion()
    +
    +    Helped-by: Junio C Hamano <gitster@pobox.com>
    +    Helped-by: Jeff King <peff@peff.net>
    +    Helped-by: Patrick Steinhardt <ps@pks.im>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## builtin/repack.c ##
    -@@
    - #define LOOSEN_UNREACHABLE 2
    - #define PACK_CRUFT 4
    +@@ builtin/repack.c: static int has_existing_non_kept_packs(const struct existing_packs *existing)
    + 	return existing->non_kept_packs.nr || existing->cruft_packs.nr;
    + }
      
    --#define DELETE_PACK 1
    --
    - static int pack_everything;
    - static int delta_base_offset = 1;
    - static int pack_kept_objects = -1;
    -@@ builtin/repack.c: static int repack_config(const char *var, const char *value,
    - 
    - struct existing_packs {
    - 	struct string_list kept_packs;
    -+	/*
    -+	 * for both non_kept_packs, and cruft_packs, a non-NULL
    -+	 * 'util' field indicates the pack should be deleted.
    -+	 */
    - 	struct string_list non_kept_packs;
    - 	struct string_list cruft_packs;
    - };
    ++static void pack_mark_for_deletion(struct string_list_item *item)
    ++{
    ++	item->util = (void*)((uintptr_t)item->util | DELETE_PACK);
    ++}
    ++
    ++static int pack_is_marked_for_deletion(struct string_list_item *item)
    ++{
    ++	return (uintptr_t)item->util & DELETE_PACK;
    ++}
    ++
    + static void mark_packs_for_deletion_1(struct string_list *names,
    + 				      struct string_list *list)
    + {
     @@ builtin/repack.c: static void mark_packs_for_deletion_1(struct string_list *names,
      		 * (if `-d` was given).
      		 */
      		if (!string_list_has_string(names, sha1))
     -			item->util = (void*)(uintptr_t)((size_t)item->util | DELETE_PACK);
    -+			item->util = (void*)1;
    ++			pack_mark_for_deletion(item);
      	}
      }
      
    @@ builtin/repack.c: static void remove_redundant_packs_1(struct string_list *packs
      	struct string_list_item *item;
      	for_each_string_list_item(item, packs) {
     -		if (!((uintptr_t)item->util & DELETE_PACK))
    -+		if (!item->util)
    ++		if (!pack_is_marked_for_deletion(item))
      			continue;
      		remove_redundant_pack(packdir, item->string);
      	}
     @@ builtin/repack.c: static void midx_included_packs(struct string_list *include,
    - 
    - 		for_each_string_list_item(item, &existing->cruft_packs) {
    - 			/*
    --			 * no need to check DELETE_PACK, since we're not
    --			 * doing an ALL_INTO_ONE repack
    -+			 * no need to check for deleted packs, since we're
    -+			 * not doing an ALL_INTO_ONE repack
    - 			 */
    - 			string_list_insert(include, xstrfmt("%s.idx", item->string));
      		}
      	} else {
      		for_each_string_list_item(item, &existing->non_kept_packs) {
     -			if ((uintptr_t)item->util & DELETE_PACK)
    -+			if (item->util)
    ++			if (pack_is_marked_for_deletion(item))
      				continue;
      			string_list_insert(include, xstrfmt("%s.idx", item->string));
      		}
      
      		for_each_string_list_item(item, &existing->cruft_packs) {
     -			if ((uintptr_t)item->util & DELETE_PACK)
    -+			if (item->util)
    ++			if (pack_is_marked_for_deletion(item))
      				continue;
      			string_list_insert(include, xstrfmt("%s.idx", item->string));
      		}
8:  ca7d13e7bf ! 8:  68748eb9c8 builtin/repack.c: extract common cruft pack loop
    @@ Commit message
     
      ## builtin/repack.c ##
     @@ builtin/repack.c: static void midx_included_packs(struct string_list *include,
    + 
      			string_list_insert(include, strbuf_detach(&buf, NULL));
      		}
    - 
    +-
     -		for_each_string_list_item(item, &existing->cruft_packs) {
     -			/*
    --			 * no need to check for deleted packs, since we're
    --			 * not doing an ALL_INTO_ONE repack
    +-			 * no need to check DELETE_PACK, since we're not
    +-			 * doing an ALL_INTO_ONE repack
     -			 */
     -			string_list_insert(include, xstrfmt("%s.idx", item->string));
     -		}
      	} else {
      		for_each_string_list_item(item, &existing->non_kept_packs) {
    - 			if (item->util)
    + 			if (pack_is_marked_for_deletion(item))
      				continue;
      			string_list_insert(include, xstrfmt("%s.idx", item->string));
      		}
     +	}
      
     -		for_each_string_list_item(item, &existing->cruft_packs) {
    --			if (item->util)
    +-			if (pack_is_marked_for_deletion(item))
     -				continue;
     -			string_list_insert(include, xstrfmt("%s.idx", item->string));
     -		}
    @@ builtin/repack.c: static void midx_included_packs(struct string_list *include,
     +		 * `mark_packs_for_deletion()` when doing an all-into-one
     +		 * repack).
     +		 */
    -+		if (item->util)
    ++		if (pack_is_marked_for_deletion(item))
     +			continue;
     +		string_list_insert(include, xstrfmt("%s.idx", item->string));
      	}

Comments

Junio C Hamano Sept. 13, 2023, 7:44 p.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> Here is a small reroll of my series to clean up some of the internals of
> 'git repack' used to track the set of existing packs.
>
> Much is unchanged from the last round, save for some additional clean-up
> on how we handle the '->util' field for each pack's string_list_item in
> response to very helpful review from those CC'd.

The change to [7/8] was as expected and looking good.  Let's see if
we see additional reviews from others, plan to declare victory and
merge it to 'next' by early next week at the latest, if not sooner.

Christian, your cc/repack-sift-filtered-objects-to-separate-pack
topic will have to interact with this topic when merged to 'seen',
so it would be good for you to give a review on these patches (if
only to understand the new world order) and optionally make a trial
merge between the two to see how well they work together and what
adjustment will be needed when you eventually rebase your topic on
top.  Actual rebasing can wait until this topic graduates, but trial
merge is something you can immediately do in the meantime to prepare
for the future.

Thanks.
Jeff King Sept. 14, 2023, 12:07 a.m. UTC | #2
On Wed, Sep 13, 2023 at 12:44:04PM -0700, Junio C Hamano wrote:

> Taylor Blau <me@ttaylorr.com> writes:
> 
> > Here is a small reroll of my series to clean up some of the internals of
> > 'git repack' used to track the set of existing packs.
> >
> > Much is unchanged from the last round, save for some additional clean-up
> > on how we handle the '->util' field for each pack's string_list_item in
> > response to very helpful review from those CC'd.
> 
> The change to [7/8] was as expected and looking good.  Let's see if
> we see additional reviews from others, plan to declare victory and
> merge it to 'next' by early next week at the latest, if not sooner.

This looks great to me. The motivation in the revised patch 7 is much
easier to follow, and the end result is much nicer to read. :)

-Peff
Patrick Steinhardt Sept. 14, 2023, 10:33 a.m. UTC | #3
On Wed, Sep 13, 2023 at 08:07:50PM -0400, Jeff King wrote:
> On Wed, Sep 13, 2023 at 12:44:04PM -0700, Junio C Hamano wrote:
> 
> > Taylor Blau <me@ttaylorr.com> writes:
> > 
> > > Here is a small reroll of my series to clean up some of the internals of
> > > 'git repack' used to track the set of existing packs.
> > >
> > > Much is unchanged from the last round, save for some additional clean-up
> > > on how we handle the '->util' field for each pack's string_list_item in
> > > response to very helpful review from those CC'd.
> > 
> > The change to [7/8] was as expected and looking good.  Let's see if
> > we see additional reviews from others, plan to declare victory and
> > merge it to 'next' by early next week at the latest, if not sooner.
> 
> This looks great to me. The motivation in the revised patch 7 is much
> easier to follow, and the end result is much nicer to read. :)
> 
> -Peff

Agreed, the patch series loogs good to me now. Thanks!

Patrick
Christian Couder Sept. 14, 2023, 11:10 a.m. UTC | #4
On Wed, Sep 13, 2023 at 9:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Here is a small reroll of my series to clean up some of the internals of
> > 'git repack' used to track the set of existing packs.
> >
> > Much is unchanged from the last round, save for some additional clean-up
> > on how we handle the '->util' field for each pack's string_list_item in
> > response to very helpful review from those CC'd.
>
> The change to [7/8] was as expected and looking good.  Let's see if
> we see additional reviews from others, plan to declare victory and
> merge it to 'next' by early next week at the latest, if not sooner.
>
> Christian, your cc/repack-sift-filtered-objects-to-separate-pack
> topic will have to interact with this topic when merged to 'seen',
> so it would be good for you to give a review on these patches (if
> only to understand the new world order) and optionally make a trial
> merge between the two to see how well they work together and what
> adjustment will be needed when you eventually rebase your topic on
> top.  Actual rebasing can wait until this topic graduates, but trial
> merge is something you can immediately do in the meantime to prepare
> for the future.

Ok, I will try to review and merge this with
cc/repack-sift-filtered-objects-to-separate-pack soon.
Taylor Blau Sept. 14, 2023, 6:01 p.m. UTC | #5
On Thu, Sep 14, 2023 at 01:10:38PM +0200, Christian Couder wrote:
> Ok, I will try to review and merge this with
> cc/repack-sift-filtered-objects-to-separate-pack soon.

I took a look at how much/little effort was going to be required, and
luckily the changes are isolated only to a single patch. It's just your
f1ffa71e8f (repack: add `--filter=<filter-spec>` option, 2023-09-11),
and in particular the `write_filtered_pack()` function.

I started messing around with it myself and generated the following
fixup! which can be applied on top of your version of f1ffa71e8f. It's
mostly straightforward, but there is a gotcha that the loop over
non-kept packs has to change to:

    for_each_string_list_item(item, &existing->non_kept_packs)
            /* ... */
    for_each_string_list_item(item, &existing->cruft_packs)
            /* ... */

, instead of just the first loop over non_kept_packs, since cruft packs
are stored in a separate list.

In any event, here's the fixup! I generated on top of that patch:

--- 8< ---
Subject: [PATCH] fixup! repack: add `--filter=<filter-spec>` option

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 120f4241c0..0d23323d05 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -834,15 +834,13 @@ static int finish_pack_objects_cmd(struct child_process *cmd,
 static int write_filtered_pack(const struct pack_objects_args *args,
 			       const char *destination,
 			       const char *pack_prefix,
-			       struct string_list *keep_pack_list,
-			       struct string_list *names,
-			       struct string_list *existing_packs,
-			       struct string_list *existing_kept_packs)
+			       struct existing_packs *existing,
+			       struct string_list *names)
 {
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	struct string_list_item *item;
 	FILE *in;
-	int ret, i;
+	int ret;
 	const char *caret;
 	const char *scratch;
 	int local = skip_prefix(destination, packdir, &scratch);
@@ -853,9 +851,8 @@ static int write_filtered_pack(const struct pack_objects_args *args,

 	if (!pack_kept_objects)
 		strvec_push(&cmd.args, "--honor-pack-keep");
-	for (i = 0; i < keep_pack_list->nr; i++)
-		strvec_pushf(&cmd.args, "--keep-pack=%s",
-			     keep_pack_list->items[i].string);
+	for_each_string_list_item(item, &existing->kept_packs)
+		strvec_pushf(&cmd.args, "--keep-pack=%s", item->string);

 	cmd.in = -1;

@@ -872,10 +869,12 @@ static int write_filtered_pack(const struct pack_objects_args *args,
 	in = xfdopen(cmd.in, "w");
 	for_each_string_list_item(item, names)
 		fprintf(in, "^%s-%s.pack\n", pack_prefix, item->string);
-	for_each_string_list_item(item, existing_packs)
+	for_each_string_list_item(item, &existing->non_kept_packs)
+		fprintf(in, "%s.pack\n", item->string);
+	for_each_string_list_item(item, &existing->cruft_packs)
 		fprintf(in, "%s.pack\n", item->string);
 	caret = pack_kept_objects ? "" : "^";
-	for_each_string_list_item(item, existing_kept_packs)
+	for_each_string_list_item(item, &existing->kept_packs)
 		fprintf(in, "%s%s.pack\n", caret, item->string);
 	fclose(in);

@@ -1261,10 +1260,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		ret = write_filtered_pack(&po_args,
 					  packtmp,
 					  find_pack_prefix(packdir, packtmp),
-					  &keep_pack_list,
-					  &names,
-					  &existing_nonkept_packs,
-					  &existing_kept_packs);
+					  &existing,
+					  &names);
 		if (ret)
 			goto cleanup;
 	}
--
2.42.0.137.g6fe1dff026
--- >8 ---

Thanks,
Taylor
Christian Couder Sept. 15, 2023, 5:56 a.m. UTC | #6
On Thu, Sep 14, 2023 at 8:01 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Thu, Sep 14, 2023 at 01:10:38PM +0200, Christian Couder wrote:
> > Ok, I will try to review and merge this with
> > cc/repack-sift-filtered-objects-to-separate-pack soon.
>
> I took a look at how much/little effort was going to be required, and
> luckily the changes are isolated only to a single patch. It's just your
> f1ffa71e8f (repack: add `--filter=<filter-spec>` option, 2023-09-11),
> and in particular the `write_filtered_pack()` function.
>
> I started messing around with it myself and generated the following
> fixup! which can be applied on top of your version of f1ffa71e8f. It's
> mostly straightforward, but there is a gotcha that the loop over
> non-kept packs has to change to:
>
>     for_each_string_list_item(item, &existing->non_kept_packs)
>             /* ... */
>     for_each_string_list_item(item, &existing->cruft_packs)
>             /* ... */
>
> , instead of just the first loop over non_kept_packs, since cruft packs
> are stored in a separate list.
>
> In any event, here's the fixup! I generated on top of that patch:

Thanks a lot! I will very likely use it in the new version I will send
after your series has graduated.
Christian Couder Sept. 15, 2023, 10:09 a.m. UTC | #7
On Wed, Sep 13, 2023 at 9:17 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> Here is a small reroll of my series to clean up some of the internals of
> 'git repack' used to track the set of existing packs.
>
> Much is unchanged from the last round, save for some additional clean-up
> on how we handle the '->util' field for each pack's string_list_item in
> response to very helpful review from those CC'd.
>
> As usual, a range-diff is available below for convenience. Thanks in
> advance for your review!
>
> Taylor Blau (8):
>   builtin/repack.c: extract structure to store existing packs
>   builtin/repack.c: extract marking packs for deletion
>   builtin/repack.c: extract redundant pack cleanup for --geometric
>   builtin/repack.c: extract redundant pack cleanup for existing packs
>   builtin/repack.c: extract `has_existing_non_kept_packs()`
>   builtin/repack.c: store existing cruft packs separately
>   builtin/repack.c: avoid directly inspecting "util"
>   builtin/repack.c: extract common cruft pack loop

I think it would be a bit nicer with s/builtin\/repack.c/repack/ in
all the above commit subjects, but I don't think it's worth a reroll.

Except for another very small nit in a commit message also not worth a
reroll, this LGTM.

Thanks,
Christian.