diff mbox series

pack-refs: teach --exclude option to exclude refs from being packed

Message ID pull.1501.git.git.1683215331910.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series pack-refs: teach --exclude option to exclude refs from being packed | expand

Commit Message

John Cai May 4, 2023, 3:48 p.m. UTC
From: John Cai <jcai@gitlab.com>

Currently once refs are packed into a pack-refs file, deleting them can be
slow since it involves a full file scan of the entire pack-refs file. At
GitLab, we have a system that creates ephemeral internal refs that don't
live long before getting deleted. Having an option to not include certain
refs from a pack-refs file allows these internal references to be deleted
much more efficiently.

Add an --exclude option to the pack-refs builtin, and use the ref
exclusions API to exclude certain refs from being packed into the final
pack-refs file

Signed-off-by: John Cai <johncai86@gmail.com>
---
    pack-refs: Teach --exclude option to exclude refs from being packed
    
    Add an --exclude option to the pack-refs builtin, to prevent certain
    refs from being packed into the final packrefs file. At GitLab we keep
    ephemeral internal references that quickly get deleted. Being able to
    exclude them from the packrefs file will avoid incurring the performance
    hit from ref deletions--especially since it scales with packrefs size.
    
    It's worth noting that [1] discussed a proposal for a pack refs v2
    format that would improve deletion speeds. The ref-table backend would
    also improve performance of deletions. However, both of those proposals
    are still being discussed.
    
     1. https://lore.kernel.org/git/pull.1408.git.1667846164.gitgitgadget@gmail.com/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1501%2Fjohn-cai%2Fjc%2Fexclude-refs-from-pack-refs-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1501/john-cai/jc/exclude-refs-from-pack-refs-v1
Pull-Request: https://github.com/git/git/pull/1501

 Documentation/git-pack-refs.txt |  8 +++++++-
 builtin/pack-refs.c             | 17 +++++++++++++++--
 refs.c                          |  4 ++--
 refs.h                          |  6 +++++-
 refs/debug.c                    |  4 ++--
 refs/files-backend.c            | 13 ++++++++++---
 refs/packed-backend.c           |  3 ++-
 refs/refs-internal.h            |  4 +++-
 t/helper/test-ref-store.c       |  3 ++-
 t/t3210-pack-refs.sh            | 18 ++++++++++++++++++
 10 files changed, 66 insertions(+), 14 deletions(-)


base-commit: f85cd430b12b0d3e4f1a30ef3239a1b73d5f6331

Comments

Junio C Hamano May 4, 2023, 4:48 p.m. UTC | #1
"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: John Cai <jcai@gitlab.com>
>
> Currently once refs are packed into a pack-refs file, deleting them can be
> slow since it involves a full file scan of the entire pack-refs file. At

"pack-refs" -> "packed-refs".

But performing a full file scan of 100MB file would take how many
milliseconds?  Having to remove many refs from a single packed-refs
file would be costly if it is done one-by-one, though.  I wonder how
effective our batched update mechanism is these days...

Sorry for straying to a tangent.  In any case, I do not think the
first sentence is necessary; just start it with "At GitLab, ..." to
say that you have a need to be more selective than "is it a tag or
not?" to choose refs to be and not to be packed.  The reason why we
may want to leave a ref loose is not limited to ref deletion
performance, and the benefit of this new feature is not limited to
it, either.

> GitLab, we have a system that creates ephemeral internal refs that don't
> live long before getting deleted. Having an option to not include certain
> refs from a pack-refs file allows these internal references to be deleted
> much more efficiently.

I think that is a valid use case.

If we step back a bit, we can see "pack-refs" has an option "--all"
that was added by b3d4204f (git-pack-refs --all, 2006-10-08), to
allow us pack only tags by default, because packing branches that
are meant to be updated often and also removed more often than tags
were found to be detrimental.  We can view this new option a
follow-up work for the commit, to allow the users to define what to
be and what not to be packed, depending on their workflow.

This observation also makes us realize that we should consider the
opposite.  It would benefit us to include some refs that we normally
do not pack and be more selective than "--all" ("--exclude" proposed
here is a way to leave some refs that we normally pack and be more
selective than not running pack-refs at all).  A set of branches
that are only occasionally used may want to be packed while the rest
of branches want to be left loose because they are more active, or
something.

> Add an --exclude option to the pack-refs builtin, and use the ref
> exclusions API to exclude certain refs from being packed into the final
> pack-refs file

"pack-refs" appears here, too.

>  Documentation/git-pack-refs.txt |  8 +++++++-
>  builtin/pack-refs.c             | 17 +++++++++++++++--
>  refs.c                          |  4 ++--
>  refs.h                          |  6 +++++-
>  refs/debug.c                    |  4 ++--
>  refs/files-backend.c            | 13 ++++++++++---
>  refs/packed-backend.c           |  3 ++-
>  refs/refs-internal.h            |  4 +++-
>  t/helper/test-ref-store.c       |  3 ++-
>  t/t3210-pack-refs.sh            | 18 ++++++++++++++++++
>  10 files changed, 66 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/git-pack-refs.txt b/Documentation/git-pack-refs.txt
> index 154081f2de2..d80e0a1562d 100644
> --- a/Documentation/git-pack-refs.txt
> +++ b/Documentation/git-pack-refs.txt
> @@ -8,7 +8,7 @@ git-pack-refs - Pack heads and tags for efficient repository access
>  SYNOPSIS
>  --------
>  [verse]
> -'git pack-refs' [--all] [--no-prune]
> +'git pack-refs' [--all] [--no-prune] [--exclude <pattern>]
>  
>  DESCRIPTION
>  -----------
> @@ -59,6 +59,12 @@ a repository with many branches of historical interests.
>  The command usually removes loose refs under `$GIT_DIR/refs`
>  hierarchy after packing them.  This option tells it not to.
>  
> +--exclude <pattern>::
> +
> +Do not pack refs matching the given `glob(7)` pattern. Repetitions of this option
> +accumulate exclusion patterns. Use `--no-exclude` to clear and reset the list of
> +patterns.

The interaction of this option with "--all" needs to be described
somewhere.  If we are to be adding "--include" for completeness,
that one also needs to interact with "--all".

> diff --git a/refs.c b/refs.c
> index d2a98e1c21f..637810347a0 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2132,9 +2132,9 @@ void base_ref_store_init(struct ref_store *refs, struct repository *repo,
>  }
>  
>  /* backend functions */
> -int refs_pack_refs(struct ref_store *refs, unsigned int flags)
> +int refs_pack_refs(struct ref_store *refs, unsigned int flags, struct pack_refs_opts *pack_opts)
>  {
> -	return refs->be->pack_refs(refs, flags);
> +	return refs->be->pack_refs(refs, flags, pack_opts);

That's a curious choice of the API.  It is not like backend methods
all share the same "flags" bitset (they share "refs" pointer to the
ref_store), so I would have expected that it would become part of
the pack_refs_opts structure.  Do not call the parameter pack_opts;
either spell it out as "pack_refs_opts", or just use "opts".  The
latter is probably more preferrable as I do not expect it to be
ambiguous with other kinds of "opts".

The rest of the pack I found nothing unexpected or surprising.
John Cai May 4, 2023, 9:26 p.m. UTC | #2
On 23/05/04 09:48AM, Junio C Hamano wrote:
> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>

Hi Junio,

> > From: John Cai <jcai@gitlab.com>
> >
> > Currently once refs are packed into a pack-refs file, deleting them can be
> > slow since it involves a full file scan of the entire pack-refs file. At
> 
> "pack-refs" -> "packed-refs".
> 
> But performing a full file scan of 100MB file would take how many
> milliseconds?  Having to remove many refs from a single packed-refs
> file would be costly if it is done one-by-one, though.  I wonder how
> effective our batched update mechanism is these days...
> 
> Sorry for straying to a tangent.  In any case, I do not think the
> first sentence is necessary; just start it with "At GitLab, ..." to
> say that you have a need to be more selective than "is it a tag or
> not?" to choose refs to be and not to be packed.  The reason why we
> may want to leave a ref loose is not limited to ref deletion
> performance, and the benefit of this new feature is not limited to
> it, either.

Good point

> 
> > GitLab, we have a system that creates ephemeral internal refs that don't
> > live long before getting deleted. Having an option to not include certain
> > refs from a pack-refs file allows these internal references to be deleted
> > much more efficiently.
> 
> I think that is a valid use case.
> 
> If we step back a bit, we can see "pack-refs" has an option "--all"
> that was added by b3d4204f (git-pack-refs --all, 2006-10-08), to
> allow us pack only tags by default, because packing branches that
> are meant to be updated often and also removed more often than tags
> were found to be detrimental.  We can view this new option a
> follow-up work for the commit, to allow the users to define what to
> be and what not to be packed, depending on their workflow.
> 
> This observation also makes us realize that we should consider the
> opposite.  It would benefit us to include some refs that we normally
> do not pack and be more selective than "--all" ("--exclude" proposed
> here is a way to leave some refs that we normally pack and be more
> selective than not running pack-refs at all).  A set of branches
> that are only occasionally used may want to be packed while the rest
> of branches want to be left loose because they are more active, or
> something.

Yeah, that's a good observation. It would be nice to add an --include option to
give the user full flexibility of which refs to include.

> 
> > Add an --exclude option to the pack-refs builtin, and use the ref
> > exclusions API to exclude certain refs from being packed into the final
> > pack-refs file
> 
> "pack-refs" appears here, too.
> 
> >  Documentation/git-pack-refs.txt |  8 +++++++-
> >  builtin/pack-refs.c             | 17 +++++++++++++++--
> >  refs.c                          |  4 ++--
> >  refs.h                          |  6 +++++-
> >  refs/debug.c                    |  4 ++--
> >  refs/files-backend.c            | 13 ++++++++++---
> >  refs/packed-backend.c           |  3 ++-
> >  refs/refs-internal.h            |  4 +++-
> >  t/helper/test-ref-store.c       |  3 ++-
> >  t/t3210-pack-refs.sh            | 18 ++++++++++++++++++
> >  10 files changed, 66 insertions(+), 14 deletions(-)
> >
> > diff --git a/Documentation/git-pack-refs.txt b/Documentation/git-pack-refs.txt
> > index 154081f2de2..d80e0a1562d 100644
> > --- a/Documentation/git-pack-refs.txt
> > +++ b/Documentation/git-pack-refs.txt
> > @@ -8,7 +8,7 @@ git-pack-refs - Pack heads and tags for efficient repository access
> >  SYNOPSIS
> >  --------
> >  [verse]
> > -'git pack-refs' [--all] [--no-prune]
> > +'git pack-refs' [--all] [--no-prune] [--exclude <pattern>]
> >  
> >  DESCRIPTION
> >  -----------
> > @@ -59,6 +59,12 @@ a repository with many branches of historical interests.
> >  The command usually removes loose refs under `$GIT_DIR/refs`
> >  hierarchy after packing them.  This option tells it not to.
> >  
> > +--exclude <pattern>::
> > +
> > +Do not pack refs matching the given `glob(7)` pattern. Repetitions of this option
> > +accumulate exclusion patterns. Use `--no-exclude` to clear and reset the list of
> > +patterns.
> 
> The interaction of this option with "--all" needs to be described
> somewhere.  If we are to be adding "--include" for completeness,
> that one also needs to interact with "--all".

Sounds good

> 
> > diff --git a/refs.c b/refs.c
> > index d2a98e1c21f..637810347a0 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -2132,9 +2132,9 @@ void base_ref_store_init(struct ref_store *refs, struct repository *repo,
> >  }
> >  
> >  /* backend functions */
> > -int refs_pack_refs(struct ref_store *refs, unsigned int flags)
> > +int refs_pack_refs(struct ref_store *refs, unsigned int flags, struct pack_refs_opts *pack_opts)
> >  {
> > -	return refs->be->pack_refs(refs, flags);
> > +	return refs->be->pack_refs(refs, flags, pack_opts);
> 
> That's a curious choice of the API.  It is not like backend methods
> all share the same "flags" bitset (they share "refs" pointer to the
> ref_store), so I would have expected that it would become part of
> the pack_refs_opts structure.  Do not call the parameter pack_opts;
> either spell it out as "pack_refs_opts", or just use "opts".  The
> latter is probably more preferrable as I do not expect it to be
> ambiguous with other kinds of "opts".

I didn't notice this, but it makes sense. We can move flags into the
pack_refs_opts struct.

> 
> The rest of the pack I found nothing unexpected or surprising.

thanks
John
diff mbox series

Patch

diff --git a/Documentation/git-pack-refs.txt b/Documentation/git-pack-refs.txt
index 154081f2de2..d80e0a1562d 100644
--- a/Documentation/git-pack-refs.txt
+++ b/Documentation/git-pack-refs.txt
@@ -8,7 +8,7 @@  git-pack-refs - Pack heads and tags for efficient repository access
 SYNOPSIS
 --------
 [verse]
-'git pack-refs' [--all] [--no-prune]
+'git pack-refs' [--all] [--no-prune] [--exclude <pattern>]
 
 DESCRIPTION
 -----------
@@ -59,6 +59,12 @@  a repository with many branches of historical interests.
 The command usually removes loose refs under `$GIT_DIR/refs`
 hierarchy after packing them.  This option tells it not to.
 
+--exclude <pattern>::
+
+Do not pack refs matching the given `glob(7)` pattern. Repetitions of this option
+accumulate exclusion patterns. Use `--no-exclude` to clear and reset the list of
+patterns.
+
 
 BUGS
 ----
diff --git a/builtin/pack-refs.c b/builtin/pack-refs.c
index 9833815fb30..bc2db41c2cb 100644
--- a/builtin/pack-refs.c
+++ b/builtin/pack-refs.c
@@ -4,22 +4,35 @@ 
 #include "parse-options.h"
 #include "refs.h"
 #include "repository.h"
+#include "revision.h"
 
 static char const * const pack_refs_usage[] = {
-	N_("git pack-refs [--all] [--no-prune]"),
+	N_("git pack-refs [--all] [--no-prune] [--exclude <pattern>]"),
 	NULL
 };
 
 int cmd_pack_refs(int argc, const char **argv, const char *prefix)
 {
 	unsigned int flags = PACK_REFS_PRUNE;
+	static struct ref_exclusions excludes = REF_EXCLUSIONS_INIT;
+	struct pack_refs_opts pack_opts = {.exclusions = &excludes};
+	static struct string_list option_excluded_refs = STRING_LIST_INIT_NODUP;
+	struct string_list_item *item;
+
 	struct option opts[] = {
 		OPT_BIT(0, "all",   &flags, N_("pack everything"), PACK_REFS_ALL),
 		OPT_BIT(0, "prune", &flags, N_("prune loose refs (default)"), PACK_REFS_PRUNE),
+		OPT_STRING_LIST(0, "exclude", &option_excluded_refs, N_("pattern"),
+			N_("references to exclude")),
 		OPT_END(),
 	};
+
 	git_config(git_default_config, NULL);
 	if (parse_options(argc, argv, prefix, opts, pack_refs_usage, 0))
 		usage_with_options(pack_refs_usage, opts);
-	return refs_pack_refs(get_main_ref_store(the_repository), flags);
+
+	for_each_string_list_item(item, &option_excluded_refs)
+		add_ref_exclusion(pack_opts.exclusions, item->string);
+
+	return refs_pack_refs(get_main_ref_store(the_repository), flags, &pack_opts);
 }
diff --git a/refs.c b/refs.c
index d2a98e1c21f..637810347a0 100644
--- a/refs.c
+++ b/refs.c
@@ -2132,9 +2132,9 @@  void base_ref_store_init(struct ref_store *refs, struct repository *repo,
 }
 
 /* backend functions */
-int refs_pack_refs(struct ref_store *refs, unsigned int flags)
+int refs_pack_refs(struct ref_store *refs, unsigned int flags, struct pack_refs_opts *pack_opts)
 {
-	return refs->be->pack_refs(refs, flags);
+	return refs->be->pack_refs(refs, flags, pack_opts);
 }
 
 int peel_iterated_oid(const struct object_id *base, struct object_id *peeled)
diff --git a/refs.h b/refs.h
index 123cfa44244..a76dbb2d3f0 100644
--- a/refs.h
+++ b/refs.h
@@ -63,6 +63,10 @@  struct worktree;
 #define RESOLVE_REF_NO_RECURSE 0x02
 #define RESOLVE_REF_ALLOW_BAD_NAME 0x04
 
+struct pack_refs_opts {
+	struct ref_exclusions *exclusions;
+};
+
 const char *refs_resolve_ref_unsafe(struct ref_store *refs,
 				    const char *refname,
 				    int resolve_flags,
@@ -405,7 +409,7 @@  void warn_dangling_symrefs(FILE *fp, const char *msg_fmt,
  * Write a packed-refs file for the current repository.
  * flags: Combination of the above PACK_REFS_* flags.
  */
-int refs_pack_refs(struct ref_store *refs, unsigned int flags);
+int refs_pack_refs(struct ref_store *refs, unsigned int flags, struct pack_refs_opts *opts);
 
 /*
  * Setup reflog before using. Fill in err and return -1 on failure.
diff --git a/refs/debug.c b/refs/debug.c
index adc34c836fc..93f6168052a 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -122,10 +122,10 @@  static int debug_initial_transaction_commit(struct ref_store *refs,
 	return res;
 }
 
-static int debug_pack_refs(struct ref_store *ref_store, unsigned int flags)
+static int debug_pack_refs(struct ref_store *ref_store, unsigned int flags, struct pack_refs_opts *opts)
 {
 	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
-	int res = drefs->refs->be->pack_refs(drefs->refs, flags);
+	int res = drefs->refs->be->pack_refs(drefs->refs, flags, opts);
 	trace_printf_key(&trace_refs, "pack_refs: %d\n", res);
 	return res;
 }
diff --git a/refs/files-backend.c b/refs/files-backend.c
index d0581ee41ac..b77c474919f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -19,6 +19,7 @@ 
 #include "../worktree.h"
 #include "../wrapper.h"
 #include "../write-or-die.h"
+#include "../revision.h"
 
 /*
  * This backend uses the following flags in `ref_update::flags` for
@@ -1173,7 +1174,8 @@  static void prune_refs(struct files_ref_store *refs, struct ref_to_prune **refs_
  */
 static int should_pack_ref(const char *refname,
 			   const struct object_id *oid, unsigned int ref_flags,
-			   unsigned int pack_flags)
+			   unsigned int pack_flags,
+			   struct pack_refs_opts *opts)
 {
 	/* Do not pack per-worktree refs: */
 	if (parse_worktree_ref(refname, NULL, NULL, NULL) !=
@@ -1192,10 +1194,15 @@  static int should_pack_ref(const char *refname,
 	if (!ref_resolves_to_object(refname, the_repository, oid, ref_flags))
 		return 0;
 
+	if (opts->exclusions && ref_excluded(opts->exclusions, refname))
+		return 0;
+
 	return 1;
 }
 
-static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
+static int files_pack_refs(struct ref_store *ref_store,
+			   unsigned int flags,
+			   struct pack_refs_opts *pack_opts)
 {
 	struct files_ref_store *refs =
 		files_downcast(ref_store, REF_STORE_WRITE | REF_STORE_ODB,
@@ -1221,7 +1228,7 @@  static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
 		 * pruned, also add it to refs_to_prune.
 		 */
 		if (!should_pack_ref(iter->refname, iter->oid, iter->flags,
-				     flags))
+				     flags, pack_opts))
 			continue;
 
 		/*
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 2333ed5a1f7..8a611dd92b4 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1576,7 +1576,8 @@  static int packed_delete_refs(struct ref_store *ref_store, const char *msg,
 }
 
 static int packed_pack_refs(struct ref_store *ref_store UNUSED,
-			    unsigned int flags UNUSED)
+			    unsigned int flags UNUSED,
+			    struct pack_refs_opts *pack_opts UNUSED)
 {
 	/*
 	 * Packed refs are already packed. It might be that loose refs
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index a85d113123c..7b4852104f0 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -547,7 +547,9 @@  typedef int ref_transaction_commit_fn(struct ref_store *refs,
 				      struct ref_transaction *transaction,
 				      struct strbuf *err);
 
-typedef int pack_refs_fn(struct ref_store *ref_store, unsigned int flags);
+typedef int pack_refs_fn(struct ref_store *ref_store,
+			 unsigned int flags,
+			 struct pack_refs_opts *opts);
 typedef int create_symref_fn(struct ref_store *ref_store,
 			     const char *ref_target,
 			     const char *refs_heads_master,
diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
index 6d8f844e9c7..0405cacc99b 100644
--- a/t/helper/test-ref-store.c
+++ b/t/helper/test-ref-store.c
@@ -116,8 +116,9 @@  static struct flag_definition pack_flags[] = { FLAG_DEF(PACK_REFS_PRUNE),
 static int cmd_pack_refs(struct ref_store *refs, const char **argv)
 {
 	unsigned int flags = arg_flags(*argv++, "flags", pack_flags);
+	struct pack_refs_opts pack_opts = { 0 };
 
-	return refs_pack_refs(refs, flags);
+	return refs_pack_refs(refs, flags, &pack_opts);
 }
 
 static int cmd_create_symref(struct ref_store *refs, const char **argv)
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index 07a0ff93def..31b9f72e84a 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -108,6 +108,24 @@  test_expect_success \
      git branch -d n/o/p &&
      git branch n'
 
+test_expect_success \
+    'test excluded refs are not packed' '
+     git branch dont_pack1 &&
+     git branch dont_pack2 &&
+     git branch pack_this &&
+     git pack-refs --all --exclude refs/heads/dont_pack* &&
+     test -f .git/refs/heads/dont_pack1 &&
+     test -f .git/refs/heads/dont_pack2 &&
+     ! test -f ./git/refs/heads/pack_this'
+
+test_expect_success \
+    'test --no-exclude refs clears excluded refs' '
+     git branch dont_pack3 &&
+     git branch dont_pack4 &&
+     git pack-refs --all --exclude refs/heads/dont_pack* --no-exclude &&
+     ! test -f .git/refs/heads/dont_pack3 &&
+     ! test -f .git/refs/heads/dont_pack4'
+
 test_expect_success \
 	'see if up-to-date packed refs are preserved' \
 	'git branch q &&