diff mbox series

[v2] maintenance: add prune-remote-refs task

Message ID pull.1838.v2.git.1735380461980.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series [v2] maintenance: add prune-remote-refs task | expand

Commit Message

Shubham Kanodia Dec. 28, 2024, 10:07 a.m. UTC
From: Shubham Kanodia <shubham.kanodia10@gmail.com>

Remote-tracking refs can accumulate in local repositories even as branches
are deleted on remotes, impacting git performance negatively. Existing
alternatives to keep refs pruned have a few issues — 

1. Running `git fetch` with either `--prune` or `fetch.prune=true` set will
prune stale refs, but requires a manual operation and also pulls in new
refs from remote which can be an undesirable side-effect.

2.`git remote prune` cleans up refs without adding to the existing list
but requires periodic user intervention.

This adds a new maintenance task 'prune-remote-refs' that runs
'git remote prune' for each configured remote daily. This provides an
automated way to clean up stale remote-tracking refs — especially when
users may not do a full fetch.

This task is disabled by default.

Signed-off-by: Shubham Kanodia <shubham.kanodia10@gmail.com>
---
    maintenance: add prune-remote-refs task
    
    As discussed previously on:
    https://lore.kernel.org/git/xmqqwmfr112w.fsf@gitster.g/T/#t
    
    Remote-tracking refs can accumulate in local repositories even as
    branches are deleted on remotes, impacting git performance negatively.
    Existing alternatives to keep refs pruned have a few issues — 
    
     1. The fetch.prune config automatically cleans up remote ref on fetch,
        but also pulls in new ref from remote which is an undesirable
        side-effect.
    
    2.git remote prune cleans up refs without adding to the existing list
    but requires periodic user intervention.
    
    This adds a new maintenance task 'prune-remote-refs' that runs 'git
    remote prune' for each configured remote daily. This provides an
    automated way to clean up stale remote-tracking refs — especially when
    users may not do a full fetch.
    
    This task is disabled by default.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1838%2Fpastelsky%2Fsk%2Fadd-remote-prune-maintenance-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1838/pastelsky/sk/add-remote-prune-maintenance-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1838

Range-diff vs v1:

 1:  72e27d3ebe6 ! 1:  4d6c143c970 maintenance: add prune-remote-refs task
     @@ Commit message
          are deleted on remotes, impacting git performance negatively. Existing
          alternatives to keep refs pruned have a few issues — 
      
     -    1. The `fetch.prune` config automatically cleans up remote ref on fetch,
     -    but also pulls in new ref from remote which is an undesirable side-effect.
     +    1. Running `git fetch` with either `--prune` or `fetch.prune=true` set will
     +    prune stale refs, but requires a manual operation and also pulls in new
     +    refs from remote which can be an undesirable side-effect.
      
          2.`git remote prune` cleans up refs without adding to the existing list
          but requires periodic user intervention.
     @@ Documentation/git-maintenance.txt: pack-refs::
      ++
      +--
      +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
     -+  where `fetch.prune` would not affect refs outside the fetched hierarchy
     ++  where `fetch.prune` would only affect refs that are explicitly fetched.
      +* When third-party tools might perform unexpected full fetches, and you want
     -+  periodic cleanup independently of fetch operations
     ++  periodic cleanup independently of fetch operations.
      +--
      +
       OPTIONS
     @@ builtin/gc.c: static int maintenance_opt_schedule(const struct option *opt, cons
       	return 0;
       }
       
     -+static int collect_remote(struct remote *remote, void *cb_data)
     ++static int prune_remote(struct remote *remote, void *cb_data UNUSED)
      +{
     -+	struct string_list *list = cb_data;
     ++	struct child_process child = CHILD_PROCESS_INIT;
      +
      +	if (!remote->url.nr)
      +		return 0;
      +
     -+	string_list_append(list, remote->name);
     -+	return 0;
     ++	child.git_cmd = 1;
     ++	strvec_pushl(&child.args, "remote", "prune", remote->name, NULL);
     ++
     ++	return !!run_command(&child);
      +}
      +
     -+static int maintenance_task_prune_remote(struct maintenance_run_opts *opts UNUSED,
     ++static int maintenance_task_prune_remote(struct maintenance_run_opts *opts,
      +					 struct gc_config *cfg UNUSED)
      +{
     -+	struct string_list_item *item;
     -+	struct string_list remotes_list = STRING_LIST_INIT_NODUP;
     -+	struct child_process child = CHILD_PROCESS_INIT;
     -+	int result = 0;
     -+
     -+	for_each_remote(collect_remote, &remotes_list);
     -+
     -+	for_each_string_list_item (item, &remotes_list) {
     -+		const char *remote_name = item->string;
     -+		child.git_cmd = 1;
     -+		strvec_pushl(&child.args, "remote", "prune", remote_name, NULL);
     -+
     -+		if (run_command(&child))
     -+			result = error(_("failed to prune '%s'"), remote_name);
     ++	if (for_each_remote(prune_remote, opts)) {
     ++		error(_("failed to prune remotes"));
     ++		return 1;
      +	}
      +
     -+	string_list_clear(&remotes_list, 0);
     -+	return result;
     ++	return 0;
      +}
      +
       /* Remember to update object flag allocation in object.h */


 Documentation/git-maintenance.txt | 20 ++++++++++++++
 builtin/gc.c                      | 32 ++++++++++++++++++++++
 t/t7900-maintenance.sh            | 44 +++++++++++++++++++++++++++++++
 3 files changed, 96 insertions(+)


base-commit: 063bcebf0c917140ca0e705cbe0fdea127e90086

Comments

Junio C Hamano Dec. 28, 2024, 4:25 p.m. UTC | #1
"Shubham Kanodia via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Shubham Kanodia <shubham.kanodia10@gmail.com>
>
> Remote-tracking refs can accumulate in local repositories even as branches
> are deleted on remotes, impacting git performance negatively. Existing
> alternatives to keep refs pruned have a few issues — 
>
> 1. Running `git fetch` with either `--prune` or `fetch.prune=true` set will
> prune stale refs, but requires a manual operation and also pulls in new
> refs from remote which can be an undesirable side-effect.

It is only true if you cloned without any tweaks.  For example, if
you cloned with the single-branch option, you would not pull in new
refs, wouldn't you?  Also "requires a manual operation" is not quite
a good rationale, as you could have placed such a "git fetch"
instead of "git remote prune", in the maintenance schedule.

For this to become an issue, the condition we saw in earlier
discussion, i.e.

    while having the *default* refspec
    "+refs/heads/*:refs/remotes/$name/*" configured in
    remote.$name.fetch

is crucial.  Since that is the default refspec "git clone" gives
you, your "git fetch --prune" would give you full set of refs while
pruning, and the end result is that you have up-to-date set of
remote-tracking branches (which you may not want).

> 2.`git remote prune` cleans up refs without adding to the existing list
> but requires periodic user intervention.

You have a SP after "1." but not after "2.".

> This adds a new maintenance task 'prune-remote-refs' that runs
> 'git remote prune' for each configured remote daily. This provides an
> automated way to clean up stale remote-tracking refs — especially when
> users may not do a full fetch.

"This adds" -> "Add".

I'd strike the latter sentence.  Regardless of what users do or do
not do, the automated clean-up is performed.

> This task is disabled by default.
>
> Signed-off-by: Shubham Kanodia <shubham.kanodia10@gmail.com>
> ---

> +NOTE: This task is opt-in to prevent unexpected removal of remote refs
> +for users of git-maintenance. For most users, configuring `fetch.prune=true`
> +is a acceptable solution, as it will automatically clean up stale remote-tracking

"a acceptable" -> "an acceptable".

> +branches during normal fetch operations. However, this task can be useful in
> +specific scenarios:
> ++
> +--
> +* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
> +  where `fetch.prune` would only affect refs that are explicitly fetched.
> +* When third-party tools might perform unexpected full fetches, and you want
> +  periodic cleanup independently of fetch operations.
> +--

Very well written.

> @@ -913,6 +914,30 @@ static int maintenance_opt_schedule(const struct option *opt, const char *arg,
>  	return 0;
>  }
>  
> +static int prune_remote(struct remote *remote, void *cb_data UNUSED)
> +{
> +	struct child_process child = CHILD_PROCESS_INIT;
> +
> +	if (!remote->url.nr)
> +		return 0;
> +
> +	child.git_cmd = 1;
> +	strvec_pushl(&child.args, "remote", "prune", remote->name, NULL);
> +
> +	return !!run_command(&child);
> +}
> +
> +static int maintenance_task_prune_remote(struct maintenance_run_opts *opts,
> +					 struct gc_config *cfg UNUSED)
> +{
> +	if (for_each_remote(prune_remote, opts)) {
> +		error(_("failed to prune remotes"));
> +		return 1;
> +	}
> +
> +	return 0;
> +}

Nice reuse of the program structure, which is very clean and easy to read.

Overall very well written.  Will queue, with attached range-diff.
Thanks.


--- >8 ---
1:  0ae9668b5c ! 1:  8a40f8b319 maintenance: add prune-remote-refs task
    @@ Commit message
     
         Remote-tracking refs can accumulate in local repositories even as branches
         are deleted on remotes, impacting git performance negatively. Existing
    -    alternatives to keep refs pruned have a few issues — 
    +    alternatives to keep refs pruned have a few issues:
     
    -    1. Running `git fetch` with either `--prune` or `fetch.prune=true` set will
    -    prune stale refs, but requires a manual operation and also pulls in new
    -    refs from remote which can be an undesirable side-effect.
    +      1. Running `git fetch` with either `--prune` or `fetch.prune=true`
    +         set, with the default refspec to copy all their branches into
    +         our remote-tracking branches, will prune stale refs, but also
    +         pulls in new branches from remote.  That is undesirable if the
    +         user wants to only work with a selected few remote branches.
     
    -    2.`git remote prune` cleans up refs without adding to the existing list
    -    but requires periodic user intervention.
    +      2. `git remote prune` cleans up refs without adding to the
    +         existing list but requires periodic user intervention.
     
    -    This adds a new maintenance task 'prune-remote-refs' that runs
    -    'git remote prune' for each configured remote daily. This provides an
    -    automated way to clean up stale remote-tracking refs — especially when
    -    users may not do a full fetch.
    -
    -    This task is disabled by default.
    +    Add a new maintenance task 'prune-remote-refs' that runs 'git remote
    +    prune' for each configured remote daily.  Leave the task disabled by
    +    default, as it may be unexpected to see their remote-tracking
    +    branches to disappear while they are not watching for unsuspecting
    +    users.
     
         Signed-off-by: Shubham Kanodia <shubham.kanodia10@gmail.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
    @@ Documentation/git-maintenance.txt: pack-refs::
     ++
     +NOTE: This task is opt-in to prevent unexpected removal of remote refs
     +for users of git-maintenance. For most users, configuring `fetch.prune=true`
    -+is a acceptable solution, as it will automatically clean up stale remote-tracking
    ++is an acceptable solution, as it will automatically clean up stale remote-tracking
     +branches during normal fetch operations. However, this task can be useful in
     +specific scenarios:
     ++
diff mbox series

Patch

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 6e6651309d3..8b3e496c8ef 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -158,6 +158,26 @@  pack-refs::
 	need to iterate across many references. See linkgit:git-pack-refs[1]
 	for more information.
 
+prune-remote-refs::
+	The `prune-remote-refs` task runs `git remote prune` on each remote
+	repository registered in the local repository. This task helps clean
+	up deleted remote branches, improving the performance of operations
+	that iterate through the refs. See linkgit:git-remote[1] for more
+	information. This task is disabled by default.
++
+NOTE: This task is opt-in to prevent unexpected removal of remote refs
+for users of git-maintenance. For most users, configuring `fetch.prune=true`
+is a acceptable solution, as it will automatically clean up stale remote-tracking
+branches during normal fetch operations. However, this task can be useful in
+specific scenarios:
++
+--
+* When using selective fetching (e.g., `git fetch origin +foo:refs/remotes/origin/foo`)
+  where `fetch.prune` would only affect refs that are explicitly fetched.
+* When third-party tools might perform unexpected full fetches, and you want
+  periodic cleanup independently of fetch operations.
+--
+
 OPTIONS
 -------
 --auto::
diff --git a/builtin/gc.c b/builtin/gc.c
index 4ae5196aedf..329c764f300 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -20,6 +20,7 @@ 
 #include "lockfile.h"
 #include "parse-options.h"
 #include "run-command.h"
+#include "remote.h"
 #include "sigchain.h"
 #include "strvec.h"
 #include "commit.h"
@@ -913,6 +914,30 @@  static int maintenance_opt_schedule(const struct option *opt, const char *arg,
 	return 0;
 }
 
+static int prune_remote(struct remote *remote, void *cb_data UNUSED)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	if (!remote->url.nr)
+		return 0;
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "remote", "prune", remote->name, NULL);
+
+	return !!run_command(&child);
+}
+
+static int maintenance_task_prune_remote(struct maintenance_run_opts *opts,
+					 struct gc_config *cfg UNUSED)
+{
+	if (for_each_remote(prune_remote, opts)) {
+		error(_("failed to prune remotes"));
+		return 1;
+	}
+
+	return 0;
+}
+
 /* Remember to update object flag allocation in object.h */
 #define SEEN		(1u<<0)
 
@@ -1375,6 +1400,7 @@  enum maintenance_task_label {
 	TASK_GC,
 	TASK_COMMIT_GRAPH,
 	TASK_PACK_REFS,
+	TASK_PRUNE_REMOTE_REFS,
 
 	/* Leave as final value */
 	TASK__COUNT
@@ -1411,6 +1437,10 @@  static struct maintenance_task tasks[] = {
 		maintenance_task_pack_refs,
 		pack_refs_condition,
 	},
+	[TASK_PRUNE_REMOTE_REFS] = {
+		"prune-remote-refs",
+		maintenance_task_prune_remote,
+	},
 };
 
 static int compare_tasks_by_selection(const void *a_, const void *b_)
@@ -1505,6 +1535,8 @@  static void initialize_maintenance_strategy(void)
 		tasks[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY;
 		tasks[TASK_PACK_REFS].enabled = 1;
 		tasks[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY;
+		tasks[TASK_PRUNE_REMOTE_REFS].enabled = 0;
+		tasks[TASK_PRUNE_REMOTE_REFS].schedule = SCHEDULE_DAILY;
 	}
 }
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 0ce4ba1cbef..60a0c3f8353 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -446,6 +446,50 @@  test_expect_success 'pack-refs task' '
 	test_subcommand git pack-refs --all --prune <pack-refs.txt
 '
 
+test_expect_success 'prune-remote-refs task not enabled by default' '
+	git clone . prune-test &&
+	(
+		cd prune-test &&
+		GIT_TRACE2_EVENT="$(pwd)/prune.txt" git maintenance run 2>err &&
+		test_subcommand ! git remote prune origin <prune.txt
+	)
+'
+
+test_expect_success 'prune-remote-refs task cleans stale remote refs' '
+	test_commit initial &&
+
+	# Create two separate remote repos
+	git clone . remote1 &&
+	git clone . remote2 &&
+
+	git clone . prune-test-clean &&
+	(
+		cd prune-test-clean &&
+		git config maintenance.prune-remote-refs.enabled true &&
+
+		# Add both remotes
+		git remote add remote1 "../remote1" &&
+		git remote add remote2 "../remote2" &&
+
+		# Create and push branches to both remotes
+		git branch -f side2 HEAD &&
+		git push remote1 side2 &&
+		git push remote2 side2 &&
+
+		# Rename branches in each remote to simulate a stale branch
+		git -C ../remote1 branch -m side2 side3 &&
+		git -C ../remote2 branch -m side2 side4 &&
+
+		GIT_TRACE2_EVENT="$(pwd)/prune.txt" git maintenance run --task=prune-remote-refs &&
+
+		# Verify pruning happened for both remotes
+		test_subcommand git remote prune remote1 <prune.txt &&
+		test_subcommand git remote prune remote2 <prune.txt &&
+		test_must_fail git rev-parse refs/remotes/remote1/side2 &&
+		test_must_fail git rev-parse refs/remotes/remote2/side2
+	)
+'
+
 test_expect_success '--auto and --schedule incompatible' '
 	test_must_fail git maintenance run --auto --schedule=daily 2>err &&
 	test_grep "at most one" err