diff mbox series

[01/11] maintenance: create basic maintenance runner

Message ID 2b9deb6d6a23e53bec75e109f2e3ef9217420425.1596728921.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Maintenance I: Command, gc and commit-graph tasks | expand

Commit Message

Jean-Noël Avila via GitGitGadget Aug. 6, 2020, 3:48 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.

Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.

For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.

Use existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.

Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.

Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                        |  1 +
 Documentation/git-maintenance.txt | 57 +++++++++++++++++++++++++++++++
 builtin.h                         |  1 +
 builtin/gc.c                      | 57 +++++++++++++++++++++++++++++++
 git.c                             |  1 +
 t/t7900-maintenance.sh            | 19 +++++++++++
 t/test-lib-functions.sh           | 33 ++++++++++++++++++
 7 files changed, 169 insertions(+)
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh

Comments

Martin Ågren Aug. 7, 2020, 10:16 p.m. UTC | #1
On Thu, 6 Aug 2020 at 19:57, Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> +DESCRIPTION
> +-----------
> +Run tasks to optimize Git repository data, speeding up other Git commands
> +and reducing storage requirements for the repository.
> ++

This "+" and the one below render literally so you would want to drop
them. (You're not in any kind of "list" here, so no need for a "list
continuation".)

> +Git commands that add repository data, such as `git add` or `git fetch`,
> +are optimized for a responsive user experience. These commands do not take
> +time to optimize the Git data, since such optimizations scale with the full
> +size of the repository while these user commands each perform a relatively
> +small action.
> ++
> +The `git maintenance` command provides flexibility for how to optimize the
> +Git repository.

Martin
Jonathan Nieder Aug. 12, 2020, 9:03 p.m. UTC | #2
Derrick Stolee wrote:

> Helped-by: Jonathan Nieder <jrnieder@gmail.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  .gitignore                        |  1 +
>  Documentation/git-maintenance.txt | 57 +++++++++++++++++++++++++++++++
>  builtin.h                         |  1 +
>  builtin/gc.c                      | 57 +++++++++++++++++++++++++++++++
>  git.c                             |  1 +
>  t/t7900-maintenance.sh            | 19 +++++++++++
>  t/test-lib-functions.sh           | 33 ++++++++++++++++++
>  7 files changed, 169 insertions(+)
>  create mode 100644 Documentation/git-maintenance.txt
>  create mode 100755 t/t7900-maintenance.sh

Looks reasonable.

[...]
> --- /dev/null
> +++ b/Documentation/git-maintenance.txt
> @@ -0,0 +1,57 @@
> +git-maintenance(1)
> +==================
> +
> +NAME
> +----
> +git-maintenance - Run tasks to optimize Git repository data
> +
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'git maintenance' run [<options>]
> +
> +
> +DESCRIPTION
> +-----------
> +Run tasks to optimize Git repository data, speeding up other Git commands
> +and reducing storage requirements for the repository.
> ++
> +Git commands that add repository data, such as `git add` or `git fetch`,
> +are optimized for a responsive user experience. These commands do not take
> +time to optimize the Git data, since such optimizations scale with the full
> +size of the repository while these user commands each perform a relatively
> +small action.
> ++
> +The `git maintenance` command provides flexibility for how to optimize the
> +Git repository.
> +
> +SUBCOMMANDS
> +-----------
> +
> +run::
> +	Run one or more maintenance tasks.

This is still confusing --- shouldn't it say something like "Run the
`gc` maintenance task (see below)"?

[...]
> +TASKS
> +-----
> +
> +gc::
> +	Cleanup unnecessary files and optimize the local repository. "GC"

nit: cleanup is the noun, "clean up" is the verb

> +	stands for "garbage collection," but this task performs many
> +	smaller tasks. This task can be rather expensive for large

nit: the word "rather" is not doing much work here, so we could leave
it out

> +	repositories, as it repacks all Git objects into a single pack-file.
> +	It can also be disruptive in some situations, as it deletes stale
> +	data.

What stale data is this referring to?  Where can I read more about
what disruption it causes (or in other words, as a user, how would I
decide when *not* to run this command)?

[...]
> +OPTIONS
> +-------
> +--auto::
> +	When combined with the `run` subcommand, run maintenance tasks
> +	only if certain thresholds are met. For example, the `gc` task
> +	runs when the number of loose objects exceeds the number stored
> +	in the `gc.auto` config setting, or when the number of pack-files
> +	exceeds the `gc.autoPackLimit` config setting.

Hm, today I learned.  I had thought that --auto doesn't only affect
thresholds but also would affect how aggressive the gc is when
triggered, but the git-gc(1) manpage agrees with what's said above.

Is there a way we can share information between the two to avoid one
falling out of date?  For example, should git-gc.txt point to this
page for the authoritative description?

[...]
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -699,3 +699,60 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>  
>  	return 0;
>  }
> +
> +static const char * const builtin_maintenance_usage[] = {
> +	N_("git maintenance run [<options>]"),

not about this patch: I wish we could automatically generate a usage
string in this simple kind of case, to decrease the burden on
translators.

[...]
> +static int maintenance_task_gc(struct maintenance_opts *opts)
> +{
> +	struct child_process child = CHILD_PROCESS_INIT;
> +
> +	child.git_cmd = 1;
> +	strvec_push(&child.args, "gc");
> +
> +	if (opts->auto_flag)
> +		strvec_push(&child.args, "--auto");
> +
> +	close_object_store(the_repository->objects);

Interesting --- what does this function call do?

> +	return run_command(&child);
> +}
> +
> +static int maintenance_run(struct maintenance_opts *opts)
> +{
> +	return maintenance_task_gc(opts);
> +}
> +
> +int cmd_maintenance(int argc, const char **argv, const char *prefix)
> +{
> +	static struct maintenance_opts opts;
> +	static struct option builtin_maintenance_options[] = {
> +		OPT_BOOL(0, "auto", &opts.auto_flag,
> +			 N_("run tasks based on the state of the repository")),
> +		OPT_END()
> +	};

optional: these could be local instead of static

> +
> +	memset(&opts, 0, sizeof(opts));

not needed if it's static.  If it's not static, could use `opts = {0}`
where it's declared to do the same thing in a simpler way.

> +
> +	if (argc == 2 && !strcmp(argv[1], "-h"))
> +		usage_with_options(builtin_maintenance_usage,
> +				   builtin_maintenance_options);
> +
> +	argc = parse_options(argc, argv, prefix,
> +			     builtin_maintenance_options,
> +			     builtin_maintenance_usage,
> +			     PARSE_OPT_KEEP_UNKNOWN);
> +
> +	if (argc == 1) {
> +		if (!strcmp(argv[0], "run"))
> +			return maintenance_run(&opts);
> +	}
> +
> +	usage_with_options(builtin_maintenance_usage,
> +			   builtin_maintenance_options);

optional: could reverse this test to get the uninteresting case out of
the way first:

	if (argc != 1)
		usage ...

	...

That would also allow making the usage string when argv[0] is not
"run" more specific.

[...]
> --- /dev/null
> +++ b/t/t7900-maintenance.sh
> @@ -0,0 +1,19 @@
> +#!/bin/sh
> +
> +test_description='git maintenance builtin'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'help text' '
> +	test_expect_code 129 git maintenance -h 2>err &&
> +	test_i18ngrep "usage: git maintenance run" err
> +'
> +
> +test_expect_success 'run [--auto]' '
> +	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
> +	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
> +	test_subcommand git gc <run-no-auto.txt &&
> +	test_subcommand git gc --auto <run-auto.txt

Nice.

[...]
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -1561,3 +1561,36 @@ test_path_is_hidden () {
>  	case "$("$SYSTEMROOT"/system32/attrib "$1")" in *H*?:*) return 0;; esac
>  	return 1
>  }
> +
> +# Check that the given command was invoked as part of the
> +# trace2-format trace on stdin.
> +#
> +#	test_subcommand [!] <command> <args>... < <trace>
> +#
> +# For example, to look for an invocation of "git upload-pack
> +# /path/to/repo"
> +#
> +#	GIT_TRACE2_EVENT=event.log git fetch ... &&
> +#	test_subcommand git upload-pack "$PATH" <event.log
> +#
> +# If the first parameter passed is !, this instead checks that
> +# the given command was not called.
> +#
> +test_subcommand () {
> +	local negate=
> +	if test "$1" = "!"
> +	then
> +		negate=t
> +		shift
> +	fi
> +
> +	local expr=$(printf '"%s",' "$@")
> +	expr="${expr%,}"
> +
> +	if test -n "$negate"
> +	then
> +		! grep "\[$expr\]"
> +	else
> +		grep "\[$expr\]"
> +	fi
> +}

With whatever subset of the changes described above makes sense,

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks for your patient work.
Junio C Hamano Aug. 12, 2020, 10:07 p.m. UTC | #3
Jonathan Nieder <jrnieder@gmail.com> writes:

>> +static int maintenance_run(struct maintenance_opts *opts)
>> +{
>> +	return maintenance_task_gc(opts);
>> +}
>> +
>> +int cmd_maintenance(int argc, const char **argv, const char *prefix)
>> +{
>> +	static struct maintenance_opts opts;
>> +	static struct option builtin_maintenance_options[] = {
>> +		OPT_BOOL(0, "auto", &opts.auto_flag,
>> +			 N_("run tasks based on the state of the repository")),
>> +		OPT_END()
>> +	};
>
> optional: these could be local instead of static

Do we have preference?  I think it is more common in our codebase to
have these non-static but I do not think we've chosen which is more
preferable with technical reasoning behind it.  As the top-level
function in any callchain, cmd_foo() does not have multiple instances
running at the same time, or it does not have to be prepared to be
called twice, so they can afford to be static, but is there any upside
for them to be static?

>> +
>> +	memset(&opts, 0, sizeof(opts));
>
> not needed if it's static.  If it's not static, could use `opts = {0}`
> where it's declared to do the same thing in a simpler way.

Okay, so that's one upside, albeit a small one.
Jonathan Nieder Aug. 12, 2020, 10:50 p.m. UTC | #4
Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>>> +static int maintenance_run(struct maintenance_opts *opts)
>>> +{
>>> +	return maintenance_task_gc(opts);
>>> +}
>>> +
>>> +int cmd_maintenance(int argc, const char **argv, const char *prefix)
>>> +{
>>> +	static struct maintenance_opts opts;
>>> +	static struct option builtin_maintenance_options[] = {
>>> +		OPT_BOOL(0, "auto", &opts.auto_flag,
>>> +			 N_("run tasks based on the state of the repository")),
>>> +		OPT_END()
>>> +	};
>>
>> optional: these could be local instead of static
>
> Do we have preference?  I think it is more common in our codebase to
> have these non-static but I do not think we've chosen which is more
> preferable with technical reasoning behind it.  As the top-level
> function in any callchain, cmd_foo() does not have multiple instances
> running at the same time, or it does not have to be prepared to be
> called twice, so they can afford to be static, but is there any upside
> for them to be static?

Code size, execution time:

- benefit of static: can rely on automatic zero-initialization of
  pages instead of spending cycles and text size on explicit
  zero-initialization

- benefit of local: avoids wasting address space in bss when the
  function isn't called

Neither of those seems important enough to care about. :)

Maintainability:

- benefit of static: address is determined at compile time, so can build
  using C89 compilers that require struct initializers to be constant
  (but Git already cannot be built with such compilers)

- benefit of local: less likely to accidentally move the static var into
  a function that needs to be reentrant

- benefit of local: allows readers and reviewers to build the habit of
  seeing non-const "static" declarations within a function as the
  unusual case.  When a "static" is declared in a function body, this
  means we are caching something or have some other performance reason
  to sacrifice reentrancy

It's mostly that last aspect (saving readers from having to think
about it) that motivates my suggestion above.

Older Git code used static more because it was needed when building
using C89 compilers.

Thanks,
Jonathan
Derrick Stolee Aug. 14, 2020, 1:05 a.m. UTC | #5
On 8/12/2020 5:03 PM, Jonathan Nieder wrote:
> Derrick Stolee wrote:
> 
>> Helped-by: Jonathan Nieder <jrnieder@gmail.com>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  .gitignore                        |  1 +
>>  Documentation/git-maintenance.txt | 57 +++++++++++++++++++++++++++++++
>>  builtin.h                         |  1 +
>>  builtin/gc.c                      | 57 +++++++++++++++++++++++++++++++
>>  git.c                             |  1 +
>>  t/t7900-maintenance.sh            | 19 +++++++++++
>>  t/test-lib-functions.sh           | 33 ++++++++++++++++++
>>  7 files changed, 169 insertions(+)
>>  create mode 100644 Documentation/git-maintenance.txt
>>  create mode 100755 t/t7900-maintenance.sh
> 
> Looks reasonable.
> 
> [...]
>> --- /dev/null
>> +++ b/Documentation/git-maintenance.txt
>> @@ -0,0 +1,57 @@
>> +git-maintenance(1)
>> +==================
>> +
>> +NAME
>> +----
>> +git-maintenance - Run tasks to optimize Git repository data
>> +
>> +
>> +SYNOPSIS
>> +--------
>> +[verse]
>> +'git maintenance' run [<options>]
>> +
>> +
>> +DESCRIPTION
>> +-----------
>> +Run tasks to optimize Git repository data, speeding up other Git commands
>> +and reducing storage requirements for the repository.
>> ++
>> +Git commands that add repository data, such as `git add` or `git fetch`,
>> +are optimized for a responsive user experience. These commands do not take
>> +time to optimize the Git data, since such optimizations scale with the full
>> +size of the repository while these user commands each perform a relatively
>> +small action.
>> ++
>> +The `git maintenance` command provides flexibility for how to optimize the
>> +Git repository.
>> +
>> +SUBCOMMANDS
>> +-----------
>> +
>> +run::
>> +	Run one or more maintenance tasks.
> 
> This is still confusing --- shouldn't it say something like "Run the
> `gc` maintenance task (see below)"?

As I mentioned in the earlier version, I disagree with this.

> [...]
>> +TASKS
>> +-----
>> +
>> +gc::
>> +	Cleanup unnecessary files and optimize the local repository. "GC"
> 
> nit: cleanup is the noun, "clean up" is the verb
> 
>> +	stands for "garbage collection," but this task performs many
>> +	smaller tasks. This task can be rather expensive for large
> 
> nit: the word "rather" is not doing much work here, so we could leave
> it out

Both good.

> 
>> +	repositories, as it repacks all Git objects into a single pack-file.
>> +	It can also be disruptive in some situations, as it deletes stale
>> +	data.
> 
> What stale data is this referring to?  Where can I read more about
> what disruption it causes (or in other words, as a user, how would I
> decide when *not* to run this command)?

For this and the next comment, I prefer (for now) to keep all of the
deep details of the 'gc' task in the git-gc.txt documentation. However,
I should use "linkgit:git-gc[1]" to point users to that for reference.

Eventually, the maintenance builtin might replace the gc builtin, but
only after all of its behavior has been extracted into smaller tasks.
Those tasks would be documented in detail here, allowing us to make the
blurb for the 'gc' task be "Run the 'prune-reflog', 'pack-refs', ...,
and 'full-repack' tasks."

> [...]
>> +OPTIONS
>> +-------
>> +--auto::
>> +	When combined with the `run` subcommand, run maintenance tasks
>> +	only if certain thresholds are met. For example, the `gc` task
>> +	runs when the number of loose objects exceeds the number stored
>> +	in the `gc.auto` config setting, or when the number of pack-files
>> +	exceeds the `gc.autoPackLimit` config setting.
> 
> Hm, today I learned.  I had thought that --auto doesn't only affect
> thresholds but also would affect how aggressive the gc is when
> triggered, but the git-gc(1) manpage agrees with what's said above.
> 
> Is there a way we can share information between the two to avoid one
> falling out of date?  For example, should git-gc.txt point to this
> page for the authoritative description?

I remember Junio mentioning something about how 'gc' interprets '--auto'
as a "quick" mode, but like you I cannot find any reference to that in
the docs. Since it is not documented there and none of the other tasks
I am implementing here use that interpretation, I'll plan to leave this
portion as-is.
 
> [...]
>> --- a/builtin/gc.c
>> +++ b/builtin/gc.c
>> @@ -699,3 +699,60 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
>>  
>>  	return 0;
>>  }
>> +
>> +static const char * const builtin_maintenance_usage[] = {
>> +	N_("git maintenance run [<options>]"),
> 
> not about this patch: I wish we could automatically generate a usage
> string in this simple kind of case, to decrease the burden on
> translators.
> 
> [...]
>> +static int maintenance_task_gc(struct maintenance_opts *opts)
>> +{
>> +	struct child_process child = CHILD_PROCESS_INIT;
>> +
>> +	child.git_cmd = 1;
>> +	strvec_push(&child.args, "gc");
>> +
>> +	if (opts->auto_flag)
>> +		strvec_push(&child.args, "--auto");
>> +
>> +	close_object_store(the_repository->objects);
> 
> Interesting --- what does this function call do?

We need to close our file handles on the pack-files (and
commit-graph and multi-pack-index) or else the GC subcommand
cannot delete the files on Windows.

>> +	return run_command(&child);
>> +}
>> +
>> +static int maintenance_run(struct maintenance_opts *opts)
>> +{
>> +	return maintenance_task_gc(opts);
>> +}
>> +
>> +int cmd_maintenance(int argc, const char **argv, const char *prefix)
>> +{
>> +	static struct maintenance_opts opts;
>> +	static struct option builtin_maintenance_options[] = {
>> +		OPT_BOOL(0, "auto", &opts.auto_flag,
>> +			 N_("run tasks based on the state of the repository")),
>> +		OPT_END()
>> +	};
> 
> optional: these could be local instead of static

I see my problem here. The builtin_maintenance_options is static
(copied from another builtin, I'm sure) which is why the 'opts'
struct needs to be static or else this doesn't compile.

>> +
>> +	memset(&opts, 0, sizeof(opts));
> 
> not needed if it's static.  If it's not static, could use `opts = {0}`
> where it's declared to do the same thing in a simpler way.
> 
>> +
>> +	if (argc == 2 && !strcmp(argv[1], "-h"))
>> +		usage_with_options(builtin_maintenance_usage,
>> +				   builtin_maintenance_options);
>> +
>> +	argc = parse_options(argc, argv, prefix,
>> +			     builtin_maintenance_options,
>> +			     builtin_maintenance_usage,
>> +			     PARSE_OPT_KEEP_UNKNOWN);
>> +
>> +	if (argc == 1) {
>> +		if (!strcmp(argv[0], "run"))
>> +			return maintenance_run(&opts);
>> +	}
>> +
>> +	usage_with_options(builtin_maintenance_usage,
>> +			   builtin_maintenance_options);
> 
> optional: could reverse this test to get the uninteresting case out of
> the way first:
> 
> 	if (argc != 1)
> 		usage ...
> 
> 	...
> 
> That would also allow making the usage string when argv[0] is not
> "run" more specific.

Sure. Were you thinking of anything more specific than

	die(_("invalid subcommand: %s"), argv[0]);

? Of course, running "git maintenance barf" would result in

	fatal: invalid subcommand: barf

with an exit code of 128 instead of 129 like the usage string does,
so I'm not sure of the best way to make a custom "usage" error.

> [...]
>> --- /dev/null
>> +++ b/t/t7900-maintenance.sh
>> @@ -0,0 +1,19 @@
>> +#!/bin/sh
>> +
>> +test_description='git maintenance builtin'
>> +
>> +. ./test-lib.sh
>> +
>> +test_expect_success 'help text' '
>> +	test_expect_code 129 git maintenance -h 2>err &&
>> +	test_i18ngrep "usage: git maintenance run" err
>> +'
>> +
>> +test_expect_success 'run [--auto]' '
>> +	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
>> +	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
>> +	test_subcommand git gc <run-no-auto.txt &&
>> +	test_subcommand git gc --auto <run-auto.txt
> 
> Nice.
> 
> [...]
>> --- a/t/test-lib-functions.sh
>> +++ b/t/test-lib-functions.sh
>> @@ -1561,3 +1561,36 @@ test_path_is_hidden () {
>>  	case "$("$SYSTEMROOT"/system32/attrib "$1")" in *H*?:*) return 0;; esac
>>  	return 1
>>  }
>> +
>> +# Check that the given command was invoked as part of the
>> +# trace2-format trace on stdin.
>> +#
>> +#	test_subcommand [!] <command> <args>... < <trace>
>> +#
>> +# For example, to look for an invocation of "git upload-pack
>> +# /path/to/repo"
>> +#
>> +#	GIT_TRACE2_EVENT=event.log git fetch ... &&
>> +#	test_subcommand git upload-pack "$PATH" <event.log
>> +#
>> +# If the first parameter passed is !, this instead checks that
>> +# the given command was not called.
>> +#
>> +test_subcommand () {
>> +	local negate=
>> +	if test "$1" = "!"
>> +	then
>> +		negate=t
>> +		shift
>> +	fi
>> +
>> +	local expr=$(printf '"%s",' "$@")
>> +	expr="${expr%,}"
>> +
>> +	if test -n "$negate"
>> +	then
>> +		! grep "\[$expr\]"
>> +	else
>> +		grep "\[$expr\]"
>> +	fi
>> +}
> 
> With whatever subset of the changes described above makes sense,
> 
> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
> 
> Thanks for your patient work.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/.gitignore b/.gitignore
index ee509a2ad2..a5808fa30d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -90,6 +90,7 @@ 
 /git-ls-tree
 /git-mailinfo
 /git-mailsplit
+/git-maintenance
 /git-merge
 /git-merge-base
 /git-merge-index
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
new file mode 100644
index 0000000000..34cd2b4417
--- /dev/null
+++ b/Documentation/git-maintenance.txt
@@ -0,0 +1,57 @@ 
+git-maintenance(1)
+==================
+
+NAME
+----
+git-maintenance - Run tasks to optimize Git repository data
+
+
+SYNOPSIS
+--------
+[verse]
+'git maintenance' run [<options>]
+
+
+DESCRIPTION
+-----------
+Run tasks to optimize Git repository data, speeding up other Git commands
+and reducing storage requirements for the repository.
++
+Git commands that add repository data, such as `git add` or `git fetch`,
+are optimized for a responsive user experience. These commands do not take
+time to optimize the Git data, since such optimizations scale with the full
+size of the repository while these user commands each perform a relatively
+small action.
++
+The `git maintenance` command provides flexibility for how to optimize the
+Git repository.
+
+SUBCOMMANDS
+-----------
+
+run::
+	Run one or more maintenance tasks.
+
+TASKS
+-----
+
+gc::
+	Cleanup unnecessary files and optimize the local repository. "GC"
+	stands for "garbage collection," but this task performs many
+	smaller tasks. This task can be rather expensive for large
+	repositories, as it repacks all Git objects into a single pack-file.
+	It can also be disruptive in some situations, as it deletes stale
+	data.
+
+OPTIONS
+-------
+--auto::
+	When combined with the `run` subcommand, run maintenance tasks
+	only if certain thresholds are met. For example, the `gc` task
+	runs when the number of loose objects exceeds the number stored
+	in the `gc.auto` config setting, or when the number of pack-files
+	exceeds the `gc.autoPackLimit` config setting.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/builtin.h b/builtin.h
index a5ae15bfe5..17c1c0ce49 100644
--- a/builtin.h
+++ b/builtin.h
@@ -167,6 +167,7 @@  int cmd_ls_tree(int argc, const char **argv, const char *prefix);
 int cmd_ls_remote(int argc, const char **argv, const char *prefix);
 int cmd_mailinfo(int argc, const char **argv, const char *prefix);
 int cmd_mailsplit(int argc, const char **argv, const char *prefix);
+int cmd_maintenance(int argc, const char **argv, const char *prefix);
 int cmd_merge(int argc, const char **argv, const char *prefix);
 int cmd_merge_base(int argc, const char **argv, const char *prefix);
 int cmd_merge_index(int argc, const char **argv, const char *prefix);
diff --git a/builtin/gc.c b/builtin/gc.c
index aafa0946f5..e4f0ce1c86 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -699,3 +699,60 @@  int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	return 0;
 }
+
+static const char * const builtin_maintenance_usage[] = {
+	N_("git maintenance run [<options>]"),
+	NULL
+};
+
+struct maintenance_opts {
+	int auto_flag;
+};
+
+static int maintenance_task_gc(struct maintenance_opts *opts)
+{
+	struct child_process child = CHILD_PROCESS_INIT;
+
+	child.git_cmd = 1;
+	strvec_push(&child.args, "gc");
+
+	if (opts->auto_flag)
+		strvec_push(&child.args, "--auto");
+
+	close_object_store(the_repository->objects);
+	return run_command(&child);
+}
+
+static int maintenance_run(struct maintenance_opts *opts)
+{
+	return maintenance_task_gc(opts);
+}
+
+int cmd_maintenance(int argc, const char **argv, const char *prefix)
+{
+	static struct maintenance_opts opts;
+	static struct option builtin_maintenance_options[] = {
+		OPT_BOOL(0, "auto", &opts.auto_flag,
+			 N_("run tasks based on the state of the repository")),
+		OPT_END()
+	};
+
+	memset(&opts, 0, sizeof(opts));
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(builtin_maintenance_usage,
+				   builtin_maintenance_options);
+
+	argc = parse_options(argc, argv, prefix,
+			     builtin_maintenance_options,
+			     builtin_maintenance_usage,
+			     PARSE_OPT_KEEP_UNKNOWN);
+
+	if (argc == 1) {
+		if (!strcmp(argv[0], "run"))
+			return maintenance_run(&opts);
+	}
+
+	usage_with_options(builtin_maintenance_usage,
+			   builtin_maintenance_options);
+}
diff --git a/git.c b/git.c
index 8bd1d7551d..24f250d29a 100644
--- a/git.c
+++ b/git.c
@@ -529,6 +529,7 @@  static struct cmd_struct commands[] = {
 	{ "ls-tree", cmd_ls_tree, RUN_SETUP },
 	{ "mailinfo", cmd_mailinfo, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "mailsplit", cmd_mailsplit, NO_PARSEOPT },
+	{ "maintenance", cmd_maintenance, RUN_SETUP_GENTLY | NO_PARSEOPT },
 	{ "merge", cmd_merge, RUN_SETUP | NEED_WORK_TREE },
 	{ "merge-base", cmd_merge_base, RUN_SETUP },
 	{ "merge-file", cmd_merge_file, RUN_SETUP_GENTLY },
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
new file mode 100755
index 0000000000..c4b9b4a6fe
--- /dev/null
+++ b/t/t7900-maintenance.sh
@@ -0,0 +1,19 @@ 
+#!/bin/sh
+
+test_description='git maintenance builtin'
+
+. ./test-lib.sh
+
+test_expect_success 'help text' '
+	test_expect_code 129 git maintenance -h 2>err &&
+	test_i18ngrep "usage: git maintenance run" err
+'
+
+test_expect_success 'run [--auto]' '
+	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run &&
+	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto &&
+	test_subcommand git gc <run-no-auto.txt &&
+	test_subcommand git gc --auto <run-auto.txt
+'
+
+test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 3103be8a32..0adf2b85f8 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1561,3 +1561,36 @@  test_path_is_hidden () {
 	case "$("$SYSTEMROOT"/system32/attrib "$1")" in *H*?:*) return 0;; esac
 	return 1
 }
+
+# Check that the given command was invoked as part of the
+# trace2-format trace on stdin.
+#
+#	test_subcommand [!] <command> <args>... < <trace>
+#
+# For example, to look for an invocation of "git upload-pack
+# /path/to/repo"
+#
+#	GIT_TRACE2_EVENT=event.log git fetch ... &&
+#	test_subcommand git upload-pack "$PATH" <event.log
+#
+# If the first parameter passed is !, this instead checks that
+# the given command was not called.
+#
+test_subcommand () {
+	local negate=
+	if test "$1" = "!"
+	then
+		negate=t
+		shift
+	fi
+
+	local expr=$(printf '"%s",' "$@")
+	expr="${expr%,}"
+
+	if test -n "$negate"
+	then
+		! grep "\[$expr\]"
+	else
+		grep "\[$expr\]"
+	fi
+}