diff mbox series

[3/7] maintenance: add --scheduled option and config

Message ID 4473c93b118a0e0cdb205d1758aaaa2d8bf5139a.1597857412.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Maintenance III: background maintenance | expand

Commit Message

Linus Arver via GitGitGadget Aug. 19, 2020, 5:16 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

A user may want to run certain maintenance tasks based on frequency, not
conditions given in the repository. For example, the user may want to
perform a 'prefetch' task every hour, or 'gc' task every day. To assist,
update the 'git maintenance run --scheduled' command to check the config
for the last run of that task and add a number of seconds. The task
would then run only if the current time is beyond that minimum
timestamp.

Add a '--scheduled' option to 'git maintenance run' to only run tasks
that have had enough time pass since their last run. This is done for
each enabled task by checking if the current timestamp is at least as
large as the sum of 'maintenance.<task>.lastRun' and
'maintenance.<task>.schedule' in the Git config. This second value is
new to this commit, storing a number of seconds intended between runs.

A user could then set up an hourly maintenance run with the following
cron table:

  0 * * * * git -C <repo> maintenance run --scheduled

Then, the user could configure the repository with the following config
values:

  maintenance.prefetch.schedule  3000
  maintenance.gc.schedule       86000

These numbers are slightly lower than one hour and one day (in seconds).
The cron schedule will enforce the hourly run rate, but we can use these
schedules to ensure the 'gc' task runs once a day. The error is given
because the *.lastRun config option is specified at the _start_ of the
task run. Otherwise, a slow task run could shift the "daily" job of 'gc'
from a 10:00pm run to 11:00pm run, or later.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/maintenance.txt |  9 +++++
 Documentation/git-maintenance.txt    | 13 +++++++-
 builtin/gc.c                         | 50 +++++++++++++++++++++++++++-
 t/t7900-maintenance.sh               | 20 +++++++++++
 4 files changed, 90 insertions(+), 2 deletions(-)

Comments

Đoàn Trần Công Danh Aug. 20, 2020, 2:51 p.m. UTC | #1
On 2020-08-19 17:16:44+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> +static void fill_schedule_info(struct maintenance_task *task,
> +			       const char *config_name,
> +			       timestamp_t schedule_delay)
> +{
> +	timestamp_t now = approxidate("now");

I see this pattern in both previous patch and this patch,
should we create a helper (if not exist) to get current timestamp
instead, parsing "now" every now and then is not a good idea, in my
very opinionated opinion.

> +	char *value = NULL;
> +	struct strbuf last_run = STRBUF_INIT;
> +	int64_t previous_run;
> +
> +	strbuf_addf(&last_run, "maintenance.%s.lastrun", task->name);
> +
> +	if (git_config_get_string(last_run.buf, &value))
> +		task->scheduled = 1;
> +	else {
> +		previous_run = git_config_int64(last_run.buf, value);
> +		if (now >= previous_run + schedule_delay)
> +			task->scheduled = 1;
> +	}
> +
> +	free(value);
> +	strbuf_release(&last_run);
> +}
> +
>  static void initialize_task_config(void)
>  {
>  	int i;
> @@ -1359,6 +1387,7 @@ static void initialize_task_config(void)
>  
>  	for (i = 0; i < TASK__COUNT; i++) {
>  		int config_value;
> +		char *config_str;
>  
>  		strbuf_setlen(&config_name, 0);
>  		strbuf_addf(&config_name, "maintenance.%s.enabled",
> @@ -1366,6 +1395,20 @@ static void initialize_task_config(void)
>  
>  		if (!git_config_get_bool(config_name.buf, &config_value))
>  			tasks[i].enabled = config_value;
> +
> +		strbuf_setlen(&config_name, 0);

It looks like we have a simple and better named alias for this:

	strbuf_reset(&config_name)

_reset has 400+ occurences in this code base, compare to 20 of _setlen
Derrick Stolee Aug. 24, 2020, 2:03 p.m. UTC | #2
On 8/20/2020 10:51 AM, Đoàn Trần Công Danh wrote:
> On 2020-08-19 17:16:44+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
>> +static void fill_schedule_info(struct maintenance_task *task,
>> +			       const char *config_name,
>> +			       timestamp_t schedule_delay)
>> +{
>> +	timestamp_t now = approxidate("now");
> 
> I see this pattern in both previous patch and this patch,
> should we create a helper (if not exist) to get current timestamp
> instead, parsing "now" every now and then is not a good idea, in my
> very opinionated opinion.

Parsing "now" is not that much work, and it is done only once per
maintenance task. To make a helper that avoids these string comparisons
(specifically to avoid iterating through the "special" array in date.c)
is unlikely to be worth the effort and code duplication.

If you mean it would be good to use a macro here, then that would be
easy:

	#define approxidate_now() approxidate("now")

One important thing for using this over time(NULL) is that we really
want this to work with GIT_TEST_DATE_NOW.

>> +	char *value = NULL;
>> +	struct strbuf last_run = STRBUF_INIT;
>> +	int64_t previous_run;
>> +
>> +	strbuf_addf(&last_run, "maintenance.%s.lastrun", task->name);
>> +
>> +	if (git_config_get_string(last_run.buf, &value))
>> +		task->scheduled = 1;
>> +	else {
>> +		previous_run = git_config_int64(last_run.buf, value);
>> +		if (now >= previous_run + schedule_delay)
>> +			task->scheduled = 1;
>> +	}
>> +
>> +	free(value);
>> +	strbuf_release(&last_run);
>> +}
>> +
>>  static void initialize_task_config(void)
>>  {
>>  	int i;
>> @@ -1359,6 +1387,7 @@ static void initialize_task_config(void)
>>  
>>  	for (i = 0; i < TASK__COUNT; i++) {
>>  		int config_value;
>> +		char *config_str;
>>  
>>  		strbuf_setlen(&config_name, 0);
>>  		strbuf_addf(&config_name, "maintenance.%s.enabled",
>> @@ -1366,6 +1395,20 @@ static void initialize_task_config(void)
>>  
>>  		if (!git_config_get_bool(config_name.buf, &config_value))
>>  			tasks[i].enabled = config_value;
>> +
>> +		strbuf_setlen(&config_name, 0);
> 
> It looks like we have a simple and better named alias for this:
> 
> 	strbuf_reset(&config_name)
> 
> _reset has 400+ occurences in this code base, compare to 20 of _setlen

Makes sense. Thanks.

-Stolee
diff mbox series

Patch

diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 8dd34169da..caacacd322 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -15,6 +15,15 @@  maintenance.<task>.lastRun::
 	`<task>` is run. It stores a timestamp representing the most-recent
 	run of the `<task>`.
 
+maintenance.<task>.schedule::
+	This config option controls whether or not the given `<task>` runs
+	during a `git maintenance run --scheduled` command. If the option
+	is an integer value `S`, then the `<task>` is run when the current
+	time is `S` seconds after the timestamp stored in
+	`maintenance.<task>.lastRun`. If the option has no value or a
+	non-integer value, then the task will never run with the `--scheduled`
+	option.
+
 maintenance.commit-graph.auto::
 	This integer config option controls how often the `commit-graph` task
 	should be run as part of `git maintenance run --auto`. If zero, then
diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 95a333f000..e8004e7b11 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -110,7 +110,18 @@  OPTIONS
 	only if certain thresholds are met. For example, the `gc` task
 	runs when the number of loose objects exceeds the number stored
 	in the `gc.auto` config setting, or when the number of pack-files
-	exceeds the `gc.autoPackLimit` config setting.
+	exceeds the `gc.autoPackLimit` config setting. Not compatible with
+	the `--scheduled` option.
+
+--scheduled::
+	When combined with the `run` subcommand, run maintenance tasks
+	only if certain time conditions are met, as specified by the
+	`maintenance.<task>.schedule` config value for each `<task>`.
+	This config value specifies a number of seconds since the last
+	time that task ran, according to the `maintenance.<task>.lastRun`
+	config value. The tasks that are tested are those provided by
+	the `--task=<task>` option(s) or those with
+	`maintenance.<task>.enabled` set to true.
 
 --quiet::
 	Do not report progress or other information over `stderr`.
diff --git a/builtin/gc.c b/builtin/gc.c
index 707c144fb9..352948529d 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -711,6 +711,7 @@  static const char * const builtin_maintenance_usage[] = {
 
 struct maintenance_opts {
 	int auto_flag;
+	int scheduled;
 	int quiet;
 };
 
@@ -1226,7 +1227,8 @@  struct maintenance_task {
 	const char *name;
 	maintenance_task_fn *fn;
 	maintenance_auto_fn *auto_condition;
-	unsigned enabled:1;
+	unsigned enabled:1,
+		 scheduled:1;
 
 	/* -1 if not selected. */
 	int selected_order;
@@ -1337,6 +1339,9 @@  static int maintenance_run(struct maintenance_opts *opts)
 		     !tasks[i].auto_condition()))
 			continue;
 
+		if (opts->scheduled && !tasks[i].scheduled)
+			continue;
+
 		update_last_run(&tasks[i]);
 
 		trace2_region_enter("maintenance", tasks[i].name, r);
@@ -1351,6 +1356,29 @@  static int maintenance_run(struct maintenance_opts *opts)
 	return result;
 }
 
+static void fill_schedule_info(struct maintenance_task *task,
+			       const char *config_name,
+			       timestamp_t schedule_delay)
+{
+	timestamp_t now = approxidate("now");
+	char *value = NULL;
+	struct strbuf last_run = STRBUF_INIT;
+	int64_t previous_run;
+
+	strbuf_addf(&last_run, "maintenance.%s.lastrun", task->name);
+
+	if (git_config_get_string(last_run.buf, &value))
+		task->scheduled = 1;
+	else {
+		previous_run = git_config_int64(last_run.buf, value);
+		if (now >= previous_run + schedule_delay)
+			task->scheduled = 1;
+	}
+
+	free(value);
+	strbuf_release(&last_run);
+}
+
 static void initialize_task_config(void)
 {
 	int i;
@@ -1359,6 +1387,7 @@  static void initialize_task_config(void)
 
 	for (i = 0; i < TASK__COUNT; i++) {
 		int config_value;
+		char *config_str;
 
 		strbuf_setlen(&config_name, 0);
 		strbuf_addf(&config_name, "maintenance.%s.enabled",
@@ -1366,6 +1395,20 @@  static void initialize_task_config(void)
 
 		if (!git_config_get_bool(config_name.buf, &config_value))
 			tasks[i].enabled = config_value;
+
+		strbuf_setlen(&config_name, 0);
+		strbuf_addf(&config_name, "maintenance.%s.schedule",
+			    tasks[i].name);
+
+		if (!git_config_get_string(config_name.buf, &config_str)) {
+			timestamp_t schedule_delay = git_config_int64(
+							config_name.buf,
+							config_str);
+			fill_schedule_info(&tasks[i],
+						config_name.buf,
+						schedule_delay);
+			free(config_str);
+		}
 	}
 
 	strbuf_release(&config_name);
@@ -1409,6 +1452,8 @@  int cmd_maintenance(int argc, const char **argv, const char *prefix)
 	struct option builtin_maintenance_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
+		OPT_BOOL(0, "scheduled", &opts.scheduled,
+			 N_("run tasks based on time intervals")),
 		OPT_BOOL(0, "quiet", &opts.quiet,
 			 N_("do not report progress or other information over stderr")),
 		OPT_CALLBACK_F(0, "task", NULL, N_("task"),
@@ -1434,6 +1479,9 @@  int cmd_maintenance(int argc, const char **argv, const char *prefix)
 			     builtin_maintenance_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
 
+	if (opts.auto_flag + opts.scheduled > 1)
+		die(_("use at most one of the --auto and --scheduled options"));
+
 	if (argc != 1)
 		usage_with_options(builtin_maintenance_usage,
 				   builtin_maintenance_options);
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index a985ce3674..3e0c5f1ca8 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -264,6 +264,11 @@  test_expect_success 'maintenance.incremental-repack.auto' '
 	done
 '
 
+test_expect_success '--auto and --scheduled incompatible' '
+	test_must_fail git maintenance run --auto --scheduled 2>err &&
+	test_i18ngrep "at most one" err
+'
+
 test_expect_success 'tasks update maintenance.<task>.lastRun' '
 	git config --unset maintenance.commit-graph.lastrun &&
 	GIT_TRACE2_EVENT="$(pwd)/run.txt" \
@@ -274,4 +279,19 @@  test_expect_success 'tasks update maintenance.<task>.lastRun' '
 	test_cmp_config 1595000000 maintenance.commit-graph.lastrun
 '
 
+test_expect_success '--scheduled with specific time' '
+	git config maintenance.commit-graph.schedule 100 &&
+	GIT_TRACE2_EVENT="$(pwd)/too-soon.txt" \
+		GIT_TEST_DATE_NOW=1595000099 \
+		git maintenance run --scheduled 2>/dev/null &&
+	test_subcommand ! git commit-graph write --split --reachable \
+		--no-progress <too-soon.txt &&
+	GIT_TRACE2_EVENT="$(pwd)/long-enough.txt" \
+		GIT_TEST_DATE_NOW=1595000100 \
+		git maintenance run --scheduled 2>/dev/null &&
+	test_subcommand git commit-graph write --split --reachable \
+		--no-progress <long-enough.txt &&
+	test_cmp_config 1595000100 maintenance.commit-graph.lastrun
+'
+
 test_done