diff mbox series

[v5,07/11] maintenance: take a lock on the objects directory

Message ID 1a0a3eebb825ac3eabfdd86f82ed7ef6abb454c5.1600366313.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit d7514f6ed57d20bcc9dcfb43016b95dba82ba790
Headers show
Series Maintenance I: Command, gc and commit-graph tasks | expand

Commit Message

Linus Arver via GitGitGadget Sept. 17, 2020, 6:11 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

Performing maintenance on a Git repository involves writing data to the
.git directory, which is not safe to do with multiple writers attempting
the same operation. Ensure that only one 'git maintenance' process is
running at a time by holding a file-based lock. Simply the presence of
the .git/maintenance.lock file will prevent future maintenance. This
lock is never committed, since it does not represent meaningful data.
Instead, it is only a placeholder.

If the lock file already exists, then no maintenance tasks are
attempted. This will become very important later when we implement the
'prefetch' task, as this is our stop-gap from creating a recursive process
loop between 'git fetch' and 'git maintenance run --auto'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/gc.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Comments

Ævar Arnfjörð Bjarmason Sept. 21, 2020, 1:36 p.m. UTC | #1
On Thu, Sep 17 2020, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Performing maintenance on a Git repository involves writing data to the
> .git directory, which is not safe to do with multiple writers attempting
> the same operation. Ensure that only one 'git maintenance' process is
> running at a time by holding a file-based lock. Simply the presence of
> the .git/maintenance.lock file will prevent future maintenance. This
> lock is never committed, since it does not represent meaningful data.
> Instead, it is only a placeholder.
>
> If the lock file already exists, then no maintenance tasks are
> attempted. This will become very important later when we implement the
> 'prefetch' task, as this is our stop-gap from creating a recursive process
> loop between 'git fetch' and 'git maintenance run --auto'.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  builtin/gc.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 00fff59bdb..7ba9c6f7c9 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -798,6 +798,25 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts)
>  {
>  	int i, found_selected = 0;
>  	int result = 0;
> +	struct lock_file lk;
> +	struct repository *r = the_repository;
> +	char *lock_path = xstrfmt("%s/maintenance", r->objects->odb->path);
> +
> +	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) {
> +		/*
> +		 * Another maintenance command is running.
> +		 *
> +		 * If --auto was provided, then it is likely due to a
> +		 * recursive process stack. Do not report an error in
> +		 * that case.
> +		 */
> +		if (!opts->auto_flag && !opts->quiet)
> +			warning(_("lock file '%s' exists, skipping maintenance"),
> +				lock_path);
> +		free(lock_path);
> +		return 0;
> +	}
> +	free(lock_path);
>  
>  	for (i = 0; !found_selected && i < TASK__COUNT; i++)
>  		found_selected = tasks[i].selected_order >= 0;

There's now two different lock strategies in builtin/gc.c, the existing
one introduced in 64a99eb476 ("gc: reject if another gc is running,
unless --force is given", 2013-08-08) where we write the hostname to the
gc.pid file, and then discard the lockfile depending on a heuristic of
whether or not it's the same etc., and this one.

With this as an entry point we'll entirely do away with the old one
since we don't use the "gc --auto" entry point.

All of that may or may not be desirable, but I think a description in
the docs & tests for how these lock files should interact would be
helpful. E.g. writing a different hostname in the gc lockfile and
setting the time on it with with "test-tool chmtime" will cause it to
plow ahead, but doing the same for "git maintenance" will stop it in its
tracks no matter the time or content.
Derrick Stolee Sept. 21, 2020, 1:43 p.m. UTC | #2
On 9/21/2020 9:36 AM, Ævar Arnfjörð Bjarmason wrote:
> There's now two different lock strategies in builtin/gc.c, the existing
> one introduced in 64a99eb476 ("gc: reject if another gc is running,
> unless --force is given", 2013-08-08) where we write the hostname to the
> gc.pid file, and then discard the lockfile depending on a heuristic of
> whether or not it's the same etc., and this one.
> 
> With this as an entry point we'll entirely do away with the old one
> since we don't use the "gc --auto" entry point.

Users could still erroneously launch two `git gc` commands concurrently
and that will not use the maintenance.lock file. The GC lock is worth
keeping around until we redirect the `gc` command to be an alias that
runs `git maintenance run --task=gc [options]`.

> All of that may or may not be desirable, but I think a description in
> the docs & tests for how these lock files should interact would be
> helpful. E.g. writing a different hostname in the gc lockfile and
> setting the time on it with with "test-tool chmtime" will cause it to
> plow ahead, but doing the same for "git maintenance" will stop it in its
> tracks no matter the time or content.

Thanks for the pointers to create this test. I'll try to get to it
soon. For now, here is a tracking issue [1].

[1] https://github.com/gitgitgadget/git/issues/737

Thanks,
-Stolee
Junio C Hamano Sept. 21, 2020, 7:29 p.m. UTC | #3
Derrick Stolee <stolee@gmail.com> writes:

>> With this as an entry point we'll entirely do away with the old one
>> since we don't use the "gc --auto" entry point.
>
> Users could still erroneously launch two `git gc` commands concurrently
> and that will not use the maintenance.lock file. The GC lock is worth
> keeping around until we redirect the `gc` command to be an alias that
> runs `git maintenance run --task=gc [options]`.

Yes, that is prudent.  And as long as the traditional gc lock stops
the maintenance command from running the gc task, everthing would
work consistently ;-)

Thanks.
diff mbox series

Patch

diff --git a/builtin/gc.c b/builtin/gc.c
index 00fff59bdb..7ba9c6f7c9 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -798,6 +798,25 @@  static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 {
 	int i, found_selected = 0;
 	int result = 0;
+	struct lock_file lk;
+	struct repository *r = the_repository;
+	char *lock_path = xstrfmt("%s/maintenance", r->objects->odb->path);
+
+	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) {
+		/*
+		 * Another maintenance command is running.
+		 *
+		 * If --auto was provided, then it is likely due to a
+		 * recursive process stack. Do not report an error in
+		 * that case.
+		 */
+		if (!opts->auto_flag && !opts->quiet)
+			warning(_("lock file '%s' exists, skipping maintenance"),
+				lock_path);
+		free(lock_path);
+		return 0;
+	}
+	free(lock_path);
 
 	for (i = 0; !found_selected && i < TASK__COUNT; i++)
 		found_selected = tasks[i].selected_order >= 0;
@@ -818,6 +837,7 @@  static int maintenance_run_tasks(struct maintenance_run_opts *opts)
 		}
 	}
 
+	rollback_lock_file(&lk);
 	return result;
 }