mbox series

[RFC,0/1] maintenance: separate parallelism safe and unsafe tasks

Message ID 20241108173112.1240584-1-calvinwan@google.com (mailing list archive)
Headers show
Series maintenance: separate parallelism safe and unsafe tasks | expand

Message

Calvin Wan Nov. 8, 2024, 5:31 p.m. UTC
Unless a user changes the config options `gc.autoDetach` or
`maintenance.autoDetach`, Git by default runs maintenance and gc tasks
in the background. This is because maintenance and gc tasks, especially
in large repositories, can take a long time to run. Therefore, they
_should_ be as nonintrusive as possible to other Git commands,
especially porcelain ones.

However, this is not the case as discovered earlier[1] -- certain
maintenance and gc tasks are not safe to run in parallel with other
commands. The consequences of such are that scripts with commands that
trigger maintenance/gc can race and crash. Users can also run into
unexpected errors from porcelain commands that touch common files such
as HEAD.lock, unaware that a background maintenance/gc task is the one
holding the lock.

As Patrick points out[2], the two unsafe commands are `git reflog expire
--all`, invoked by gc, and `git pack-refs --all --prune`, invoked by
maintenance. We can create two buckets for subtasks -- one for async
safe tasks and one for async unsafe tasks. When `[maintenance,
gc].autoDetach` is not set or set to true, maintenance will run the
unsafe tasks first before detaching to run the safe tasks.

This series is in RFC to see if the general direction of the patch is
going in the right direction. I left a couple of WIPs in the first patch
documenting what still needs to be done if the direction is palatable.

[1] https://lore.kernel.org/git/CAFySSZBCKUiY5DO3fz340a0dTb0zUDNKxaTYU0LAqsBD2RMwSg@mail.gmail.com/
[2] https://lore.kernel.org/git/ZxeilMDwq0Z3krhz@pks.im

Calvin Wan (1):
  maintenance: separate parallelism safe and unsafe tasks

 builtin/gc.c           | 173 ++++++++++++++++++++++++++++++++++++-----
 t/t7900-maintenance.sh |  24 +++---
 2 files changed, 168 insertions(+), 29 deletions(-)

Comments

Junio C Hamano Nov. 11, 2024, 4:50 a.m. UTC | #1
Calvin Wan <calvinwan@google.com> writes:

> However, this is not the case as discovered earlier[1] -- certain
> maintenance and gc tasks are not safe to run in parallel with other
> commands. The consequences of such are that scripts with commands that
> trigger maintenance/gc can race and crash.
>
> Users can also run into
> unexpected errors from porcelain commands that touch common files such
> as HEAD.lock, unaware that a background maintenance/gc task is the one
> holding the lock.

The symptom looks more like a controlled refusal of execution than a
crash.

> As Patrick points out[2], the two unsafe commands are `git reflog expire
> --all`, invoked by gc, and `git pack-refs --all --prune`, invoked by
> maintenance. We can create two buckets for subtasks -- one for async
> safe tasks and one for async unsafe tasks.

I am not sure if they can be partitioned into black and white, but
let's see.

> This series is in RFC to see if the general direction of the patch is
> going in the right direction. I left a couple of WIPs in the first patch
> documenting what still needs to be done if the direction is palatable.
Calvin Wan Nov. 11, 2024, 6:39 p.m. UTC | #2
On Sun, Nov 10, 2024 at 8:50 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Calvin Wan <calvinwan@google.com> writes:
>
> > However, this is not the case as discovered earlier[1] -- certain
> > maintenance and gc tasks are not safe to run in parallel with other
> > commands. The consequences of such are that scripts with commands that
> > trigger maintenance/gc can race and crash.
> >
> > Users can also run into
> > unexpected errors from porcelain commands that touch common files such
> > as HEAD.lock, unaware that a background maintenance/gc task is the one
> > holding the lock.
>
> The symptom looks more like a controlled refusal of execution than a
> crash.

Correct, I also do think some responsibility should be on the scripts
to be able to both handle the "controlled refusal of execution" as
well as checking to see if those lock files exist first to avoid such
errors.