diff mbox series

[v3,7/7] run-command: fix detaching when running auto maintenance

Message ID 9befef7c1f7520d58af2b2db17174b8dbc493d56.1723804990.git.ps@pks.im (mailing list archive)
State Accepted
Commit 98077d06b28b97d508c389886ee5014056707a5e
Headers show
Series builtin/maintenance: fix auto-detach with non-standard tasks | expand

Commit Message

Patrick Steinhardt Aug. 16, 2024, 10:45 a.m. UTC
In the past, we used to execute `git gc --auto` as part of our automatic
housekeeping routines. As git-gc(1) may require quite some time to
perform the housekeeping, it knows to detach itself and run in the
background so that the user can continue their work.

Eventually, we refactored our automatic housekeeping to instead use the
more flexible git-maintenance(1) command. The upside of this new infra
is that the user can configure which maintenance tasks are performed, at
least to a certain degree. So while it continues to run git-gc(1) by
default, it can also be adapted to e.g. use git-multi-pack-index(1) for
maintenance of the object database.

The auto-detach of the new infra is somewhat broken though once the user
configures non-standard tasks. The problem is essentially that we detach
at the wrong level in the process hierarchy: git-maintenance(1) never
detaches itself, but instead it continues to be git-gc(1) which does.

When configured to only run the git-gc(1) maintenance task, then the
result is basically the same as before. But when configured to run other
tasks, then git-maintenance(1) will wait for these to run to completion.
Even worse, it may be that git-gc(1) runs concurrently with other
housekeeping tasks, stomping on each others feet.

Fix this bug by asking git-gc(1) to not detach when it is being invoked
via git-maintenance(1). Instead, git-maintenance(1) now respects a new
config "maintenance.autoDetach", the equivalent of "gc.autoDetach", and
detaches itself into the background when running as part of our auto
maintenance. This should continue to behave the same for all users which
use the git-gc(1) task, only. For others though, it means that we now
properly perform all tasks in the background. The default behaviour of
git-maintenance(1) when executed by the user does not change, it will
remain in the foreground unless they pass the `--detach` option.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/gc.txt          |  3 +-
 Documentation/config/maintenance.txt | 11 +++++++
 builtin/gc.c                         |  1 +
 run-command.c                        | 12 +++++++-
 t/t5616-partial-clone.sh             |  6 ++--
 t/t7900-maintenance.sh               | 43 ++++++++++++++++++++++------
 6 files changed, 62 insertions(+), 14 deletions(-)

Comments

Jeff King Aug. 17, 2024, 12:14 p.m. UTC | #1
On Fri, Aug 16, 2024 at 12:45:17PM +0200, Patrick Steinhardt wrote:

> Fix this bug by asking git-gc(1) to not detach when it is being invoked
> via git-maintenance(1). Instead, git-maintenance(1) now respects a new
> config "maintenance.autoDetach", the equivalent of "gc.autoDetach", and
> detaches itself into the background when running as part of our auto
> maintenance. This should continue to behave the same for all users which
> use the git-gc(1) task, only. For others though, it means that we now
> properly perform all tasks in the background. The default behaviour of
> git-maintenance(1) when executed by the user does not change, it will
> remain in the foreground unless they pass the `--detach` option.

This patch seems to cause segfaults in t5616 when combined with the
reftable backend. Try this:

  GIT_TEST_DEFAULT_REF_FORMAT=reftable ./t5616-partial-clone.sh --run=1-16 --stress

which fails for me within a few runs. Bisecting leads to 98077d06b2
(run-command: fix detaching when running auto maintenance, 2024-08-16).
It doesn't trigger with the files ref backend.

Compiling with ASan gets me a stack trace like this:

  + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
  AddressSanitizer:DEADLYSIGNAL
  =================================================================
  ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
  ==657994==The signal is caused by a READ memory access.
      #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
      #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
      #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
      #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
      #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
      #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
      #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
      #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
      #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
      #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
      #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
      #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
      #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
      #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
      #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
      #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
      #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
      #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
      #18 0x55f23dba7764 in run_builtin git.c:484
      #19 0x55f23dba7764 in handle_builtin git.c:741
      #20 0x55f23dbab61e in run_argv git.c:805
      #21 0x55f23dbab61e in cmd_main git.c:1000
      #22 0x55f23dba4781 in main common-main.c:64
      #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
      #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
      #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)

My guess based on what I'm seeing and what the patch does is that now
maintenance from a previous command is running in the background while
our foreground git-fetch runs, and it somehow confuses things (perhaps
by trying to compact reftables or something?). So I think there are two
problems:

  1. The reftable code needs to be more robust against whatever race is
     happening. I didn't dig further, but I'm sure it would be possible
     to produce a coredump.

  2. Having racy background maintenance doesn't seem great for test
     robustness. At the very least, it might subject us to the "rm"
     problems mentioned elsewhere, where we fail to clean up. Annotating
     individual "git gc" or "git maintenance" calls with an extra
     descriptor isn't too bad, but in this case it's all happening under
     the hood via fetch. Is it a potential problem for every script,
     then? If so, should we disable background detaching for all test
     repos, and then let the few that want to test it turn it back on?

-Peff
Patrick Steinhardt Aug. 19, 2024, 6:17 a.m. UTC | #2
On Sat, Aug 17, 2024 at 08:14:24AM -0400, Jeff King wrote:
> On Fri, Aug 16, 2024 at 12:45:17PM +0200, Patrick Steinhardt wrote:
> 
> > Fix this bug by asking git-gc(1) to not detach when it is being invoked
> > via git-maintenance(1). Instead, git-maintenance(1) now respects a new
> > config "maintenance.autoDetach", the equivalent of "gc.autoDetach", and
> > detaches itself into the background when running as part of our auto
> > maintenance. This should continue to behave the same for all users which
> > use the git-gc(1) task, only. For others though, it means that we now
> > properly perform all tasks in the background. The default behaviour of
> > git-maintenance(1) when executed by the user does not change, it will
> > remain in the foreground unless they pass the `--detach` option.
> 
> This patch seems to cause segfaults in t5616 when combined with the
> reftable backend. Try this:
> 
>   GIT_TEST_DEFAULT_REF_FORMAT=reftable ./t5616-partial-clone.sh --run=1-16 --stress
> 
> which fails for me within a few runs. Bisecting leads to 98077d06b2
> (run-command: fix detaching when running auto maintenance, 2024-08-16).
> It doesn't trigger with the files ref backend.
> 
> Compiling with ASan gets me a stack trace like this:
> 
>   + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
>   AddressSanitizer:DEADLYSIGNAL
>   =================================================================
>   ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
>   ==657994==The signal is caused by a READ memory access.
>       #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
>       #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
>       #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
>       #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
>       #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
>       #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
>       #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
>       #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
>       #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
>       #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
>       #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
>       #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
>       #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
>       #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
>       #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
>       #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
>       #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
>       #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
>       #18 0x55f23dba7764 in run_builtin git.c:484
>       #19 0x55f23dba7764 in handle_builtin git.c:741
>       #20 0x55f23dbab61e in run_argv git.c:805
>       #21 0x55f23dbab61e in cmd_main git.c:1000
>       #22 0x55f23dba4781 in main common-main.c:64
>       #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>       #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
>       #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)
> 
> My guess based on what I'm seeing and what the patch does is that now
> maintenance from a previous command is running in the background while
> our foreground git-fetch runs, and it somehow confuses things (perhaps
> by trying to compact reftables or something?). So I think there are two
> problems:
> 
>   1. The reftable code needs to be more robust against whatever race is
>      happening. I didn't dig further, but I'm sure it would be possible
>      to produce a coredump.

Yes, it certainly has to be robust against this. It's also where the
recent reftable compaction fixes [1] came from. In theory, it should
work alright with a concurrent process compacting the stack at the same
time where another process is reading. In practice the backend is still
in its infancy, so I'm not entirely surprised that there are concurrency
bugs.

I will investigate this issue, thanks a lot for the backtrace.

[1]: https://lore.kernel.org/git/cover.1722435214.git.ps@pks.im/

>   2. Having racy background maintenance doesn't seem great for test
>      robustness. At the very least, it might subject us to the "rm"
>      problems mentioned elsewhere, where we fail to clean up. Annotating
>      individual "git gc" or "git maintenance" calls with an extra
>      descriptor isn't too bad, but in this case it's all happening under
>      the hood via fetch. Is it a potential problem for every script,
>      then? If so, should we disable background detaching for all test
>      repos, and then let the few that want to test it turn it back on?

Might be a good idea to set `maintenance.autoDetach=false` globally,
yes. The only downside is of course that it wouldn't cause us to detect
failures like the above, where the concurrency itself causes failure.

Anyway, for now I'll:

  - Send a patch to fix the race in t7900.

  - Investigate the reftable concurrency issue.

  - _Not_ send a patch that sets `maintenance.autoDetach=false`.

The last one requires a bit more discussion first, and we have been
running with `gc.autoDetach=true` implicitly in the past. Thinking a bit
more about it, the reason why the above bug triggers now is that
git-gc(1) itself runs git-pack-refs(1), but does that _synchronously_
before detaching itself. Now we detach at a higher level in the
hierarchy, which means that the previously-synchronous part now runs
asynchronously, as well.

I cannot think of a reason why we shouldn't do this, as the ref backends
should handle this gracefully. The fact that the reftable backend
doesn't is a separate, preexisting bug.

Patrick
Jeff King Aug. 19, 2024, 8:46 a.m. UTC | #3
On Mon, Aug 19, 2024 at 08:17:22AM +0200, Patrick Steinhardt wrote:

> >   2. Having racy background maintenance doesn't seem great for test
> >      robustness. At the very least, it might subject us to the "rm"
> >      problems mentioned elsewhere, where we fail to clean up. Annotating
> >      individual "git gc" or "git maintenance" calls with an extra
> >      descriptor isn't too bad, but in this case it's all happening under
> >      the hood via fetch. Is it a potential problem for every script,
> >      then? If so, should we disable background detaching for all test
> >      repos, and then let the few that want to test it turn it back on?
> 
> Might be a good idea to set `maintenance.autoDetach=false` globally,
> yes. The only downside is of course that it wouldn't cause us to detect
> failures like the above, where the concurrency itself causes failure.
> 
> Anyway, for now I'll:
> 
>   - Send a patch to fix the race in t7900.
> 
>   - Investigate the reftable concurrency issue.
> 
>   - _Not_ send a patch that sets `maintenance.autoDetach=false`.

That sounds like a good direction. I do suspect there are at least _two_
races in t7900:

  1. the detached maintenance that we run explicitly, which causes the
     "rm" cleanup to fail

  2. whatever earlier test is kicking off detached maintenance via "git
     fetch", which is causing the reftable concurrency issue.

Fixing (1) should be easy (and it looks like you've already sent a
series). Fixing the reftable code will stop us from segfaulting for (2),
but I wonder if that detached maintenance might cause similar "rm" style
problems elsewhere.

> The last one requires a bit more discussion first, and we have been
> running with `gc.autoDetach=true` implicitly in the past. Thinking a bit
> more about it, the reason why the above bug triggers now is that
> git-gc(1) itself runs git-pack-refs(1), but does that _synchronously_
> before detaching itself. Now we detach at a higher level in the
> hierarchy, which means that the previously-synchronous part now runs
> asynchronously, as well.

That makes sense. I guess we've perhaps been doing background gc for a
long time, then, just not in the refs? In practice most repos in the
test suite aren't big enough to trigger auto-gc anyway, so it may only
affect a handful of scripts.

Once the reftable issue is fixed, it's possible that the lingering
detached processes don't cause a problem in practice (because they're
not really writing much, and/or have finished by the time anybody else
gets to cleanup), and we can just live with them. But I'm worried that
sounds like wishful thinking. ;)

-Peff
Patrick Steinhardt Aug. 19, 2024, 9:04 a.m. UTC | #4
On Mon, Aug 19, 2024 at 04:46:14AM -0400, Jeff King wrote:
> On Mon, Aug 19, 2024 at 08:17:22AM +0200, Patrick Steinhardt wrote:
> 
> > >   2. Having racy background maintenance doesn't seem great for test
> > >      robustness. At the very least, it might subject us to the "rm"
> > >      problems mentioned elsewhere, where we fail to clean up. Annotating
> > >      individual "git gc" or "git maintenance" calls with an extra
> > >      descriptor isn't too bad, but in this case it's all happening under
> > >      the hood via fetch. Is it a potential problem for every script,
> > >      then? If so, should we disable background detaching for all test
> > >      repos, and then let the few that want to test it turn it back on?
> > 
> > Might be a good idea to set `maintenance.autoDetach=false` globally,
> > yes. The only downside is of course that it wouldn't cause us to detect
> > failures like the above, where the concurrency itself causes failure.
> > 
> > Anyway, for now I'll:
> > 
> >   - Send a patch to fix the race in t7900.
> > 
> >   - Investigate the reftable concurrency issue.
> > 
> >   - _Not_ send a patch that sets `maintenance.autoDetach=false`.
> 
> That sounds like a good direction. I do suspect there are at least _two_
> races in t7900:
> 
>   1. the detached maintenance that we run explicitly, which causes the
>      "rm" cleanup to fail
> 
>   2. whatever earlier test is kicking off detached maintenance via "git
>      fetch", which is causing the reftable concurrency issue.
> 
> Fixing (1) should be easy (and it looks like you've already sent a
> series). Fixing the reftable code will stop us from segfaulting for (2),
> but I wonder if that detached maintenance might cause similar "rm" style
> problems elsewhere.

It certainly might. The only reason why I don't want to send that patch
now is that it feels a bit too reactionary. We haven't had issues in the
past with it to the best of my knowledge, even though it is an issue in
theory.

We can certainly revisit that though if we see that it indeed is a more
widespread issue.

> > The last one requires a bit more discussion first, and we have been
> > running with `gc.autoDetach=true` implicitly in the past. Thinking a bit
> > more about it, the reason why the above bug triggers now is that
> > git-gc(1) itself runs git-pack-refs(1), but does that _synchronously_
> > before detaching itself. Now we detach at a higher level in the
> > hierarchy, which means that the previously-synchronous part now runs
> > asynchronously, as well.
> 
> That makes sense. I guess we've perhaps been doing background gc for a
> long time, then, just not in the refs? In practice most repos in the
> test suite aren't big enough to trigger auto-gc anyway, so it may only
> affect a handful of scripts.

Yup. We should expect this to work just fine, because it can trigger
regardless of whether or not we run auto-compaction concurrently or not.
After all, the reftable backend even performs auto-compaction after
every write to the table, so any two concurrent writes may hit the bug
without even invoking git-maintenance(1) at all.

> Once the reftable issue is fixed, it's possible that the lingering
> detached processes don't cause a problem in practice (because they're
> not really writing much, and/or have finished by the time anybody else
> gets to cleanup), and we can just live with them. But I'm worried that
> sounds like wishful thinking. ;)

Could certainly be, yeah. I just want to focus on fixing the immediate
issues before we jump to conclusions too fast.

Patrick
Patrick Steinhardt Aug. 19, 2024, 10:49 a.m. UTC | #5
On Sat, Aug 17, 2024 at 08:14:24AM -0400, Jeff King wrote:
> On Fri, Aug 16, 2024 at 12:45:17PM +0200, Patrick Steinhardt wrote:
> 
> > Fix this bug by asking git-gc(1) to not detach when it is being invoked
> > via git-maintenance(1). Instead, git-maintenance(1) now respects a new
> > config "maintenance.autoDetach", the equivalent of "gc.autoDetach", and
> > detaches itself into the background when running as part of our auto
> > maintenance. This should continue to behave the same for all users which
> > use the git-gc(1) task, only. For others though, it means that we now
> > properly perform all tasks in the background. The default behaviour of
> > git-maintenance(1) when executed by the user does not change, it will
> > remain in the foreground unless they pass the `--detach` option.
> 
> This patch seems to cause segfaults in t5616 when combined with the
> reftable backend. Try this:
> 
>   GIT_TEST_DEFAULT_REF_FORMAT=reftable ./t5616-partial-clone.sh --run=1-16 --stress
> 
> which fails for me within a few runs. Bisecting leads to 98077d06b2
> (run-command: fix detaching when running auto maintenance, 2024-08-16).
> It doesn't trigger with the files ref backend.
> 
> Compiling with ASan gets me a stack trace like this:
> 
>   + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
>   AddressSanitizer:DEADLYSIGNAL
>   =================================================================
>   ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
>   ==657994==The signal is caused by a READ memory access.
>       #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
>       #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
>       #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
>       #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
>       #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
>       #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
>       #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
>       #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
>       #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
>       #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
>       #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
>       #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
>       #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
>       #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
>       #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
>       #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
>       #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
>       #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
>       #18 0x55f23dba7764 in run_builtin git.c:484
>       #19 0x55f23dba7764 in handle_builtin git.c:741
>       #20 0x55f23dbab61e in run_argv git.c:805
>       #21 0x55f23dbab61e in cmd_main git.c:1000
>       #22 0x55f23dba4781 in main common-main.c:64
>       #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>       #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
>       #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)

I haven't yet been able to definitely tell, but I think this is a
lifetime issue. We create an iterator, eventually notice that the
reftable stack has been rewritten, and reload the stack. But the
block sources used for the old tables are still referenced by the
iterator, even though it was closed. As such, the mmapped memory of the
table has been unmapped and is now invalid, which causes the above
invalid reads.

I'll work on a patch series that introduces refcounting for block
sources, but guess that'll take a bit.

Patrick
Patrick Steinhardt Aug. 19, 2024, 3:41 p.m. UTC | #6
On Mon, Aug 19, 2024 at 12:49:52PM +0200, Patrick Steinhardt wrote:
> On Sat, Aug 17, 2024 at 08:14:24AM -0400, Jeff King wrote:
> > On Fri, Aug 16, 2024 at 12:45:17PM +0200, Patrick Steinhardt wrote:
> > 
> > > Fix this bug by asking git-gc(1) to not detach when it is being invoked
> > > via git-maintenance(1). Instead, git-maintenance(1) now respects a new
> > > config "maintenance.autoDetach", the equivalent of "gc.autoDetach", and
> > > detaches itself into the background when running as part of our auto
> > > maintenance. This should continue to behave the same for all users which
> > > use the git-gc(1) task, only. For others though, it means that we now
> > > properly perform all tasks in the background. The default behaviour of
> > > git-maintenance(1) when executed by the user does not change, it will
> > > remain in the foreground unless they pass the `--detach` option.
> > 
> > This patch seems to cause segfaults in t5616 when combined with the
> > reftable backend. Try this:
> > 
> >   GIT_TEST_DEFAULT_REF_FORMAT=reftable ./t5616-partial-clone.sh --run=1-16 --stress
> > 
> > which fails for me within a few runs. Bisecting leads to 98077d06b2
> > (run-command: fix detaching when running auto maintenance, 2024-08-16).
> > It doesn't trigger with the files ref backend.
> > 
> > Compiling with ASan gets me a stack trace like this:
> > 
> >   + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin
> >   AddressSanitizer:DEADLYSIGNAL
> >   =================================================================
> >   ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0)
> >   ==657994==The signal is caused by a READ memory access.
> >       #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29
> >       #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170
> >       #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194
> >       #3 0x55f23e54e72e in block_iter_next reftable/block.c:398
> >       #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240
> >       #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355
> >       #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339
> >       #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69
> >       #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123
> >       #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172
> >       #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175
> >       #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464
> >       #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13
> >       #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452
> >       #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623
> >       #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659
> >       #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133
> >       #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432
> >       #18 0x55f23dba7764 in run_builtin git.c:484
> >       #19 0x55f23dba7764 in handle_builtin git.c:741
> >       #20 0x55f23dbab61e in run_argv git.c:805
> >       #21 0x55f23dbab61e in cmd_main git.c:1000
> >       #22 0x55f23dba4781 in main common-main.c:64
> >       #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
> >       #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360
> >       #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27)
> 
> I haven't yet been able to definitely tell, but I think this is a
> lifetime issue. We create an iterator, eventually notice that the
> reftable stack has been rewritten, and reload the stack. But the
> block sources used for the old tables are still referenced by the
> iterator, even though it was closed. As such, the mmapped memory of the
> table has been unmapped and is now invalid, which causes the above
> invalid reads.
> 
> I'll work on a patch series that introduces refcounting for block
> sources, but guess that'll take a bit.

This is being handled via
https://lore.kernel.org/git/cover.1724080006.git.ps@pks.im/.

Patrick
diff mbox series

Patch

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index 664a3c2874..1d4f9470ea 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -40,7 +40,8 @@  use, it'll affect how the auto pack limit works.
 
 gc.autoDetach::
 	Make `git gc --auto` return immediately and run in the background
-	if the system supports it. Default is true.
+	if the system supports it. Default is true. This config variable acts
+	as a fallback in case `maintenance.autoDetach` is not set.
 
 gc.bigPackThreshold::
 	If non-zero, all non-cruft packs larger than this limit are kept
diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt
index 69a4f05153..72a9d6cf81 100644
--- a/Documentation/config/maintenance.txt
+++ b/Documentation/config/maintenance.txt
@@ -3,6 +3,17 @@  maintenance.auto::
 	`git maintenance run --auto` after doing their normal work. Defaults
 	to true.
 
+maintenance.autoDetach::
+	Many Git commands trigger automatic maintenance after they have
+	written data into the repository. This boolean config option
+	controls whether this automatic maintenance shall happen in the
+	foreground or whether the maintenance process shall detach and
+	continue to run in the background.
++
+If unset, the value of `gc.autoDetach` is used as a fallback. Defaults
+to true if both are unset, meaning that the maintenance process will
+detach.
+
 maintenance.strategy::
 	This string config option provides a way to specify one of a few
 	recommended schedules for background maintenance. This only affects
diff --git a/builtin/gc.c b/builtin/gc.c
index 63106e2028..bafee330a2 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1063,6 +1063,7 @@  static int maintenance_task_gc(struct maintenance_run_opts *opts,
 		strvec_push(&child.args, "--quiet");
 	else
 		strvec_push(&child.args, "--no-quiet");
+	strvec_push(&child.args, "--no-detach");
 
 	return run_command(&child);
 }
diff --git a/run-command.c b/run-command.c
index 45ba544932..94f2f3079f 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1808,16 +1808,26 @@  void run_processes_parallel(const struct run_process_parallel_opts *opts)
 
 int prepare_auto_maintenance(int quiet, struct child_process *maint)
 {
-	int enabled;
+	int enabled, auto_detach;
 
 	if (!git_config_get_bool("maintenance.auto", &enabled) &&
 	    !enabled)
 		return 0;
 
+	/*
+	 * When `maintenance.autoDetach` isn't set, then we fall back to
+	 * honoring `gc.autoDetach`. This is somewhat weird, but required to
+	 * retain behaviour from when we used to run git-gc(1) here.
+	 */
+	if (git_config_get_bool("maintenance.autodetach", &auto_detach) &&
+	    git_config_get_bool("gc.autodetach", &auto_detach))
+		auto_detach = 1;
+
 	maint->git_cmd = 1;
 	maint->close_object_store = 1;
 	strvec_pushl(&maint->args, "maintenance", "run", "--auto", NULL);
 	strvec_push(&maint->args, quiet ? "--quiet" : "--no-quiet");
+	strvec_push(&maint->args, auto_detach ? "--detach" : "--no-detach");
 
 	return 1;
 }
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 2da7291e37..8415884754 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -229,7 +229,7 @@  test_expect_success 'fetch --refetch triggers repacking' '
 
 	GIT_TRACE2_EVENT="$PWD/trace1.event" \
 	git -C pc1 fetch --refetch origin &&
-	test_subcommand git maintenance run --auto --no-quiet <trace1.event &&
+	test_subcommand git maintenance run --auto --no-quiet --detach <trace1.event &&
 	grep \"param\":\"gc.autopacklimit\",\"value\":\"1\" trace1.event &&
 	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"-1\" trace1.event &&
 
@@ -238,7 +238,7 @@  test_expect_success 'fetch --refetch triggers repacking' '
 		-c gc.autoPackLimit=0 \
 		-c maintenance.incremental-repack.auto=1234 \
 		-C pc1 fetch --refetch origin &&
-	test_subcommand git maintenance run --auto --no-quiet <trace2.event &&
+	test_subcommand git maintenance run --auto --no-quiet --detach <trace2.event &&
 	grep \"param\":\"gc.autopacklimit\",\"value\":\"0\" trace2.event &&
 	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"-1\" trace2.event &&
 
@@ -247,7 +247,7 @@  test_expect_success 'fetch --refetch triggers repacking' '
 		-c gc.autoPackLimit=1234 \
 		-c maintenance.incremental-repack.auto=0 \
 		-C pc1 fetch --refetch origin &&
-	test_subcommand git maintenance run --auto --no-quiet <trace3.event &&
+	test_subcommand git maintenance run --auto --no-quiet --detach <trace3.event &&
 	grep \"param\":\"gc.autopacklimit\",\"value\":\"1\" trace3.event &&
 	grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"0\" trace3.event
 '
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 771525aa4b..06ab43cfb5 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -49,22 +49,47 @@  test_expect_success 'run [--auto|--quiet]' '
 		git maintenance run --auto 2>/dev/null &&
 	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
 		git maintenance run --no-quiet 2>/dev/null &&
-	test_subcommand git gc --quiet <run-no-auto.txt &&
-	test_subcommand ! git gc --auto --quiet <run-auto.txt &&
-	test_subcommand git gc --no-quiet <run-no-quiet.txt
+	test_subcommand git gc --quiet --no-detach <run-no-auto.txt &&
+	test_subcommand ! git gc --auto --quiet --no-detach <run-auto.txt &&
+	test_subcommand git gc --no-quiet --no-detach <run-no-quiet.txt
 '
 
 test_expect_success 'maintenance.auto config option' '
 	GIT_TRACE2_EVENT="$(pwd)/default" git commit --quiet --allow-empty -m 1 &&
-	test_subcommand git maintenance run --auto --quiet <default &&
+	test_subcommand git maintenance run --auto --quiet --detach <default &&
 	GIT_TRACE2_EVENT="$(pwd)/true" \
 		git -c maintenance.auto=true \
 		commit --quiet --allow-empty -m 2 &&
-	test_subcommand git maintenance run --auto --quiet  <true &&
+	test_subcommand git maintenance run --auto --quiet --detach <true &&
 	GIT_TRACE2_EVENT="$(pwd)/false" \
 		git -c maintenance.auto=false \
 		commit --quiet --allow-empty -m 3 &&
-	test_subcommand ! git maintenance run --auto --quiet  <false
+	test_subcommand ! git maintenance run --auto --quiet --detach <false
+'
+
+for cfg in maintenance.autoDetach gc.autoDetach
+do
+	test_expect_success "$cfg=true config option" '
+		test_when_finished "rm -f trace" &&
+		test_config $cfg true &&
+		GIT_TRACE2_EVENT="$(pwd)/trace" git commit --quiet --allow-empty -m 1 &&
+		test_subcommand git maintenance run --auto --quiet --detach <trace
+	'
+
+	test_expect_success "$cfg=false config option" '
+		test_when_finished "rm -f trace" &&
+		test_config $cfg false &&
+		GIT_TRACE2_EVENT="$(pwd)/trace" git commit --quiet --allow-empty -m 1 &&
+		test_subcommand git maintenance run --auto --quiet --no-detach <trace
+	'
+done
+
+test_expect_success "maintenance.autoDetach overrides gc.autoDetach" '
+	test_when_finished "rm -f trace" &&
+	test_config maintenance.autoDetach false &&
+	test_config gc.autoDetach true &&
+	GIT_TRACE2_EVENT="$(pwd)/trace" git commit --quiet --allow-empty -m 1 &&
+	test_subcommand git maintenance run --auto --quiet --no-detach <trace
 '
 
 test_expect_success 'register uses XDG_CONFIG_HOME config if it exists' '
@@ -129,9 +154,9 @@  test_expect_success 'run --task=<task>' '
 		git maintenance run --task=commit-graph 2>/dev/null &&
 	GIT_TRACE2_EVENT="$(pwd)/run-both.txt" \
 		git maintenance run --task=commit-graph --task=gc 2>/dev/null &&
-	test_subcommand ! git gc --quiet <run-commit-graph.txt &&
-	test_subcommand git gc --quiet <run-gc.txt &&
-	test_subcommand git gc --quiet <run-both.txt &&
+	test_subcommand ! git gc --quiet --no-detach <run-commit-graph.txt &&
+	test_subcommand git gc --quiet --no-detach <run-gc.txt &&
+	test_subcommand git gc --quiet --no-detach <run-both.txt &&
 	test_subcommand git commit-graph write --split --reachable --no-progress <run-commit-graph.txt &&
 	test_subcommand ! git commit-graph write --split --reachable --no-progress <run-gc.txt &&
 	test_subcommand git commit-graph write --split --reachable --no-progress <run-both.txt