Message ID | 9befef7c1f7520d58af2b2db17174b8dbc493d56.1723804990.git.ps@pks.im (mailing list archive) |
---|---|
State | Accepted |
Commit | 98077d06b28b97d508c389886ee5014056707a5e |
Headers | show |
Series | builtin/maintenance: fix auto-detach with non-standard tasks | expand |
On Fri, Aug 16, 2024 at 12:45:17PM +0200, Patrick Steinhardt wrote: > Fix this bug by asking git-gc(1) to not detach when it is being invoked > via git-maintenance(1). Instead, git-maintenance(1) now respects a new > config "maintenance.autoDetach", the equivalent of "gc.autoDetach", and > detaches itself into the background when running as part of our auto > maintenance. This should continue to behave the same for all users which > use the git-gc(1) task, only. For others though, it means that we now > properly perform all tasks in the background. The default behaviour of > git-maintenance(1) when executed by the user does not change, it will > remain in the foreground unless they pass the `--detach` option. This patch seems to cause segfaults in t5616 when combined with the reftable backend. Try this: GIT_TEST_DEFAULT_REF_FORMAT=reftable ./t5616-partial-clone.sh --run=1-16 --stress which fails for me within a few runs. Bisecting leads to 98077d06b2 (run-command: fix detaching when running auto maintenance, 2024-08-16). It doesn't trigger with the files ref backend. Compiling with ASan gets me a stack trace like this: + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin AddressSanitizer:DEADLYSIGNAL ================================================================= ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0) ==657994==The signal is caused by a READ memory access. #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29 #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170 #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194 #3 0x55f23e54e72e in block_iter_next reftable/block.c:398 #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240 #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355 #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339 #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69 #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123 #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172 #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175 #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464 #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13 #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452 #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623 #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659 #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133 #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432 #18 0x55f23dba7764 in run_builtin git.c:484 #19 0x55f23dba7764 in handle_builtin git.c:741 #20 0x55f23dbab61e in run_argv git.c:805 #21 0x55f23dbab61e in cmd_main git.c:1000 #22 0x55f23dba4781 in main common-main.c:64 #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360 #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27) My guess based on what I'm seeing and what the patch does is that now maintenance from a previous command is running in the background while our foreground git-fetch runs, and it somehow confuses things (perhaps by trying to compact reftables or something?). So I think there are two problems: 1. The reftable code needs to be more robust against whatever race is happening. I didn't dig further, but I'm sure it would be possible to produce a coredump. 2. Having racy background maintenance doesn't seem great for test robustness. At the very least, it might subject us to the "rm" problems mentioned elsewhere, where we fail to clean up. Annotating individual "git gc" or "git maintenance" calls with an extra descriptor isn't too bad, but in this case it's all happening under the hood via fetch. Is it a potential problem for every script, then? If so, should we disable background detaching for all test repos, and then let the few that want to test it turn it back on? -Peff
On Sat, Aug 17, 2024 at 08:14:24AM -0400, Jeff King wrote: > On Fri, Aug 16, 2024 at 12:45:17PM +0200, Patrick Steinhardt wrote: > > > Fix this bug by asking git-gc(1) to not detach when it is being invoked > > via git-maintenance(1). Instead, git-maintenance(1) now respects a new > > config "maintenance.autoDetach", the equivalent of "gc.autoDetach", and > > detaches itself into the background when running as part of our auto > > maintenance. This should continue to behave the same for all users which > > use the git-gc(1) task, only. For others though, it means that we now > > properly perform all tasks in the background. The default behaviour of > > git-maintenance(1) when executed by the user does not change, it will > > remain in the foreground unless they pass the `--detach` option. > > This patch seems to cause segfaults in t5616 when combined with the > reftable backend. Try this: > > GIT_TEST_DEFAULT_REF_FORMAT=reftable ./t5616-partial-clone.sh --run=1-16 --stress > > which fails for me within a few runs. Bisecting leads to 98077d06b2 > (run-command: fix detaching when running auto maintenance, 2024-08-16). > It doesn't trigger with the files ref backend. > > Compiling with ASan gets me a stack trace like this: > > + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin > AddressSanitizer:DEADLYSIGNAL > ================================================================= > ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0) > ==657994==The signal is caused by a READ memory access. > #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29 > #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170 > #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194 > #3 0x55f23e54e72e in block_iter_next reftable/block.c:398 > #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240 > #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355 > #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339 > #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69 > #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123 > #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172 > #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175 > #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464 > #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13 > #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452 > #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623 > #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659 > #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133 > #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432 > #18 0x55f23dba7764 in run_builtin git.c:484 > #19 0x55f23dba7764 in handle_builtin git.c:741 > #20 0x55f23dbab61e in run_argv git.c:805 > #21 0x55f23dbab61e in cmd_main git.c:1000 > #22 0x55f23dba4781 in main common-main.c:64 > #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 > #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360 > #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27) > > My guess based on what I'm seeing and what the patch does is that now > maintenance from a previous command is running in the background while > our foreground git-fetch runs, and it somehow confuses things (perhaps > by trying to compact reftables or something?). So I think there are two > problems: > > 1. The reftable code needs to be more robust against whatever race is > happening. I didn't dig further, but I'm sure it would be possible > to produce a coredump. Yes, it certainly has to be robust against this. It's also where the recent reftable compaction fixes [1] came from. In theory, it should work alright with a concurrent process compacting the stack at the same time where another process is reading. In practice the backend is still in its infancy, so I'm not entirely surprised that there are concurrency bugs. I will investigate this issue, thanks a lot for the backtrace. [1]: https://lore.kernel.org/git/cover.1722435214.git.ps@pks.im/ > 2. Having racy background maintenance doesn't seem great for test > robustness. At the very least, it might subject us to the "rm" > problems mentioned elsewhere, where we fail to clean up. Annotating > individual "git gc" or "git maintenance" calls with an extra > descriptor isn't too bad, but in this case it's all happening under > the hood via fetch. Is it a potential problem for every script, > then? If so, should we disable background detaching for all test > repos, and then let the few that want to test it turn it back on? Might be a good idea to set `maintenance.autoDetach=false` globally, yes. The only downside is of course that it wouldn't cause us to detect failures like the above, where the concurrency itself causes failure. Anyway, for now I'll: - Send a patch to fix the race in t7900. - Investigate the reftable concurrency issue. - _Not_ send a patch that sets `maintenance.autoDetach=false`. The last one requires a bit more discussion first, and we have been running with `gc.autoDetach=true` implicitly in the past. Thinking a bit more about it, the reason why the above bug triggers now is that git-gc(1) itself runs git-pack-refs(1), but does that _synchronously_ before detaching itself. Now we detach at a higher level in the hierarchy, which means that the previously-synchronous part now runs asynchronously, as well. I cannot think of a reason why we shouldn't do this, as the ref backends should handle this gracefully. The fact that the reftable backend doesn't is a separate, preexisting bug. Patrick
On Mon, Aug 19, 2024 at 08:17:22AM +0200, Patrick Steinhardt wrote: > > 2. Having racy background maintenance doesn't seem great for test > > robustness. At the very least, it might subject us to the "rm" > > problems mentioned elsewhere, where we fail to clean up. Annotating > > individual "git gc" or "git maintenance" calls with an extra > > descriptor isn't too bad, but in this case it's all happening under > > the hood via fetch. Is it a potential problem for every script, > > then? If so, should we disable background detaching for all test > > repos, and then let the few that want to test it turn it back on? > > Might be a good idea to set `maintenance.autoDetach=false` globally, > yes. The only downside is of course that it wouldn't cause us to detect > failures like the above, where the concurrency itself causes failure. > > Anyway, for now I'll: > > - Send a patch to fix the race in t7900. > > - Investigate the reftable concurrency issue. > > - _Not_ send a patch that sets `maintenance.autoDetach=false`. That sounds like a good direction. I do suspect there are at least _two_ races in t7900: 1. the detached maintenance that we run explicitly, which causes the "rm" cleanup to fail 2. whatever earlier test is kicking off detached maintenance via "git fetch", which is causing the reftable concurrency issue. Fixing (1) should be easy (and it looks like you've already sent a series). Fixing the reftable code will stop us from segfaulting for (2), but I wonder if that detached maintenance might cause similar "rm" style problems elsewhere. > The last one requires a bit more discussion first, and we have been > running with `gc.autoDetach=true` implicitly in the past. Thinking a bit > more about it, the reason why the above bug triggers now is that > git-gc(1) itself runs git-pack-refs(1), but does that _synchronously_ > before detaching itself. Now we detach at a higher level in the > hierarchy, which means that the previously-synchronous part now runs > asynchronously, as well. That makes sense. I guess we've perhaps been doing background gc for a long time, then, just not in the refs? In practice most repos in the test suite aren't big enough to trigger auto-gc anyway, so it may only affect a handful of scripts. Once the reftable issue is fixed, it's possible that the lingering detached processes don't cause a problem in practice (because they're not really writing much, and/or have finished by the time anybody else gets to cleanup), and we can just live with them. But I'm worried that sounds like wishful thinking. ;) -Peff
On Mon, Aug 19, 2024 at 04:46:14AM -0400, Jeff King wrote: > On Mon, Aug 19, 2024 at 08:17:22AM +0200, Patrick Steinhardt wrote: > > > > 2. Having racy background maintenance doesn't seem great for test > > > robustness. At the very least, it might subject us to the "rm" > > > problems mentioned elsewhere, where we fail to clean up. Annotating > > > individual "git gc" or "git maintenance" calls with an extra > > > descriptor isn't too bad, but in this case it's all happening under > > > the hood via fetch. Is it a potential problem for every script, > > > then? If so, should we disable background detaching for all test > > > repos, and then let the few that want to test it turn it back on? > > > > Might be a good idea to set `maintenance.autoDetach=false` globally, > > yes. The only downside is of course that it wouldn't cause us to detect > > failures like the above, where the concurrency itself causes failure. > > > > Anyway, for now I'll: > > > > - Send a patch to fix the race in t7900. > > > > - Investigate the reftable concurrency issue. > > > > - _Not_ send a patch that sets `maintenance.autoDetach=false`. > > That sounds like a good direction. I do suspect there are at least _two_ > races in t7900: > > 1. the detached maintenance that we run explicitly, which causes the > "rm" cleanup to fail > > 2. whatever earlier test is kicking off detached maintenance via "git > fetch", which is causing the reftable concurrency issue. > > Fixing (1) should be easy (and it looks like you've already sent a > series). Fixing the reftable code will stop us from segfaulting for (2), > but I wonder if that detached maintenance might cause similar "rm" style > problems elsewhere. It certainly might. The only reason why I don't want to send that patch now is that it feels a bit too reactionary. We haven't had issues in the past with it to the best of my knowledge, even though it is an issue in theory. We can certainly revisit that though if we see that it indeed is a more widespread issue. > > The last one requires a bit more discussion first, and we have been > > running with `gc.autoDetach=true` implicitly in the past. Thinking a bit > > more about it, the reason why the above bug triggers now is that > > git-gc(1) itself runs git-pack-refs(1), but does that _synchronously_ > > before detaching itself. Now we detach at a higher level in the > > hierarchy, which means that the previously-synchronous part now runs > > asynchronously, as well. > > That makes sense. I guess we've perhaps been doing background gc for a > long time, then, just not in the refs? In practice most repos in the > test suite aren't big enough to trigger auto-gc anyway, so it may only > affect a handful of scripts. Yup. We should expect this to work just fine, because it can trigger regardless of whether or not we run auto-compaction concurrently or not. After all, the reftable backend even performs auto-compaction after every write to the table, so any two concurrent writes may hit the bug without even invoking git-maintenance(1) at all. > Once the reftable issue is fixed, it's possible that the lingering > detached processes don't cause a problem in practice (because they're > not really writing much, and/or have finished by the time anybody else > gets to cleanup), and we can just live with them. But I'm worried that > sounds like wishful thinking. ;) Could certainly be, yeah. I just want to focus on fixing the immediate issues before we jump to conclusions too fast. Patrick
On Sat, Aug 17, 2024 at 08:14:24AM -0400, Jeff King wrote: > On Fri, Aug 16, 2024 at 12:45:17PM +0200, Patrick Steinhardt wrote: > > > Fix this bug by asking git-gc(1) to not detach when it is being invoked > > via git-maintenance(1). Instead, git-maintenance(1) now respects a new > > config "maintenance.autoDetach", the equivalent of "gc.autoDetach", and > > detaches itself into the background when running as part of our auto > > maintenance. This should continue to behave the same for all users which > > use the git-gc(1) task, only. For others though, it means that we now > > properly perform all tasks in the background. The default behaviour of > > git-maintenance(1) when executed by the user does not change, it will > > remain in the foreground unless they pass the `--detach` option. > > This patch seems to cause segfaults in t5616 when combined with the > reftable backend. Try this: > > GIT_TEST_DEFAULT_REF_FORMAT=reftable ./t5616-partial-clone.sh --run=1-16 --stress > > which fails for me within a few runs. Bisecting leads to 98077d06b2 > (run-command: fix detaching when running auto maintenance, 2024-08-16). > It doesn't trigger with the files ref backend. > > Compiling with ASan gets me a stack trace like this: > > + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin > AddressSanitizer:DEADLYSIGNAL > ================================================================= > ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0) > ==657994==The signal is caused by a READ memory access. > #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29 > #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170 > #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194 > #3 0x55f23e54e72e in block_iter_next reftable/block.c:398 > #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240 > #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355 > #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339 > #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69 > #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123 > #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172 > #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175 > #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464 > #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13 > #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452 > #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623 > #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659 > #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133 > #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432 > #18 0x55f23dba7764 in run_builtin git.c:484 > #19 0x55f23dba7764 in handle_builtin git.c:741 > #20 0x55f23dbab61e in run_argv git.c:805 > #21 0x55f23dbab61e in cmd_main git.c:1000 > #22 0x55f23dba4781 in main common-main.c:64 > #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 > #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360 > #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27) I haven't yet been able to definitely tell, but I think this is a lifetime issue. We create an iterator, eventually notice that the reftable stack has been rewritten, and reload the stack. But the block sources used for the old tables are still referenced by the iterator, even though it was closed. As such, the mmapped memory of the table has been unmapped and is now invalid, which causes the above invalid reads. I'll work on a patch series that introduces refcounting for block sources, but guess that'll take a bit. Patrick
On Mon, Aug 19, 2024 at 12:49:52PM +0200, Patrick Steinhardt wrote: > On Sat, Aug 17, 2024 at 08:14:24AM -0400, Jeff King wrote: > > On Fri, Aug 16, 2024 at 12:45:17PM +0200, Patrick Steinhardt wrote: > > > > > Fix this bug by asking git-gc(1) to not detach when it is being invoked > > > via git-maintenance(1). Instead, git-maintenance(1) now respects a new > > > config "maintenance.autoDetach", the equivalent of "gc.autoDetach", and > > > detaches itself into the background when running as part of our auto > > > maintenance. This should continue to behave the same for all users which > > > use the git-gc(1) task, only. For others though, it means that we now > > > properly perform all tasks in the background. The default behaviour of > > > git-maintenance(1) when executed by the user does not change, it will > > > remain in the foreground unless they pass the `--detach` option. > > > > This patch seems to cause segfaults in t5616 when combined with the > > reftable backend. Try this: > > > > GIT_TEST_DEFAULT_REF_FORMAT=reftable ./t5616-partial-clone.sh --run=1-16 --stress > > > > which fails for me within a few runs. Bisecting leads to 98077d06b2 > > (run-command: fix detaching when running auto maintenance, 2024-08-16). > > It doesn't trigger with the files ref backend. > > > > Compiling with ASan gets me a stack trace like this: > > > > + git -c protocol.version=0 -C pc1 fetch --filter=blob:limit=29999 --refetch origin > > AddressSanitizer:DEADLYSIGNAL > > ================================================================= > > ==657994==ERROR: AddressSanitizer: SEGV on unknown address 0x7fa0f0ec6089 (pc 0x55f23e52ddf9 bp 0x7ffe7bfa1700 sp 0x7ffe7bfa1700 T0) > > ==657994==The signal is caused by a READ memory access. > > #0 0x55f23e52ddf9 in get_var_int reftable/record.c:29 > > #1 0x55f23e53295e in reftable_decode_keylen reftable/record.c:170 > > #2 0x55f23e532cc0 in reftable_decode_key reftable/record.c:194 > > #3 0x55f23e54e72e in block_iter_next reftable/block.c:398 > > #4 0x55f23e5573dc in table_iter_next_in_block reftable/reader.c:240 > > #5 0x55f23e5573dc in table_iter_next reftable/reader.c:355 > > #6 0x55f23e5573dc in table_iter_next reftable/reader.c:339 > > #7 0x55f23e551283 in merged_iter_advance_subiter reftable/merged.c:69 > > #8 0x55f23e55169e in merged_iter_next_entry reftable/merged.c:123 > > #9 0x55f23e55169e in merged_iter_next_void reftable/merged.c:172 > > #10 0x55f23e537625 in reftable_iterator_next_ref reftable/generic.c:175 > > #11 0x55f23e2cf9c6 in reftable_ref_iterator_advance refs/reftable-backend.c:464 > > #12 0x55f23e2d996e in ref_iterator_advance refs/iterator.c:13 > > #13 0x55f23e2d996e in do_for_each_ref_iterator refs/iterator.c:452 > > #14 0x55f23dca6767 in get_ref_map builtin/fetch.c:623 > > #15 0x55f23dca6767 in do_fetch builtin/fetch.c:1659 > > #16 0x55f23dca6767 in fetch_one builtin/fetch.c:2133 > > #17 0x55f23dca6767 in cmd_fetch builtin/fetch.c:2432 > > #18 0x55f23dba7764 in run_builtin git.c:484 > > #19 0x55f23dba7764 in handle_builtin git.c:741 > > #20 0x55f23dbab61e in run_argv git.c:805 > > #21 0x55f23dbab61e in cmd_main git.c:1000 > > #22 0x55f23dba4781 in main common-main.c:64 > > #23 0x7fa0f063fc89 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 > > #24 0x7fa0f063fd44 in __libc_start_main_impl ../csu/libc-start.c:360 > > #25 0x55f23dba6ad0 in _start (git+0xadfad0) (BuildId: 803b2b7f59beb03d7849fb8294a8e2145dd4aa27) > > I haven't yet been able to definitely tell, but I think this is a > lifetime issue. We create an iterator, eventually notice that the > reftable stack has been rewritten, and reload the stack. But the > block sources used for the old tables are still referenced by the > iterator, even though it was closed. As such, the mmapped memory of the > table has been unmapped and is now invalid, which causes the above > invalid reads. > > I'll work on a patch series that introduces refcounting for block > sources, but guess that'll take a bit. This is being handled via https://lore.kernel.org/git/cover.1724080006.git.ps@pks.im/. Patrick
diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt index 664a3c2874..1d4f9470ea 100644 --- a/Documentation/config/gc.txt +++ b/Documentation/config/gc.txt @@ -40,7 +40,8 @@ use, it'll affect how the auto pack limit works. gc.autoDetach:: Make `git gc --auto` return immediately and run in the background - if the system supports it. Default is true. + if the system supports it. Default is true. This config variable acts + as a fallback in case `maintenance.autoDetach` is not set. gc.bigPackThreshold:: If non-zero, all non-cruft packs larger than this limit are kept diff --git a/Documentation/config/maintenance.txt b/Documentation/config/maintenance.txt index 69a4f05153..72a9d6cf81 100644 --- a/Documentation/config/maintenance.txt +++ b/Documentation/config/maintenance.txt @@ -3,6 +3,17 @@ maintenance.auto:: `git maintenance run --auto` after doing their normal work. Defaults to true. +maintenance.autoDetach:: + Many Git commands trigger automatic maintenance after they have + written data into the repository. This boolean config option + controls whether this automatic maintenance shall happen in the + foreground or whether the maintenance process shall detach and + continue to run in the background. ++ +If unset, the value of `gc.autoDetach` is used as a fallback. Defaults +to true if both are unset, meaning that the maintenance process will +detach. + maintenance.strategy:: This string config option provides a way to specify one of a few recommended schedules for background maintenance. This only affects diff --git a/builtin/gc.c b/builtin/gc.c index 63106e2028..bafee330a2 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -1063,6 +1063,7 @@ static int maintenance_task_gc(struct maintenance_run_opts *opts, strvec_push(&child.args, "--quiet"); else strvec_push(&child.args, "--no-quiet"); + strvec_push(&child.args, "--no-detach"); return run_command(&child); } diff --git a/run-command.c b/run-command.c index 45ba544932..94f2f3079f 100644 --- a/run-command.c +++ b/run-command.c @@ -1808,16 +1808,26 @@ void run_processes_parallel(const struct run_process_parallel_opts *opts) int prepare_auto_maintenance(int quiet, struct child_process *maint) { - int enabled; + int enabled, auto_detach; if (!git_config_get_bool("maintenance.auto", &enabled) && !enabled) return 0; + /* + * When `maintenance.autoDetach` isn't set, then we fall back to + * honoring `gc.autoDetach`. This is somewhat weird, but required to + * retain behaviour from when we used to run git-gc(1) here. + */ + if (git_config_get_bool("maintenance.autodetach", &auto_detach) && + git_config_get_bool("gc.autodetach", &auto_detach)) + auto_detach = 1; + maint->git_cmd = 1; maint->close_object_store = 1; strvec_pushl(&maint->args, "maintenance", "run", "--auto", NULL); strvec_push(&maint->args, quiet ? "--quiet" : "--no-quiet"); + strvec_push(&maint->args, auto_detach ? "--detach" : "--no-detach"); return 1; } diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh index 2da7291e37..8415884754 100755 --- a/t/t5616-partial-clone.sh +++ b/t/t5616-partial-clone.sh @@ -229,7 +229,7 @@ test_expect_success 'fetch --refetch triggers repacking' ' GIT_TRACE2_EVENT="$PWD/trace1.event" \ git -C pc1 fetch --refetch origin && - test_subcommand git maintenance run --auto --no-quiet <trace1.event && + test_subcommand git maintenance run --auto --no-quiet --detach <trace1.event && grep \"param\":\"gc.autopacklimit\",\"value\":\"1\" trace1.event && grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"-1\" trace1.event && @@ -238,7 +238,7 @@ test_expect_success 'fetch --refetch triggers repacking' ' -c gc.autoPackLimit=0 \ -c maintenance.incremental-repack.auto=1234 \ -C pc1 fetch --refetch origin && - test_subcommand git maintenance run --auto --no-quiet <trace2.event && + test_subcommand git maintenance run --auto --no-quiet --detach <trace2.event && grep \"param\":\"gc.autopacklimit\",\"value\":\"0\" trace2.event && grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"-1\" trace2.event && @@ -247,7 +247,7 @@ test_expect_success 'fetch --refetch triggers repacking' ' -c gc.autoPackLimit=1234 \ -c maintenance.incremental-repack.auto=0 \ -C pc1 fetch --refetch origin && - test_subcommand git maintenance run --auto --no-quiet <trace3.event && + test_subcommand git maintenance run --auto --no-quiet --detach <trace3.event && grep \"param\":\"gc.autopacklimit\",\"value\":\"1\" trace3.event && grep \"param\":\"maintenance.incremental-repack.auto\",\"value\":\"0\" trace3.event ' diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh index 771525aa4b..06ab43cfb5 100755 --- a/t/t7900-maintenance.sh +++ b/t/t7900-maintenance.sh @@ -49,22 +49,47 @@ test_expect_success 'run [--auto|--quiet]' ' git maintenance run --auto 2>/dev/null && GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \ git maintenance run --no-quiet 2>/dev/null && - test_subcommand git gc --quiet <run-no-auto.txt && - test_subcommand ! git gc --auto --quiet <run-auto.txt && - test_subcommand git gc --no-quiet <run-no-quiet.txt + test_subcommand git gc --quiet --no-detach <run-no-auto.txt && + test_subcommand ! git gc --auto --quiet --no-detach <run-auto.txt && + test_subcommand git gc --no-quiet --no-detach <run-no-quiet.txt ' test_expect_success 'maintenance.auto config option' ' GIT_TRACE2_EVENT="$(pwd)/default" git commit --quiet --allow-empty -m 1 && - test_subcommand git maintenance run --auto --quiet <default && + test_subcommand git maintenance run --auto --quiet --detach <default && GIT_TRACE2_EVENT="$(pwd)/true" \ git -c maintenance.auto=true \ commit --quiet --allow-empty -m 2 && - test_subcommand git maintenance run --auto --quiet <true && + test_subcommand git maintenance run --auto --quiet --detach <true && GIT_TRACE2_EVENT="$(pwd)/false" \ git -c maintenance.auto=false \ commit --quiet --allow-empty -m 3 && - test_subcommand ! git maintenance run --auto --quiet <false + test_subcommand ! git maintenance run --auto --quiet --detach <false +' + +for cfg in maintenance.autoDetach gc.autoDetach +do + test_expect_success "$cfg=true config option" ' + test_when_finished "rm -f trace" && + test_config $cfg true && + GIT_TRACE2_EVENT="$(pwd)/trace" git commit --quiet --allow-empty -m 1 && + test_subcommand git maintenance run --auto --quiet --detach <trace + ' + + test_expect_success "$cfg=false config option" ' + test_when_finished "rm -f trace" && + test_config $cfg false && + GIT_TRACE2_EVENT="$(pwd)/trace" git commit --quiet --allow-empty -m 1 && + test_subcommand git maintenance run --auto --quiet --no-detach <trace + ' +done + +test_expect_success "maintenance.autoDetach overrides gc.autoDetach" ' + test_when_finished "rm -f trace" && + test_config maintenance.autoDetach false && + test_config gc.autoDetach true && + GIT_TRACE2_EVENT="$(pwd)/trace" git commit --quiet --allow-empty -m 1 && + test_subcommand git maintenance run --auto --quiet --no-detach <trace ' test_expect_success 'register uses XDG_CONFIG_HOME config if it exists' ' @@ -129,9 +154,9 @@ test_expect_success 'run --task=<task>' ' git maintenance run --task=commit-graph 2>/dev/null && GIT_TRACE2_EVENT="$(pwd)/run-both.txt" \ git maintenance run --task=commit-graph --task=gc 2>/dev/null && - test_subcommand ! git gc --quiet <run-commit-graph.txt && - test_subcommand git gc --quiet <run-gc.txt && - test_subcommand git gc --quiet <run-both.txt && + test_subcommand ! git gc --quiet --no-detach <run-commit-graph.txt && + test_subcommand git gc --quiet --no-detach <run-gc.txt && + test_subcommand git gc --quiet --no-detach <run-both.txt && test_subcommand git commit-graph write --split --reachable --no-progress <run-commit-graph.txt && test_subcommand ! git commit-graph write --split --reachable --no-progress <run-gc.txt && test_subcommand git commit-graph write --split --reachable --no-progress <run-both.txt
In the past, we used to execute `git gc --auto` as part of our automatic housekeeping routines. As git-gc(1) may require quite some time to perform the housekeeping, it knows to detach itself and run in the background so that the user can continue their work. Eventually, we refactored our automatic housekeeping to instead use the more flexible git-maintenance(1) command. The upside of this new infra is that the user can configure which maintenance tasks are performed, at least to a certain degree. So while it continues to run git-gc(1) by default, it can also be adapted to e.g. use git-multi-pack-index(1) for maintenance of the object database. The auto-detach of the new infra is somewhat broken though once the user configures non-standard tasks. The problem is essentially that we detach at the wrong level in the process hierarchy: git-maintenance(1) never detaches itself, but instead it continues to be git-gc(1) which does. When configured to only run the git-gc(1) maintenance task, then the result is basically the same as before. But when configured to run other tasks, then git-maintenance(1) will wait for these to run to completion. Even worse, it may be that git-gc(1) runs concurrently with other housekeeping tasks, stomping on each others feet. Fix this bug by asking git-gc(1) to not detach when it is being invoked via git-maintenance(1). Instead, git-maintenance(1) now respects a new config "maintenance.autoDetach", the equivalent of "gc.autoDetach", and detaches itself into the background when running as part of our auto maintenance. This should continue to behave the same for all users which use the git-gc(1) task, only. For others though, it means that we now properly perform all tasks in the background. The default behaviour of git-maintenance(1) when executed by the user does not change, it will remain in the foreground unless they pass the `--detach` option. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- Documentation/config/gc.txt | 3 +- Documentation/config/maintenance.txt | 11 +++++++ builtin/gc.c | 1 + run-command.c | 12 +++++++- t/t5616-partial-clone.sh | 6 ++-- t/t7900-maintenance.sh | 43 ++++++++++++++++++++++------ 6 files changed, 62 insertions(+), 14 deletions(-)