[0/1] Async-signal safety in signal handlers

Message ID cover.1641552500.git.ps@pks.im (mailing list archive)

Patrick Steinhardt Jan. 7, 2022, 10:53 a.m. UTC
Hi,

we have recently observed a Git process which has been hanging around
for more than a month on one of our servers in production. A backtrace
showed that the git-fetch(1) process was deadlocked in its signal
handler while trying to free memory. Functions like malloc(3), free(3)
and most stdio functions are not async-signal-safe, though, which means
they must not be called from signal handlers, as specified in
signal-safety(7).

The fix for git-fetch(1) is rather simple: we can just unlink(2) the
lockfiles, which is indeed allowed, but skip free'ing memory. But in
fact, this is a wider issue we have: we mostly didn't pay attention to
those restrictions, and thus we freely call non-async-signal-safe
functions. It's less clear what to do about this in most of the cases
though:

- git-clone(1) tries to clean up the ".git" directory and its worktree
  on being killed, but needs to allocate memory to compute corresponding
  paths. We can try to preallocate the buffer, but it's not clear
  whether there is a proper upper bound.

- git-gc(1) will try to commit "gc.log" and write to stderr, both of
  which aren't allowed. I think we'll have to just bail and leave it
  behind in a partially-written state.

- git-repack(1) tries to remove "pack/.tmp-*" files, which involves
  calling opendir(3P), readdir(3P) and closedir(3P) and allocating
  memory. We probably have to
  keep track of all temporary files we create in a global list, which we
  can then access in our signal handler.

- git-worktree(1) is doing the same as git-clone(1), trying to prune the
  new worktree if it's killed. Again, we'd probably have to preallocate
  a buffer to compute paths.

- HTTP pushes do all sorts of HTTP requests in their signal handler to
  unlock the remote server. I don't really see what to do about this
  except drop the code -- setting a global "please clean up and exit
  now" flag is probably not going to fly well.

The tempfiles and tmp-objdir code already handles signals correctly.

Patrick

Patrick Steinhardt (1):
  fetch: fix deadlock when cleaning up lockfiles in async signals

 builtin/clone.c |  2 +-
 builtin/fetch.c | 17 +++++++++++------
 transport.c     | 11 ++++++++---
 transport.h     | 14 +++++++++++++-
 4 files changed, 33 insertions(+), 11 deletions(-)