diff mbox series

curiosities with tempfile.active

Message ID YwXw2ytUlrXSSRh7@coredump.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series curiosities with tempfile.active | expand

Commit Message

Jeff King Aug. 24, 2022, 9:35 a.m. UTC
[starting a new thread, as this ended up rather long;
 +cc René as there's an interesting twist at the end]

On Tue, Aug 23, 2022 at 11:12:21AM +0200, Johannes Schindelin wrote:

> Side note: I do notice that `delete_tempfile(&tmpedit)` seems to _not_
> release memory when `tmpedit` is non-NULL when `tmpedit->active == 0`.
> I consider this a bug in the `delete_tempfile()` code (in its `if
> (!is_tempfile_active(tempfile))` clause, it should call
> `deactivate_tempfile()` for non-NULL `tempfile`s and set `*tempfile_p =
> NULL;`), but it is outside the scope of your patch to address that.

I agree it is confusing, but I think this turns out not to be a bug in
practice. In ye olden days, tempfile structs lived on the global
linked-list of entries to cleanup forever, so they were statically
allocated or effectively leaked from the heap. And that's why we needed
an "active" flag in the first place.

That changed around 422a21c6a0 (tempfile: remove deactivated list
entries, 2017-09-05) and 076aa2cbda (tempfile: auto-allocate tempfiles
on heap, 2017-09-05). Now calling deactivate_tempfile() unsets the
active flag _and_ frees the struct. And in normal calling, that's the
only way to unset the active flag (it also starts life unset, but the
creation functions always activate on success, or deactivate+free on
error).

So can we just get rid of the active flag? Possibly. I said "normal
calling" above, because the exception is in our remove_tempfiles()
cleanup routine, where we unset the "active" flag as we remove each
file. We can't use the regular delete_tempfile(), or even
deactivate_tempfile() here, because we may be in a signal handler, so
things like free() are forbidden. So we just "leak" the memory, but it's
OK because we're exiting (and even leak detection won't complain,
because it's reachable from the global list).

But why do we care about the flag, then? We just iterate over the list
once, so we should handle each entry once. But it can be called both as
an atexit() handler, as well as a signal handler. So it's possible that
we can be mid-iteration and get another signal (because the original was
via exit, or because we hook multiple signals for the cleanup). So the
active flag in theory is protecting us from that.

But it's not foolproof. We remove the tempfile and then unset the flag,
so there's a moment (albeit small) where that other signal could come
in. It shouldn't be a big deal because unlinking the tempfiles is mostly
idempotent-ish; barring the unlikely chance that somebody else creates
the same random filename as us, the second unlink will do nothing. So
while removing the active flag would increase the size of that race
(from handling a single entry to completing the whole atexit), that
shouldn't matter much in practice.

But here's the interesting twist. Commit 2c2db194bd (tempfile: add
mks_tempfile_dt(), 2022-04-20) recently taught remove_tempfiles() to
drop a surrounding directory. And it does so by munging the filename
buffer. It has to, because we can't allocate a new buffer in a signal
handler.

But is it also idempotent(-ish)? The directory removal itself is,
because it checks:

  tempfile->filename.buf[tempfile->directorylen] == '/'

before overwriting that byte with a NUL, so it should only be true once
(though I note this violates the usual "volatile" rules for signal
handlers, it's probably OK in practice since we know the NUL will be
written before we call rmdir()).

But after that happens, the filename buffer now contains a C string that
points to the directory. And if we end up in remove_tempfiles() again
due to another signal, we'll try to unlink(p->filename.buf), which will
now unlink the directory! I'm not sure how that behaves on all systems.
On Linux it's a noop. And if it just deleted the directory, that would
be OK. I seem to recall on old versions of SunOS it did bad things
(maybe because it really did unlink it, but without checking for
orphaned entries inside).

The good news is that it won't walk further up the tree. I was worried
that a second round might then try to rmdir the parent (usually "/tmp"
in this case). But the directory removal always uses the same length.

So...maybe it's all OK in practice? If not, this is an issue even
without removing the active flag. But removing it would make the race
window larger. I suspect it probably is OK in practice, but saying that
about a race always feels like famous last words.

If we did want to keep it, I suspect we could get the same effect by
munging the volatile "pid_t owner" field, but I think anything we munge
is supposed to be a sig_atomic_t according to the letter of the law.

Patch below shows what it would look like to just drop "active"
entirely. Test suite behaves as expected, but again, the real question
is what it might be doing in a weird racy situation. The one above, but
also a signal racing with adding an entry to the list. The volatile
sig_atomic_t is what tells the removal process in the signal handler
that it's OK to look at. But again, maybe that's OK in practice.

-Peff

---

Comments

René Scharfe Aug. 26, 2022, 10:46 p.m. UTC | #1
Am 24.08.22 um 11:35 schrieb Jeff King:
> But here's the interesting twist. Commit 2c2db194bd (tempfile: add
> mks_tempfile_dt(), 2022-04-20) recently taught remove_tempfiles() to
> drop a surrounding directory. And it does so by munging the filename
> buffer. It has to, because we can't allocate a new buffer in a signal
> handler.
>
> But is it also idempotent(-ish)? The directory removal itself is,
> because it checks:
>
>   tempfile->filename.buf[tempfile->directorylen] == '/'
>
> before overwriting that byte with a NUL, so it should only be true once
> (though I note this violates the usual "volatile" rules for signal
> handlers, it's probably OK in practice since we know the NUL will be
> written before we call rmdir()).
>
> But after that happens, the filename buffer now contains a C string that
> points to the directory. And if we end up in remove_tempfiles() again
> due to another signal, we'll try to unlink(p->filename.buf), which will
> now unlink the directory! I'm not sure how that behaves on all systems.
> On Linux it's a noop. And if it just deleted the directory, that would
> be OK. I seem to recall on old versions of SunOS it did bad things
> (maybe because it really did unlink it, but without checking for
> orphaned entries inside).

https://pubs.opengroup.org/onlinepubs/9699919799/functions/unlink.html
says of the unlink(2) parameter: "The path argument shall not name a
directory unless the process has appropriate privileges and the
implementation supports using unlink() on directories."  So we better
not do that..

--- >8 ---
Subject: [PATCH] tempfile: avoid directory cleanup race

The temporary directory created by mks_tempfile_dt() is deleted by first
deleting the file within, then truncating the filename strbuf and
passing the resulting string to rmdir(2).  When the cleanup routine is
invoked concurrently by a signal handler we can end up passing the now
truncated string to unlink(2), however, which could cause problems on
some systems.

Avoid that issue by remembering the directory name separately.  This way
the paths stay unchanged.  A signal handler can still race with normal
cleanup, but deleting the same files and directories twice is harmless.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 tempfile.c | 14 ++++++--------
 tempfile.h |  2 +-
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/tempfile.c b/tempfile.c
index 2024c82691..7414c81e31 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -59,14 +59,11 @@ static VOLATILE_LIST_HEAD(tempfile_list);
 static void remove_template_directory(struct tempfile *tempfile,
 				      int in_signal_handler)
 {
-	if (tempfile->directorylen > 0 &&
-	    tempfile->directorylen < tempfile->filename.len &&
-	    tempfile->filename.buf[tempfile->directorylen] == '/') {
-		strbuf_setlen(&tempfile->filename, tempfile->directorylen);
+	if (tempfile->directory) {
 		if (in_signal_handler)
-			rmdir(tempfile->filename.buf);
+			rmdir(tempfile->directory);
 		else
-			rmdir_or_warn(tempfile->filename.buf);
+			rmdir_or_warn(tempfile->directory);
 	}
 }

@@ -115,7 +112,7 @@ static struct tempfile *new_tempfile(void)
 	tempfile->owner = 0;
 	INIT_LIST_HEAD(&tempfile->list);
 	strbuf_init(&tempfile->filename, 0);
-	tempfile->directorylen = 0;
+	tempfile->directory = NULL;
 	return tempfile;
 }

@@ -141,6 +138,7 @@ static void deactivate_tempfile(struct tempfile *tempfile)
 {
 	tempfile->active = 0;
 	strbuf_release(&tempfile->filename);
+	free(tempfile->directory);
 	volatile_list_del(&tempfile->list);
 	free(tempfile);
 }
@@ -254,7 +252,7 @@ struct tempfile *mks_tempfile_dt(const char *directory_template,

 	tempfile = new_tempfile();
 	strbuf_swap(&tempfile->filename, &sb);
-	tempfile->directorylen = directorylen;
+	tempfile->directory = xmemdupz(tempfile->filename.buf, directorylen);
 	tempfile->fd = fd;
 	activate_tempfile(tempfile);
 	return tempfile;
diff --git a/tempfile.h b/tempfile.h
index d7804a214a..5b9e8743dd 100644
--- a/tempfile.h
+++ b/tempfile.h
@@ -82,7 +82,7 @@ struct tempfile {
 	FILE *volatile fp;
 	volatile pid_t owner;
 	struct strbuf filename;
-	size_t directorylen;
+	char *directory;
 };

 /*
--
2.37.2
Jeff King Aug. 27, 2022, 1:02 p.m. UTC | #2
On Sat, Aug 27, 2022 at 12:46:29AM +0200, René Scharfe wrote:

> https://pubs.opengroup.org/onlinepubs/9699919799/functions/unlink.html
> says of the unlink(2) parameter: "The path argument shall not name a
> directory unless the process has appropriate privileges and the
> implementation supports using unlink() on directories."  So we better
> not do that..

Yeah, I saw that. It's a bit vague, and if the call returns ENOSYS or
EISDIR, that would be perfectly fine. It's the "what happens on the
implementations that do support it..." part that I'm more worried about. :)

That said...

> --- >8 ---
> Subject: [PATCH] tempfile: avoid directory cleanup race
> 
> The temporary directory created by mks_tempfile_dt() is deleted by first
> deleting the file within, then truncating the filename strbuf and
> passing the resulting string to rmdir(2).  When the cleanup routine is
> invoked concurrently by a signal handler we can end up passing the now
> truncated string to unlink(2), however, which could cause problems on
> some systems.
> 
> Avoid that issue by remembering the directory name separately.  This way
> the paths stay unchanged.  A signal handler can still race with normal
> cleanup, but deleting the same files and directories twice is harmless.

Right, I'm a little embarrassed I didn't immediately come up with this
same fix. This is definitely the right thing to do. The more we can
treat data from signal-handlers are write-once, the better.

There's a slight extra memory cost to store the directory name twice,
but it's a drop in the bucket overall.

>  tempfile.c | 14 ++++++--------
>  tempfile.h |  2 +-

The patch itself looks good to me.

-Peff
Chris Torek Aug. 27, 2022, 9:47 p.m. UTC | #3
On Sat, Aug 27, 2022 at 6:05 AM Jeff King <peff@peff.net> wrote:
> Yeah, I saw that. It's a bit vague, and if the call returns ENOSYS or
> EISDIR, that would be perfectly fine. It's the "what happens on the
> implementations that do support it..." part that I'm more worried about. :)

The history here is that pre-4.2BSD, Unix systems had no mkdir
system call. You used mknod() to make a truly empty directory and
the link() to create the "." and ".." entries within it, and all three of
these operations were restricted to the super-user.  There was no
rmdir either, so again, unlink() as the super-user was permitted to
do the job (with three calls to unlink the "." and ".." entries first and
then remove the directory).

Unlinking a directory when it still contains "." leaves the link count
at 1 and there's no GC, so it sits around occupying an inode.

If you have a mkdir() system call and don't need backwards
compatibility, you get to have these return EISDIR errors...

Chris
Jeff King Aug. 30, 2022, 6:56 p.m. UTC | #4
On Sat, Aug 27, 2022 at 02:47:45PM -0700, Chris Torek wrote:

> On Sat, Aug 27, 2022 at 6:05 AM Jeff King <peff@peff.net> wrote:
> > Yeah, I saw that. It's a bit vague, and if the call returns ENOSYS or
> > EISDIR, that would be perfectly fine. It's the "what happens on the
> > implementations that do support it..." part that I'm more worried about. :)
> 
> The history here is that pre-4.2BSD, Unix systems had no mkdir
> system call. You used mknod() to make a truly empty directory and
> the link() to create the "." and ".." entries within it, and all three of
> these operations were restricted to the super-user.  There was no
> rmdir either, so again, unlink() as the super-user was permitted to
> do the job (with three calls to unlink the "." and ".." entries first and
> then remove the directory).
> 
> Unlinking a directory when it still contains "." leaves the link count
> at 1 and there's no GC, so it sits around occupying an inode.

Thanks, that matches the sense of unease I had in the back of my mind. I
seem to recall that maybe older versions of SunOS exhibited this, but
that feels like a lifetime ago. At any rate, we should avoid that
unlink() call, and René's patch neatly does so.

-Peff
diff mbox series

Patch

diff --git a/tempfile.c b/tempfile.c
index 2024c82691..6134b73972 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -89,8 +89,6 @@  static void remove_tempfiles(int in_signal_handler)
 		else
 			unlink_or_warn(p->filename.buf);
 		remove_template_directory(p, in_signal_handler);
-
-		p->active = 0;
 	}
 }
 
@@ -111,7 +109,6 @@  static struct tempfile *new_tempfile(void)
 	struct tempfile *tempfile = xmalloc(sizeof(*tempfile));
 	tempfile->fd = -1;
 	tempfile->fp = NULL;
-	tempfile->active = 0;
 	tempfile->owner = 0;
 	INIT_LIST_HEAD(&tempfile->list);
 	strbuf_init(&tempfile->filename, 0);
@@ -123,9 +120,6 @@  static void activate_tempfile(struct tempfile *tempfile)
 {
 	static int initialized;
 
-	if (is_tempfile_active(tempfile))
-		BUG("activate_tempfile called for active object");
-
 	if (!initialized) {
 		sigchain_push_common(remove_tempfiles_on_signal);
 		atexit(remove_tempfiles_on_exit);
@@ -134,12 +128,10 @@  static void activate_tempfile(struct tempfile *tempfile)
 
 	volatile_list_add(&tempfile->list, &tempfile_list);
 	tempfile->owner = getpid();
-	tempfile->active = 1;
 }
 
 static void deactivate_tempfile(struct tempfile *tempfile)
 {
-	tempfile->active = 0;
 	strbuf_release(&tempfile->filename);
 	volatile_list_del(&tempfile->list);
 	free(tempfile);
diff --git a/tempfile.h b/tempfile.h
index d7804a214a..f0bf59dbac 100644
--- a/tempfile.h
+++ b/tempfile.h
@@ -77,7 +77,6 @@ 
 
 struct tempfile {
 	volatile struct volatile_list_head list;
-	volatile sig_atomic_t active;
 	volatile int fd;
 	FILE *volatile fp;
 	volatile pid_t owner;
@@ -221,7 +220,7 @@  FILE *fdopen_tempfile(struct tempfile *tempfile, const char *mode);
 
 static inline int is_tempfile_active(struct tempfile *tempfile)
 {
-	return tempfile && tempfile->active;
+	return !!tempfile;
 }
 
 /*