
[RFC] vfs: don't bother clearing close_on_exec bit for unused fds

Message ID 1446543679-28849-1-git-send-email-linux@rasmusvillemoes.dk (mailing list archive)
State New, archived

Commit Message

Rasmus Villemoes Nov. 3, 2015, 9:41 a.m. UTC
In commit fc90888d07b8 ("vfs: conditionally clear close-on-exec flag"),
a conditional was added to __clear_close_on_exec to avoid dirtying a
cache line in the common case where the bit is already clear. However,
AFAICT, we don't rely on the close_on_exec bit being clear for unused
fds, except as an optimization in do_close_on_exec(); if I haven't
missed anything, __{set,clear}_close_on_exec is always called when a
new fd is allocated. At the expense of also reading through ->open_fds
in do_close_on_exec(), we can avoid accessing the close_on_exec bitmap
altogether in close(), which I think is a reasonable trade-off.
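
For illustration, here is a minimal userspace model of the two bitmaps
(not kernel code; the helper names below are made up for the sketch),
showing why a stale close-on-exec bit on a closed fd is harmless once
the exec-time scan masks with ->open_fds:

#include <stdio.h>

static unsigned long open_fds;       /* bit n set => fd n is in use */
static unsigned long close_on_exec;  /* bit n set => fd n marked close-on-exec */

static void model_open(int fd, int cloexec)
{
	/* allocation always sets or clears the bit, as argued above */
	open_fds |= 1UL << fd;
	if (cloexec)
		close_on_exec |= 1UL << fd;
	else
		close_on_exec &= ~(1UL << fd);
}

static void model_close(int fd)
{
	/* with this patch, close() only touches open_fds; the bit may go stale */
	open_fds &= ~(1UL << fd);
}

static unsigned long cloexec_set_at_exec(void)
{
	/* the do_close_on_exec() analogue: stale bits are masked out here */
	return close_on_exec & open_fds;
}

int main(void)
{
	model_open(5, 1);                        /* open(..., O_CLOEXEC) => 5 */
	model_close(5);                          /* close(5): bit 5 left set  */
	printf("%#lx\n", cloexec_set_at_exec()); /* prints 0: stale bit ignored */
	return 0;
}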

The conditional added in the commit above still makes sense to avoid
the dirtying on the allocation paths, but I also think it might make
sense in __set_close_on_exec: I suppose any given app handling a
non-trivial number of fds uses O_CLOEXEC for either almost none or
almost all of them.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
I'm sure I've missed something, hence the RFC. But if not, there's
probably also a few memsets which become redundant. And the
__set_close_on_exec part should probably be its own patch...

 fs/file.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Linus Torvalds Nov. 3, 2015, 10:45 p.m. UTC | #1
On Tue, Nov 3, 2015 at 1:41 AM, Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>
> I'm sure I've missed something, hence the RFC. But if not, there's
> probably also a few memsets which become redundant. And the
> __set_close_on_exec part should probably be its own patch...

The patch looks fine to me. I'm not sure the __set_close_on_exec part
even makes sense, because if you set that bit, it usually really *is*
clear before, so testing it beforehand is just pointless.  And if
somebody really keeps setting the bit, they are doing something stupid
anyway..

So I have nothing against the patch, but I do wonder how much it
matters. If there isn't a noticeable performance win, I'd almost
rather just keep the close-on-exec bitmap up-to-date. Hmm?

               Linus
Rasmus Villemoes Nov. 3, 2015, 11:13 p.m. UTC | #2
On Tue, Nov 03 2015, Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Nov 3, 2015 at 1:41 AM, Rasmus Villemoes
> <linux@rasmusvillemoes.dk> wrote:
>>
>> I'm sure I've missed something, hence the RFC. But if not, there's
>> probably also a few memsets which become redundant. And the
>> __set_close_on_exec part should probably be its own patch...
>
> The patch looks fine to me. I'm not sure the __set_close_on_exec part
> even makes sense, because if you set that bit, it usually really *is*
> clear before, so testing it beforehand is just pointless.  And if
> somebody really keeps setting the bit, they are doing something stupid
> anyway..

So that's true for the lifetime of a single fd, where of course no one
does fcntl(fd, F_SETFD, FD_CLOEXEC) more than once. But the scenario I
was thinking of was when fds get recycled: open(, O_CLOEXEC) => 5,
close(5), open(, O_CLOEXEC) => 5; in that case, letting the
close_on_exec bit keep its value avoids dirtying the cache line on all
subsequent allocations of fd 5 (for example, had Eric's app been using
*_CLOEXEC for all its open()s, socket()s etc., there wouldn't have been
any gain from adding the conditional to __clear_close_on_exec, but I'd
expect a similar gain from doing the symmetric thing). Again, this is
assuming that almost
all fd allocations either do or do not apply CLOEXEC - after a while,
->close_on_exec would reach a steady-state where no bits get flipped
anymore.
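
For illustration, a minimal userspace demo of that recycling (assuming
/dev/null can be opened; error handling omitted). Since the lowest free
descriptor is handed out, the second open returns the same fd number:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd1 = open("/dev/null", O_RDONLY | O_CLOEXEC);
	printf("first  open => %d\n", fd1);
	close(fd1);

	/* the lowest free fd is handed out again, so the same slot is reused */
	int fd2 = open("/dev/null", O_RDONLY | O_CLOEXEC);
	printf("second open => %d\n", fd2);
	close(fd2);
	return 0;
}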

The "usually really *is* clear" only holds when we do "bother clearing
close_on_exec bit for unused fds", which is what I suggest we don't :-)

I don't think either state of the bit in close_on_exec is more or less
'up-to-date' when its buddy in open_fds is not set.

Rasmus
Eric Dumazet Nov. 4, 2015, 1:31 a.m. UTC | #3
On Tue, 2015-11-03 at 10:41 +0100, Rasmus Villemoes wrote:

> @@ -667,7 +667,7 @@ void do_close_on_exec(struct files_struct *files)
>  		fdt = files_fdtable(files);
>  		if (fd >= fdt->max_fds)
>  			break;
> -		set = fdt->close_on_exec[i];
> +		set = fdt->close_on_exec[i] & fdt->open_fds[i];
>  		if (!set)
>  			continue;
>  		fdt->close_on_exec[i] = 0;

If you don't bother, why leave this final fdt->close_on_exec[i] = 0?
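
Presumably that would amount to something like this on top of the
posted patch (untested, just to illustrate the question):

 		set = fdt->close_on_exec[i] & fdt->open_fds[i];
 		if (!set)
 			continue;
-		fdt->close_on_exec[i] = 0;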




Patch

diff --git a/fs/file.c b/fs/file.c
index c6986dce0334..93cfbcd450c3 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -231,7 +231,8 @@ repeat:
 
 static inline void __set_close_on_exec(int fd, struct fdtable *fdt)
 {
-	__set_bit(fd, fdt->close_on_exec);
+	if (!test_bit(fd, fdt->close_on_exec))
+		__set_bit(fd, fdt->close_on_exec);
 }
 
 static inline void __clear_close_on_exec(int fd, struct fdtable *fdt)
@@ -644,7 +645,6 @@ int __close_fd(struct files_struct *files, unsigned fd)
 	if (!file)
 		goto out_unlock;
 	rcu_assign_pointer(fdt->fd[fd], NULL);
-	__clear_close_on_exec(fd, fdt);
 	__put_unused_fd(files, fd);
 	spin_unlock(&files->file_lock);
 	return filp_close(file, files);
@@ -667,7 +667,7 @@ void do_close_on_exec(struct files_struct *files)
 		fdt = files_fdtable(files);
 		if (fd >= fdt->max_fds)
 			break;
-		set = fdt->close_on_exec[i];
+		set = fdt->close_on_exec[i] & fdt->open_fds[i];
 		if (!set)
 			continue;
 		fdt->close_on_exec[i] = 0;