Message ID | 155800752418.4037.9567789434648701032.stgit@warthog.procyon.org.uk (mailing list archive) |
---|---|
Headers | show |
Series | uapi, vfs: Change the mount API UAPI [ver #2] | expand |
On Thu, May 16, 2019 at 12:52:04PM +0100, David Howells wrote: > > Hi Linus, Al, > > Here are some patches that make changes to the mount API UAPI and two of > them really need applying, before -rc1 - if they're going to be applied at > all. I'm fine with 2--4, but I'm not convinced that cloexec-by-default crusade makes any sense. Could somebody give coherent arguments in favour of abandoning the existing conventions?
On Thu, May 16, 2019 at 05:22:59PM +0100, Al Viro wrote: > On Thu, May 16, 2019 at 12:52:04PM +0100, David Howells wrote: > > > > Hi Linus, Al, > > > > Here are some patches that make changes to the mount API UAPI and two of > > them really need applying, before -rc1 - if they're going to be applied at > > all. > > I'm fine with 2--4, but I'm not convinced that cloexec-by-default crusade > makes any sense. Could somebody give coherent arguments in favour of > abandoning the existing conventions? To elaborate: existing syscalls (open, socket, pipe, accept, epoll_create, etc., etc.) are not cloexec-by-default and that's not going to change, simply because it would be break the living hell out of existing userland code. IOW, the userland has to worry about leaking stuff over sensitive execve(), no matter what. All this change does is complicate things for userland programmer - which syscall belongs to which class. Where's the benefit? I could buy an argument about gradually changing over to APIs that are cloexec-by-default across the board, except for the obvious fact that it's not going to happen; not with the things like open() involved.
On Thu, May 16, 2019 at 05:22:59PM +0100, Al Viro wrote: > On Thu, May 16, 2019 at 12:52:04PM +0100, David Howells wrote: > > > > Hi Linus, Al, > > > > Here are some patches that make changes to the mount API UAPI and two of > > them really need applying, before -rc1 - if they're going to be applied at > > all. > > I'm fine with 2--4, but I'm not convinced that cloexec-by-default crusade > makes any sense. Could somebody give coherent arguments in favour of > abandoning the existing conventions? So as I said in the commit message. From a userspace perspective it's more of an issue if one accidently leaks an fd to a task during exec. Also, most of the time one does not want to inherit an fd during an exec. It is a hazzle to always have to specify an extra flag. As Al pointed out to me open() semantics are not going anywhere. Sure, no argument there at all. But the idea of making fds cloexec by default is only targeted at fds that come from separate syscalls. fsopen(), open_tree_clone(), etc. they all return fds independent of open() so it's really easy to have them cloexec by default without regressing anyone and we also remove the need for a bunch of separate flags for each syscall to turn them into cloexec-fds. I mean, those for syscalls came with 4 separate flags to be able to specify that the returned fd should be made cloexec. The other way around, cloexec by default, fcntl() to remove the cloexec bit is way saner imho. Christian
[linux-abi cc'd] On Thu, May 16, 2019 at 06:31:52PM +0200, Christian Brauner wrote: > On Thu, May 16, 2019 at 05:22:59PM +0100, Al Viro wrote: > > On Thu, May 16, 2019 at 12:52:04PM +0100, David Howells wrote: > > > > > > Hi Linus, Al, > > > > > > Here are some patches that make changes to the mount API UAPI and two of > > > them really need applying, before -rc1 - if they're going to be applied at > > > all. > > > > I'm fine with 2--4, but I'm not convinced that cloexec-by-default crusade > > makes any sense. Could somebody give coherent arguments in favour of > > abandoning the existing conventions? > > So as I said in the commit message. From a userspace perspective it's > more of an issue if one accidently leaks an fd to a task during exec. > > Also, most of the time one does not want to inherit an fd during an > exec. It is a hazzle to always have to specify an extra flag. > > As Al pointed out to me open() semantics are not going anywhere. Sure, > no argument there at all. > But the idea of making fds cloexec by default is only targeted at fds > that come from separate syscalls. fsopen(), open_tree_clone(), etc. they > all return fds independent of open() so it's really easy to have them > cloexec by default without regressing anyone and we also remove the need > for a bunch of separate flags for each syscall to turn them into > cloexec-fds. I mean, those for syscalls came with 4 separate flags to be > able to specify that the returned fd should be made cloexec. The other > way around, cloexec by default, fcntl() to remove the cloexec bit is way > saner imho. Re separate flags - it is, in principle, a valid argument. OTOH, I'm not sure if they need to be separate - they all have the same value and I don't see any reason for that to change... Only tangentially related, but I wonder if something like close_range(from, to) would be a more useful approach... That kind of open-coded loops is not rare in userland and kernel-side code can do them much cheaper. Something like /* that exec is sensitive */ unshare(CLONE_FILES); /* we don't want anything past stderr here */ close_range(3, ~0U); execve(....); on the userland side of thing. Comments?
On May 16, 2019 6:50:22 PM GMT+02:00, Al Viro <viro@zeniv.linux.org.uk> wrote: >[linux-abi cc'd] > >On Thu, May 16, 2019 at 06:31:52PM +0200, Christian Brauner wrote: >> On Thu, May 16, 2019 at 05:22:59PM +0100, Al Viro wrote: >> > On Thu, May 16, 2019 at 12:52:04PM +0100, David Howells wrote: >> > > >> > > Hi Linus, Al, >> > > >> > > Here are some patches that make changes to the mount API UAPI and >two of >> > > them really need applying, before -rc1 - if they're going to be >applied at >> > > all. >> > >> > I'm fine with 2--4, but I'm not convinced that cloexec-by-default >crusade >> > makes any sense. Could somebody give coherent arguments in favour >of >> > abandoning the existing conventions? >> >> So as I said in the commit message. From a userspace perspective it's >> more of an issue if one accidently leaks an fd to a task during exec. >> >> Also, most of the time one does not want to inherit an fd during an >> exec. It is a hazzle to always have to specify an extra flag. >> >> As Al pointed out to me open() semantics are not going anywhere. >Sure, >> no argument there at all. >> But the idea of making fds cloexec by default is only targeted at fds >> that come from separate syscalls. fsopen(), open_tree_clone(), etc. >they >> all return fds independent of open() so it's really easy to have them >> cloexec by default without regressing anyone and we also remove the >need >> for a bunch of separate flags for each syscall to turn them into >> cloexec-fds. I mean, those for syscalls came with 4 separate flags to >be >> able to specify that the returned fd should be made cloexec. The >other >> way around, cloexec by default, fcntl() to remove the cloexec bit is >way >> saner imho. > >Re separate flags - it is, in principle, a valid argument. OTOH, I'm >not >sure if they need to be separate - they all have the same value and >I don't see any reason for that to change... > >Only tangentially related, but I wonder if something like >close_range(from, to) >would be a more useful approach... That kind of open-coded loops is >not >rare in userland and kernel-side code can do them much cheaper. >Something >like > /* that exec is sensitive */ > unshare(CLONE_FILES); > /* we don't want anything past stderr here */ > close_range(3, ~0U); > execve(....); >on the userland side of thing. Comments? Very much in favor of that! That'd be a neat new addition.
[looks like linux-abi is a typo, Cc'ed linux-api instead] On Thu, May 16, 2019 at 05:50:22PM +0100, Al Viro wrote: > [linux-abi cc'd] > > On Thu, May 16, 2019 at 06:31:52PM +0200, Christian Brauner wrote: > > On Thu, May 16, 2019 at 05:22:59PM +0100, Al Viro wrote: > > > On Thu, May 16, 2019 at 12:52:04PM +0100, David Howells wrote: > > > > > > > > Hi Linus, Al, > > > > > > > > Here are some patches that make changes to the mount API UAPI and two of > > > > them really need applying, before -rc1 - if they're going to be applied at > > > > all. > > > > > > I'm fine with 2--4, but I'm not convinced that cloexec-by-default crusade > > > makes any sense. Could somebody give coherent arguments in favour of > > > abandoning the existing conventions? > > > > So as I said in the commit message. From a userspace perspective it's > > more of an issue if one accidently leaks an fd to a task during exec. > > > > Also, most of the time one does not want to inherit an fd during an > > exec. It is a hazzle to always have to specify an extra flag. > > > > As Al pointed out to me open() semantics are not going anywhere. Sure, > > no argument there at all. > > But the idea of making fds cloexec by default is only targeted at fds > > that come from separate syscalls. fsopen(), open_tree_clone(), etc. they > > all return fds independent of open() so it's really easy to have them > > cloexec by default without regressing anyone and we also remove the need > > for a bunch of separate flags for each syscall to turn them into > > cloexec-fds. I mean, those for syscalls came with 4 separate flags to be > > able to specify that the returned fd should be made cloexec. The other > > way around, cloexec by default, fcntl() to remove the cloexec bit is way > > saner imho. > > Re separate flags - it is, in principle, a valid argument. OTOH, I'm not > sure if they need to be separate - they all have the same value and > I don't see any reason for that to change... > > Only tangentially related, but I wonder if something like close_range(from, to) > would be a more useful approach... That kind of open-coded loops is not > rare in userland and kernel-side code can do them much cheaper. Something > like > /* that exec is sensitive */ > unshare(CLONE_FILES); > /* we don't want anything past stderr here */ > close_range(3, ~0U); > execve(....); > on the userland side of thing. Comments? glibc people need a syscall to implement closefrom properly, see https://sourceware.org/bugzilla/show_bug.cgi?id=10353#c14
On May 16, 2019 10:23:31 PM GMT+02:00, "Dmitry V. Levin" <ldv@altlinux.org> wrote: >[looks like linux-abi is a typo, Cc'ed linux-api instead] > >On Thu, May 16, 2019 at 05:50:22PM +0100, Al Viro wrote: >> [linux-abi cc'd] >> >> On Thu, May 16, 2019 at 06:31:52PM +0200, Christian Brauner wrote: >> > On Thu, May 16, 2019 at 05:22:59PM +0100, Al Viro wrote: >> > > On Thu, May 16, 2019 at 12:52:04PM +0100, David Howells wrote: >> > > > >> > > > Hi Linus, Al, >> > > > >> > > > Here are some patches that make changes to the mount API UAPI >and two of >> > > > them really need applying, before -rc1 - if they're going to be >applied at >> > > > all. >> > > >> > > I'm fine with 2--4, but I'm not convinced that cloexec-by-default >crusade >> > > makes any sense. Could somebody give coherent arguments in >favour of >> > > abandoning the existing conventions? >> > >> > So as I said in the commit message. From a userspace perspective >it's >> > more of an issue if one accidently leaks an fd to a task during >exec. >> > >> > Also, most of the time one does not want to inherit an fd during an >> > exec. It is a hazzle to always have to specify an extra flag. >> > >> > As Al pointed out to me open() semantics are not going anywhere. >Sure, >> > no argument there at all. >> > But the idea of making fds cloexec by default is only targeted at >fds >> > that come from separate syscalls. fsopen(), open_tree_clone(), etc. >they >> > all return fds independent of open() so it's really easy to have >them >> > cloexec by default without regressing anyone and we also remove the >need >> > for a bunch of separate flags for each syscall to turn them into >> > cloexec-fds. I mean, those for syscalls came with 4 separate flags >to be >> > able to specify that the returned fd should be made cloexec. The >other >> > way around, cloexec by default, fcntl() to remove the cloexec bit >is way >> > saner imho. >> >> Re separate flags - it is, in principle, a valid argument. OTOH, I'm >not >> sure if they need to be separate - they all have the same value and >> I don't see any reason for that to change... >> >> Only tangentially related, but I wonder if something like >close_range(from, to) >> would be a more useful approach... That kind of open-coded loops is >not >> rare in userland and kernel-side code can do them much cheaper. >Something >> like >> /* that exec is sensitive */ >> unshare(CLONE_FILES); >> /* we don't want anything past stderr here */ >> close_range(3, ~0U); >> execve(....); >> on the userland side of thing. Comments? > >glibc people need a syscall to implement closefrom properly, see >https://sourceware.org/bugzilla/show_bug.cgi?id=10353#c14 I have a prototype for close_range(). I'll send it out after rc1. Christian
On May 16, 2019 6:50:22 PM GMT+02:00, Al Viro <viro@zeniv.linux.org.uk> wrote: >[linux-abi cc'd] > >On Thu, May 16, 2019 at 06:31:52PM +0200, Christian Brauner wrote: >> On Thu, May 16, 2019 at 05:22:59PM +0100, Al Viro wrote: >> > On Thu, May 16, 2019 at 12:52:04PM +0100, David Howells wrote: >> > > >> > > Hi Linus, Al, >> > > >> > > Here are some patches that make changes to the mount API UAPI and >two of >> > > them really need applying, before -rc1 - if they're going to be >applied at >> > > all. >> > >> > I'm fine with 2--4, but I'm not convinced that cloexec-by-default >crusade >> > makes any sense. Could somebody give coherent arguments in favour >of >> > abandoning the existing conventions? >> >> So as I said in the commit message. From a userspace perspective it's >> more of an issue if one accidently leaks an fd to a task during exec. >> >> Also, most of the time one does not want to inherit an fd during an >> exec. It is a hazzle to always have to specify an extra flag. >> >> As Al pointed out to me open() semantics are not going anywhere. >Sure, >> no argument there at all. >> But the idea of making fds cloexec by default is only targeted at fds >> that come from separate syscalls. fsopen(), open_tree_clone(), etc. >they >> all return fds independent of open() so it's really easy to have them >> cloexec by default without regressing anyone and we also remove the >need >> for a bunch of separate flags for each syscall to turn them into >> cloexec-fds. I mean, those for syscalls came with 4 separate flags to >be >> able to specify that the returned fd should be made cloexec. The >other >> way around, cloexec by default, fcntl() to remove the cloexec bit is >way >> saner imho. > >Re separate flags - it is, in principle, a valid argument. OTOH, I'm >not >sure if they need to be separate - they all have the same value and >I don't see any reason for that to change... One last thing I'd like to point out is that we already have syscalls and ioctls that return cloexec fds. So the consistency argument is kinda dead. If you still prefer to have cloexec flags for the 4 new syscalls then yes, if they could at least all have the same name (FSMOUNT_CLOEXEC?) that would be good. > >Only tangentially related, but I wonder if something like >close_range(from, to) >would be a more useful approach... That kind of open-coded loops is >not >rare in userland and kernel-side code can do them much cheaper. >Something >like > /* that exec is sensitive */ > unshare(CLONE_FILES); > /* we don't want anything past stderr here */ > close_range(3, ~0U); > execve(....); >on the userland side of thing. Comments? Said it before but, the list was mistyped so again: I think that's a great idea. I have a prototype for close_range(start, end, flags). I'll wait after rc1 and then send it out. Christian
Christian Brauner <christian@brauner.io> wrote: > If you still prefer to have cloexec flags > for the 4 new syscalls then yes, > if they could at least all have the same name > (FSMOUNT_CLOEXEC?) that would be good. They don't all have the same value (see OPEN_TREE_CLOEXEC). Note that I also don't want to blindly #define them to O_CLOEXEC because it's not necessarily the same value on all arches. Currently it can be 02000000, 010000000 or 0x400000 for instance, which means that if it's sharing a mask with other flags, at least three bits have to be reserved for it or we have to have arch-dependent bit juggling. One thing I like about your approach of just making them O_CLOEXEC by default and removing the constants is that it avoids this mess entirely. David
On Fri, May 17, 2019 at 9:13 AM David Howells <dhowells@redhat.com> wrote: > > Christian Brauner <christian@brauner.io> wrote: > > > If you still prefer to have cloexec flags > > for the 4 new syscalls then yes, > > if they could at least all have the same name > > (FSMOUNT_CLOEXEC?) that would be good. > > They don't all have the same value (see OPEN_TREE_CLOEXEC). > > Note that I also don't want to blindly #define them to O_CLOEXEC because it's > not necessarily the same value on all arches. Currently it can be 02000000, > 010000000 or 0x400000 for instance, which means that if it's sharing a mask > with other flags, at least three bits have to be reserved for it or we have to > have arch-dependent bit juggling. > > One thing I like about your approach of just making them O_CLOEXEC by default > and removing the constants is that it avoids this mess entirely. +1. Confusion caused by inconsistency of naming is going to hurt more than inconsistency of semantics wrt. open(2). Thanks, Miklos
On May 17, 2019 9:13:26 AM GMT+02:00, David Howells <dhowells@redhat.com> wrote: >Christian Brauner <christian@brauner.io> wrote: > >> If you still prefer to have cloexec flags >> for the 4 new syscalls then yes, >> if they could at least all have the same name >> (FSMOUNT_CLOEXEC?) that would be good. > >They don't all have the same value (see OPEN_TREE_CLOEXEC). > >Note that I also don't want to blindly #define them to O_CLOEXEC >because it's >not necessarily the same value on all arches. Currently it can be >02000000, >010000000 or 0x400000 for instance, which means that if it's sharing a >mask >with other flags, at least three bits have to be reserved for it or we >have to >have arch-dependent bit juggling. Ugh. Right, I forgot about that entirely. Christian > >One thing I like about your approach of just making them O_CLOEXEC by >default >and removing the constants is that it avoids this mess entirely. > >David