Make file struct available to fchmod FS handlers.

Message ID 20161102225340.11613-1-jabolopes@gmail.com (mailing list archive)
State New, archived

Commit Message

Jose Lopes Nov. 2, 2016, 10:53 p.m. UTC
Syscall 'ftruncate' makes the 'file' struct available to filesystem
handlers. This makes it possible, e.g., for filesystems such as FUSE
to access the file handle associated with the file descriptor
that was passed to 'ftruncate'. In the specific case of FUSE, this
also makes it possible for (userspace) FUSE-based filesystems to
distinguish between calls to 'truncate' and 'ftruncate'.

From an implementation point of view, this is possible because the
'ftruncate' syscall passes the 'file' struct to the underlying
filesystem handlers via the 'ia_file' field and the 'ia_valid' field
mask.

Similarly to 'ftruncate', pass the 'file' struct to 'fchmod', which
allows filesystem handlers to get the file handle associated with the
file descriptor passed to 'fchmod' and allows FUSE-based filesystems
to distinguish between calls to 'chmod' and 'fchmod'.

In a future patch, make a similar change to the 'fchown' and
'futimens' syscalls.

Signed-off-by: Jose Lopes <jabolopes@gmail.com>
---
 fs/open.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
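
The mechanism described in the commit message can be modelled in userspace
roughly as follows. This is a minimal sketch, not kernel code: the `model_`
types, the flag bit values, and `model_handler` are hypothetical stand-ins
for `struct iattr`, the `ATTR_*` flags, and a filesystem's setattr handler.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bits; stand-ins for the kernel's ATTR_* constants. */
#define MODEL_ATTR_MODE (1u << 0)
#define MODEL_ATTR_FILE (1u << 1)

struct model_file { int handle; };      /* stand-in for struct file */

struct model_iattr {                    /* stand-in for struct iattr */
    unsigned int ia_valid;              /* mask of fields that are set */
    unsigned int ia_mode;
    struct model_file *ia_file;         /* meaningful only with MODEL_ATTR_FILE */
};

/* Same shape as the patched chmod_common(): attach the file only if given. */
struct model_iattr model_chmod_common(unsigned int mode, struct model_file *filp)
{
    struct model_iattr attrs = {
        .ia_valid = MODEL_ATTR_MODE, .ia_mode = mode, .ia_file = NULL
    };
    if (filp) {
        attrs.ia_file = filp;
        attrs.ia_valid |= MODEL_ATTR_FILE;
    }
    return attrs;
}

/* Hypothetical handler: file handle for fchmod-style calls, -1 for chmod. */
int model_handler(const struct model_iattr *attrs)
{
    if (attrs->ia_valid & MODEL_ATTR_FILE) {
        assert(attrs->ia_file != NULL);
        return attrs->ia_file->handle;
    }
    return -1;
}
```

A handler written this way can tell the two entry points apart purely from
the mask, which is the distinction the patch aims to expose for 'fchmod'.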

Comments

Al Viro Nov. 3, 2016, 12:59 a.m. UTC | #1
On Wed, Nov 02, 2016 at 11:53:40PM +0100, Jose Lopes wrote:
> Syscall 'ftruncate' makes the 'file' struct available to filesystem
> handlers. This makes it possible, e.g., for filesystems, such as,
> FUSE, to access the file handle associated with the file descriptor
> that was passed to 'ftruncate'. In the specific case of FUSE, this
> also makes it possible for (userspace) FUSE-based filesystems to
> distinguish between calls to 'truncate' and 'ftruncate'.

Why FUSE is such a precious snowflake that it needs to make that distinction,
unlike all other filesystems?

> In a future patch, make a similar change to the 'fchown' and
> 'futimens' syscalls.

I'm thucking frilled...

NAK, unless you can give a better reason than "what if somebody would want
that piece of information?".
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jean-Pierre André Nov. 3, 2016, 8:22 a.m. UTC | #2
Al Viro wrote:
> On Wed, Nov 02, 2016 at 11:53:40PM +0100, Jose Lopes wrote:
>> Syscall 'ftruncate' makes the 'file' struct available to filesystem
>> handlers. This makes it possible, e.g., for filesystems, such as,
>> FUSE, to access the file handle associated with the file descriptor
>> that was passed to 'ftruncate'. In the specific case of FUSE, this
>> also makes it possible for (userspace) FUSE-based filesystems to
>> distinguish between calls to 'truncate' and 'ftruncate'.
>
> Why FUSE is such a precious snowflake that it needs to make that distinction,
> unlike all other filesystems?

For FUSE file systems which delegate the permission checks
to user space (and have to do so because of caching
issues), write permission has to be checked for
truncate() but not for ftruncate(): the file
may have been opened for writing and then had its permissions
set to read-only before the ftruncate() is requested.
The user space file system can only check the current permissions,
not the ones which were set when the file was opened.

Jean-Pierre
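
The difference described above can be observed on an ordinary local
filesystem. The following is a minimal sketch, assuming a writable /tmp;
the function name and temp-file template are illustrative. The path-based
check is skipped for root, which bypasses the permission bits.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 0 if ftruncate() through a writable descriptor still works after
 * the file was made read-only, and truncate() by path is refused with
 * EACCES (not checked when running as root). Returns -1 otherwise.
 */
int truncate_vs_ftruncate_demo(void)
{
    char path[] = "/tmp/ftruncate-demo-XXXXXX";
    int fd = mkstemp(path);                 /* opened read-write */
    int rc = -1;

    if (fd < 0)
        return -1;
    if (write(fd, "data", 4) != 4)
        goto out;
    if (fchmod(fd, 0444) != 0)              /* now read-only on disk */
        goto out;
    if (ftruncate(fd, 0) != 0)              /* access was granted at open() */
        goto out;
    if (geteuid() != 0) {
        errno = 0;
        if (truncate(path, 0) == 0 || errno != EACCES)
            goto out;                       /* path lookup checks current mode */
    }
    rc = 0;
out:
    close(fd);
    unlink(path);
    return rc;
}
```

A userspace filesystem that performs its own permission checks needs to know
which of the two calls it is serving in order to reproduce this behaviour.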
Nikolaus Rath Nov. 3, 2016, 3:22 p.m. UTC | #3
On Nov 03 2016, Al Viro <viro@ZenIV.linux.org.uk> wrote:
> On Wed, Nov 02, 2016 at 11:53:40PM +0100, Jose Lopes wrote:
>> Syscall 'ftruncate' makes the 'file' struct available to filesystem
>> handlers. This makes it possible, e.g., for filesystems, such as,
>> FUSE, to access the file handle associated with the file descriptor
>> that was passed to 'ftruncate'. In the specific case of FUSE, this
>> also makes it possible for (userspace) FUSE-based filesystems to
>> distinguish between calls to 'truncate' and 'ftruncate'.
>
> Why FUSE is such a precious snowflake that it needs to make that distinction,
> unlike all other filesystems?

FUSE filesystems are often used as an extra layer on top of another
filesystem (that stores the actual data). If the user opens a file (in
the fuse filesystem), deletes it, and then truncates it, FUSE currently
cannot do the same operation in the underlying filesystem: it receives
a truncate() call with the inode, but there is no syscall that allows
truncation for an inode. If FUSE had access to the file handle, it could
use that to store a file descriptor for the file on the underlying
filesystem and use ftruncate.

Best,
-Nikolaus
Nikolaus Rath Nov. 7, 2016, 4:51 a.m. UTC | #4
On Nov 03 2016, Nikolaus Rath <Nikolaus-BTH8mxji4b0@public.gmane.org> wrote:
> On Nov 03 2016, Al Viro <viro@ZenIV.linux.org.uk> wrote:
>> On Wed, Nov 02, 2016 at 11:53:40PM +0100, Jose Lopes wrote:
>>> Syscall 'ftruncate' makes the 'file' struct available to filesystem
>>> handlers. This makes it possible, e.g., for filesystems, such as,
>>> FUSE, to access the file handle associated with the file descriptor
>>> that was passed to 'ftruncate'. In the specific case of FUSE, this
>>> also makes it possible for (userspace) FUSE-based filesystems to
>>> distinguish between calls to 'truncate' and 'ftruncate'.
>>
>> Why FUSE is such a precious snowflake that it needs to make that distinction,
>> unlike all other filesystems?
>
> FUSE filesystems are often used as an extra layer on top of another
> filesystem (that stores the actual data). If the user opens a file (in
> the fuse filesystem), deletes it, and then truncates it, FUSE currently
> cannot do the same operation in the underlying filesystem: it receives
> a truncate() call with the inode, but there is no syscall that allows
> truncation for an inode. If FUSE had access to the file handle, it can
> use that to store a file descriptor for the file on the underlying
> filesystem and use ftruncate.

In case it wasn't clear: for the case of ftruncate, fuse *does* have
access to the file handle so the problem does not arise. For the case of
fchmod, this is currently a problem (and the patch we're discussing
would fix it).

Best,
-Nikolaus
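
The open-then-delete scenario can be reproduced without FUSE. This is a
minimal sketch, assuming a writable /tmp; the function name is illustrative.
It shows why a layered filesystem wants the descriptor: once the name is
gone, only descriptor-based calls can still reach the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 0 if ftruncate() and fchmod() succeed on an unlinked but still
 * open file while chmod() by the old path fails with ENOENT; -1 otherwise.
 */
int unlinked_fd_demo(void)
{
    char path[] = "/tmp/fchmod-demo-XXXXXX";
    struct stat st;
    int rc = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (unlink(path) != 0)                  /* name gone, inode kept alive by fd */
        goto out;
    if (ftruncate(fd, 0) != 0)
        goto out;
    if (fchmod(fd, 0400) != 0)
        goto out;
    if (fstat(fd, &st) != 0 || (st.st_mode & 0777) != 0400)
        goto out;
    errno = 0;
    if (chmod(path, 0400) == 0 || errno != ENOENT)
        goto out;                           /* path-based call cannot find it */
    rc = 0;
out:
    close(fd);
    return rc;
}
```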
Nikolaus Rath Nov. 7, 2016, 5:25 a.m. UTC | #5
On Nov 03 2016, Jose Lopes <jabolopes@gmail.com> wrote:
> Syscall 'ftruncate' makes the 'file' struct available to filesystem
> handlers. This makes it possible, e.g., for filesystems, such as,
> FUSE, to access the file handle associated with the file descriptor
> that was passed to 'ftruncate'. In the specific case of FUSE, this
> also makes it possible for (userspace) FUSE-based filesystems to
> distinguish between calls to 'truncate' and 'ftruncate'.
>
> From an implementation point of view, this is possible because the
> 'ftruncate' syscall passes the 'file' struct to the underlying
> filesystem handlers via the 'ia_file' field and the 'ia_valid' field
> mask.
>
> Similarly to 'ftruncate', pass the 'file' struct to 'fchmod', which
> allows filesystem handlers to get the file handle associated with the
> file descriptor passed to 'fchmod' and allows FUSE-based filesystems
> to distinguish between calls to 'chmod' and 'fchmod'.
>
> In a future patch, make a similar change to the 'fchown' and
> 'futimens' syscalls.
>
> Signed-off-by: Jose Lopes <jabolopes@gmail.com>

Tested-by: Nikolaus Rath <Nikolaus@rath.org>



Best,
-Nikolaus
Jose Lopes Nov. 9, 2016, 5:54 p.m. UTC | #6
Hi Al,

Can you please take another look?

Thanks,
Jose

On Mon, Nov 7, 2016 at 5:51 AM, Nikolaus Rath <Nikolaus@rath.org> wrote:
> On Nov 03 2016, Nikolaus Rath <Nikolaus-BTH8mxji4b0@public.gmane.org> wrote:
>> On Nov 03 2016, Al Viro <viro@ZenIV.linux.org.uk> wrote:
>>> On Wed, Nov 02, 2016 at 11:53:40PM +0100, Jose Lopes wrote:
>>>> Syscall 'ftruncate' makes the 'file' struct available to filesystem
>>>> handlers. This makes it possible, e.g., for filesystems, such as,
>>>> FUSE, to access the file handle associated with the file descriptor
>>>> that was passed to 'ftruncate'. In the specific case of FUSE, this
>>>> also makes it possible for (userspace) FUSE-based filesystems to
>>>> distinguish between calls to 'truncate' and 'ftruncate'.
>>>
>>> Why FUSE is such a precious snowflake that it needs to make that distinction,
>>> unlike all other filesystems?
>>
>> FUSE filesystems are often used as an extra layer on top of another
>> filesystem (that stores the actual data). If the user opens a file (in
>> the fuse filesystem), deletes it, and then truncates it, FUSE currently
>> cannot do the same operation in the underlying filesystem: it receives
>> a truncate() call with the inode, but there is no syscall that allows
>> truncation for an inode. If FUSE had access to the file handle, it can
>> use that to store a file descriptor for the file on the underlying
>> filesystem and use ftruncate.
>
> In case it wasn't clear: for the case of ftruncate, fuse *does* have
> access to the file handle so the problem does not arise. For the case of
> fchmod, this is currently a problem (and the patch we're discussing
> would fix it).
>
> Best,
> -Nikolaus
>
> --
> GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
> Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
>
>              »Time flies like an arrow, fruit flies like a Banana.«
Eric W. Biederman Nov. 17, 2016, 5:44 p.m. UTC | #7
Jose Lopes <jabolopes@gmail.com> writes:

> Hi,
>
> On Thu, Nov 3, 2016 at 9:22 AM Jean-Pierre André <jean-pierre.andre@wanadoo.fr> wrote:
>
>  Al Viro wrote:
>  > On Wed, Nov 02, 2016 at 11:53:40PM +0100, Jose Lopes wrote:
>  >> Syscall 'ftruncate' makes the 'file' struct available to filesystem
>  >> handlers. This makes it possible, e.g., for filesystems, such as,
>  >> FUSE, to access the file handle associated with the file descriptor
>  >> that was passed to 'ftruncate'. In the specific case of FUSE, this
>  >> also makes it possible for (userspace) FUSE-based filesystems to
>  >> distinguish between calls to 'truncate' and 'ftruncate'.
>  >
>  > Why FUSE is such a precious snowflake that it needs to make that distinction,
>  > unlike all other filesystems?
>
>  For fuse file system which delegate the permission checks
>  to user space (and have to do so because of cacheing
>  issues), the write permission has to be checked for
>  truncate(), and not checked for ftruncate() : the file
>  may have been opened for writing and then its permissions
>  set to read-only before the ftruncate() is requested.
>  The user space file system can check current permissions,
>  not the ones which were set when the file was opened.
>
> +1 what Jean-Pierre said.
>
> Also, I work on a FUSE-based network filesystem and the fact that we cannot
> distinguish between calls to fchmod and chmod produces incorrect results.
> For example, in the cases where a file was unlinked or moved, calling fchmod
> should apply the change directly in the open file. However, since the fchmod
> call arrives to FUSE as chmod (because of the missing file handle), FUSE will
> try to resolve the path to get to the open file, which fails because the file was
> moved or unlinked, or it will apply the change to the wrong file if in the meantime
> another file was open under the same path of the previous file.

I read through this and I agree with Al.  Semantically ftruncate needs
the file handle to operate correctly.  Semantically fchmod does not need
the file handle.  The file handle to fchmod is just a way to pass it the
specific inode.

Given that a file handle exists presumably userspace has state cached
for this file already.  So a lookup by inode in the userspace
filesystem's data structures should get the job done.

Beyond that the kernel does have interfaces for dealing with things like
this if you don't want to have lots and lots of open files in userspace.
These are the system calls name_to_handle_at and open_by_handle_at.

Eric
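
As a rough illustration of the two system calls mentioned above, the sketch
below obtains a handle for a file and reopens the file from that handle.
It assumes Linux with glibc and a writable /tmp; the function name is
illustrative. open_by_handle_at() needs CAP_DAC_READ_SEARCH and not every
filesystem supports handles, so the sketch reports that case separately
instead of treating it as an error.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 0 if the handle round-trip reaches the same inode, 1 if the
 * handle syscalls are unavailable here (unsupported filesystem, missing
 * privilege, sandboxing), and -1 on setup failure or inode mismatch.
 */
int handle_roundtrip_demo(void)
{
    char path[] = "/tmp/handle-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    int rc = 1, mount_id = 0, dirfd = -1, fd2 = -1;
    struct stat a, b;
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);

    if (!fh) {
        rc = -1;
        goto out;
    }
    fh->handle_bytes = MAX_HANDLE_SZ;       /* capacity of f_handle[] */

    if (name_to_handle_at(AT_FDCWD, path, fh, &mount_id, 0) != 0)
        goto out;                           /* rc stays 1: unavailable */
    dirfd = open("/tmp", O_RDONLY | O_DIRECTORY);   /* any fd on that mount */
    if (dirfd < 0) {
        rc = -1;
        goto out;
    }
    fd2 = open_by_handle_at(dirfd, fh, O_RDONLY);
    if (fd2 < 0)
        goto out;                           /* rc stays 1: unavailable */
    if (fstat(fd, &a) != 0 || fstat(fd2, &b) != 0) {
        rc = -1;
        goto out;
    }
    rc = (a.st_dev == b.st_dev && a.st_ino == b.st_ino) ? 0 : -1;
out:
    if (fd2 >= 0)
        close(fd2);
    if (dirfd >= 0)
        close(dirfd);
    free(fh);
    close(fd);
    unlink(path);
    return rc;
}
```

Storing the small handle instead of an open descriptor is what lets a
userspace filesystem avoid keeping many files open at once.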
Nikolaus Rath Nov. 17, 2016, 6:39 p.m. UTC | #8
On Nov 17 2016, ebiederm@xmission.com (Eric W. Biederman) wrote:
> Jose Lopes <jabolopes@gmail.com> writes:
>
>> Hi,
>>
>> On Thu, Nov 3, 2016 at 9:22 AM Jean-Pierre André <jean-pierre.andre@wanadoo.fr> wrote:
>>
>>  Al Viro wrote:
>>  > On Wed, Nov 02, 2016 at 11:53:40PM +0100, Jose Lopes wrote:
>>  >> Syscall 'ftruncate' makes the 'file' struct available to filesystem
>>  >> handlers. This makes it possible, e.g., for filesystems, such as,
>>  >> FUSE, to access the file handle associated with the file descriptor
>>  >> that was passed to 'ftruncate'. In the specific case of FUSE, this
>>  >> also makes it possible for (userspace) FUSE-based filesystems to
>>  >> distinguish between calls to 'truncate' and 'ftruncate'.
>>  >
>>  > Why FUSE is such a precious snowflake that it needs to make that distinction,
>>  > unlike all other filesystems?
>>
>>  For fuse file system which delegate the permission checks
>>  to user space (and have to do so because of cacheing
>>  issues), the write permission has to be checked for
>>  truncate(), and not checked for ftruncate() : the file
>>  may have been opened for writing and then its permissions
>>  set to read-only before the ftruncate() is requested.
>>  The user space file system can check current permissions,
>>  not the ones which were set when the file was opened.
>>
>> +1 what Jean-Pierre said.
>>
>> Also, I work on a FUSE-based network filesystem and the fact that we cannot
>> distinguish between calls to fchmod and chmod produces incorrect results.
>> For example, in the cases where a file was unlinked or moved, calling fchmod
>> should apply the change directly in the open file. However, since the fchmod
>> call arrives to FUSE as chmod (because of the missing file handle), FUSE will
>> try to resolve the path to get to the open file, which fails because the file was
>> moved or unlinked, or it will apply the change to the wrong file if in the meantime
>> another file was open under the same path of the previous file.
>
> I read through this and I agree with Al.  Semantically ftruncate needs
> the file handle to operate correctly.  Semantically fchmod does not need
> the file handle.  The file handle to fchmod is just a way to pass it the
> specific inode.

Could you explain this in more detail? What does ftruncate need the file
handle for other than to obtain the inode?

> Given that a file handle exists presumably userspace has state cached
> for this file already.  So a lookup by inode in the userspace
> filesystems data structures should get the job done.

True. But passing the information from the kernel is just copying some
bytes around, obtaining it in userspace would mean a hash table lookup
for every request (including those that don't have a file handle).

I presume this is the reason why ftruncate gets the information from the
kernel (it could also just do lookup by inode). Why doesn't the same
argument apply to e.g. fchmod?

Best,
-Nikolaus
Eric W. Biederman Nov. 17, 2016, 7:20 p.m. UTC | #9
Nikolaus Rath <Nikolaus@rath.org> writes:

> On Nov 17 2016, ebiederm@xmission.com (Eric W. Biederman) wrote:
>> Jose Lopes <jabolopes@gmail.com> writes:
>>
>>> Hi,
>>>
>>> On Thu, Nov 3, 2016 at 9:22 AM Jean-Pierre André <jean-pierre.andre@wanadoo.fr> wrote:
>>>
>>>  Al Viro wrote:
>>>  > On Wed, Nov 02, 2016 at 11:53:40PM +0100, Jose Lopes wrote:
>>>  >> Syscall 'ftruncate' makes the 'file' struct available to filesystem
>>>  >> handlers. This makes it possible, e.g., for filesystems, such as,
>>>  >> FUSE, to access the file handle associated with the file descriptor
>>>  >> that was passed to 'ftruncate'. In the specific case of FUSE, this
>>>  >> also makes it possible for (userspace) FUSE-based filesystems to
>>>  >> distinguish between calls to 'truncate' and 'ftruncate'.
>>>  >
>>>  > Why FUSE is such a precious snowflake that it needs to make that distinction,
>>>  > unlike all other filesystems?
>>>
>>>  For fuse file system which delegate the permission checks
>>>  to user space (and have to do so because of cacheing
>>>  issues), the write permission has to be checked for
>>>  truncate(), and not checked for ftruncate() : the file
>>>  may have been opened for writing and then its permissions
>>>  set to read-only before the ftruncate() is requested.
>>>  The user space file system can check current permissions,
>>>  not the ones which were set when the file was opened.
>>>
>>> +1 what Jean-Pierre said.
>>>
>>> Also, I work on a FUSE-based network filesystem and the fact that we cannot
>>> distinguish between calls to fchmod and chmod produces incorrect results.
>>> For example, in the cases where a file was unlinked or moved, calling fchmod
>>> should apply the change directly in the open file. However, since the fchmod
>>> call arrives to FUSE as chmod (because of the missing file handle), FUSE will
>>> try to resolve the path to get to the open file, which fails because the file was
>>> moved or unlinked, or it will apply the change to the wrong file if in the meantime
>>> another file was open under the same path of the previous file.
>>
>> I read through this and I agree with Al.  Semantically ftruncate needs
>> the file handle to operate correctly.  Semantically fchmod does not need
>> the file handle.  The file handle to fchmod is just a way to pass it the
>> specific inode.
>
> Could you explain this in more detail? What does ftruncate need the file
> handle for other than to obtain the inode?

ftruncate requires the file to be opened for writing.

>> Given that a file handle exists presumably userspace has state cached
>> for this file already.  So a lookup by inode in the userspace
>> filesystems data structures should get the job done.
>
> True. But passing the information from the kernel is just copying some
> bytes around, obtaining it in userspace would mean a hash table lookup
> for every request (including those that don't have a file handle).
>
> I presume this is the reason why ftruncate gets the information from the
> kernel (it could also just do lookup by inode). Why doesn't the same
> argument apply to eg fchmod?

fchmod does not require the file to be opened for writing.


There might be an argument for better tokens between fuse and the kernel
for inodes, but that is another story.

Eric
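
The asymmetry stated above can be checked with a descriptor opened
read-only. This is a minimal sketch, assuming a writable /tmp and a caller
that owns the file; the function name is illustrative. POSIX allows either
EBADF or EINVAL for the refused ftruncate(), so both are accepted.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 0 if, on a descriptor opened read-only, ftruncate() is refused
 * while fchmod() succeeds; -1 otherwise.
 */
int rdonly_fd_demo(void)
{
    char path[] = "/tmp/rdonly-demo-XXXXXX";
    int wfd = mkstemp(path);
    if (wfd < 0)
        return -1;
    close(wfd);

    int rc = -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        goto out;
    errno = 0;
    if (ftruncate(fd, 0) == 0)              /* must fail: not open for writing */
        goto out;
    if (errno != EBADF && errno != EINVAL)
        goto out;
    if (fchmod(fd, 0600) != 0)              /* depends on ownership, not open mode */
        goto out;
    rc = 0;
out:
    if (fd >= 0)
        close(fd);
    unlink(path);
    return rc;
}
```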
Stef Bon Nov. 17, 2016, 11:03 p.m. UTC | #10
2016-11-17 20:20 GMT+01:00 Eric W. Biederman <ebiederm@xmission.com>:
> Nikolaus Rath <Nikolaus@rath.org> writes:
>
> ftruncate requires the file to be opened for writing.
>
> fchmod does not require the file to be opened for writing.

Makes sense of course. ftruncate is acting on the file, fchmod does
work on the attributes (inode information).

Stef
Patch

diff --git a/fs/open.c b/fs/open.c
index d3ed817..00214c5 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -516,7 +516,8 @@  SYSCALL_DEFINE1(chroot, const char __user *, filename)
 	return error;
 }
 
-static int chmod_common(const struct path *path, umode_t mode)
+static int chmod_common(const struct path *path, umode_t mode,
+	struct file *filp)
 {
 	struct inode *inode = path->dentry->d_inode;
 	struct inode *delegated_inode = NULL;
@@ -533,6 +534,10 @@  static int chmod_common(const struct path *path, umode_t mode)
 		goto out_unlock;
 	newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
 	newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
+	if (filp) {
+		newattrs.ia_file = filp;
+		newattrs.ia_valid |= ATTR_FILE;
+	}
 	error = notify_change(path->dentry, &newattrs, &delegated_inode);
 out_unlock:
 	inode_unlock(inode);
@@ -552,7 +557,7 @@  SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
 
 	if (f.file) {
 		audit_file(f.file);
-		err = chmod_common(&f.file->f_path, mode);
+		err = chmod_common(&f.file->f_path, mode, f.file);
 		fdput(f);
 	}
 	return err;
@@ -566,7 +571,7 @@  SYSCALL_DEFINE3(fchmodat, int, dfd, const char __user *, filename, umode_t, mode
 retry:
 	error = user_path_at(dfd, filename, lookup_flags, &path);
 	if (!error) {
-		error = chmod_common(&path, mode);
+		error = chmod_common(&path, mode, NULL);
 		path_put(&path);
 		if (retry_estale(error, lookup_flags)) {
 			lookup_flags |= LOOKUP_REVAL;