diff mbox

[RFC,8/9] vfs: Implement generic revoked file operations

Message ID m1prfj5qxp.fsf@fess.ebiederm.org (mailing list archive)
State RFC, archived
Headers show

Commit Message

Eric W. Biederman April 11, 2009, 12:13 p.m. UTC
revoked_file_ops is a set of file operations designed to be used
when a files backing store has been removed.

revoked_file_ops return 0 from reads (aka EOF). Tell poll the file is
always ready for I/O and return -EIO from all other operations.

This is designed to allow userspace to gracefully file descriptors
that enter this unusable state.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/Makefile        |    2 +-
 fs/revoked_file.c  |  181 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |    2 +
 3 files changed, 184 insertions(+), 1 deletions(-)
 create mode 100644 fs/revoked_file.c

Comments

Jamie Lokier April 12, 2009, 6:56 p.m. UTC | #1
> revoked_file_ops return 0 from reads (aka EOF). Tell poll the file is
> always ready for I/O and return -EIO from all other operations.

I think read should return -EIO too.  If a program is reading from a
/proc file (say), and the thing it's reading suddenly disappears, EOF
gives the false impression that it's read to the end of formatted data
from that file and it can process the data as if it's complete, which
is wrong.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman April 12, 2009, 8:04 p.m. UTC | #2
Jamie Lokier <jamie@shareable.org> writes:

>> revoked_file_ops return 0 from reads (aka EOF). Tell poll the file is
>> always ready for I/O and return -EIO from all other operations.
>
> I think read should return -EIO too.  If a program is reading from a
> /proc file (say), and the thing it's reading suddenly disappears, EOF
> gives the false impression that it's read to the end of formatted data
> from that file and it can process the data as if it's complete, which
> is wrong.

Good point EIO is the current read return value for a removed proc file.

For closed pipes, and hung up ttys the read return value is 0, and from
my reading that is what bsd returns after a sys_revoke.

The reason I have f_op settable is because I never expected complete
agreement on the return codes, and because it makes auditing and spotting
this kind of thing easier.

I guess I should make two variations on revoked_file_ops then.  Say
eof_file_ops, eio_file_ops.  Identical except for their treatment of
reads.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamie Lokier April 12, 2009, 8:31 p.m. UTC | #3
Eric W. Biederman wrote:
> >> revoked_file_ops return 0 from reads (aka EOF). Tell poll the file is
> >> always ready for I/O and return -EIO from all other operations.
> >
> > I think read should return -EIO too.  If a program is reading from a
> > /proc file (say), and the thing it's reading suddenly disappears, EOF
> > gives the false impression that it's read to the end of formatted data
> > from that file and it can process the data as if it's complete, which
> > is wrong.
> 
> Good point EIO is the current read return value for a removed proc file.
> 
> For closed pipes, and hung up ttys the read return value is 0, and from
> my reading that is what bsd returns after a sys_revoke.

A few suggestions below.  Feel free to ignore them on account of the
basic revoking functionality being more important :-)

I'm not sure a revoked pipe should look like a normally closed one.
ECONNRESET?

For hung up ttys, I agree.  But where's the SIGHUP :-) You probably do
want the process using it to die if it's not handling SIGHUP, because
terminal-using processes don't always terminate themselves on EOF.

For things writing to a pipe or file, SIGPIPE may be appropriate in
addition to EIO, to avoid runaway processes.  Looks odd I know.  For
writing to a terminal, SIGHUP again.

> The reason I have f_op settable is because I never expected complete
> agreement on the return codes, and because it makes auditing and spotting
> this kind of thing easier.
>
> I guess I should make two variations on revoked_file_ops then.  Say
> eof_file_ops, eio_file_ops.  Identical except for their treatment of
> reads.

Fair enough.  It's good to have good defaults.  I'm not convinced
eof_file_ops is ever a good default.  sighup_file_ops and
sigpipe_file_ops maybe :-)

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman April 12, 2009, 8:54 p.m. UTC | #4
ebiederm@xmission.com (Eric W. Biederman) writes:

> Jamie Lokier <jamie@shareable.org> writes:
>
>>> revoked_file_ops return 0 from reads (aka EOF). Tell poll the file is
>>> always ready for I/O and return -EIO from all other operations.
>>
>> I think read should return -EIO too.  If a program is reading from a
>> /proc file (say), and the thing it's reading suddenly disappears, EOF
>> gives the false impression that it's read to the end of formatted data
>> from that file and it can process the data as if it's complete, which
>> is wrong.

I just thought about that some more and I am not convinced.

In general the current return values from proc after an I/O operation
are suspect.  seek returns -EINVAL instead of -EIO. poll returns
DEFAULT_POLLMASK (which doesn't set POLLERR).  So I am not convinced
that the existing proc return values on error are correct, and they
are recent additions so the historical precedent is not especially
large.

EOF does give the impression that you have read all of the data from
the /proc file, and that is in fact the case.  There is no more
data coming from that proc file.

That the data is stale is well know.

That the data is not atomic, anything that spans more than a single
read is not atomic.

So I don't see what returning EIO adds to the equation.  Perhaps
that your fragile user space string parser may break?

EOF gives a clear indication the application should stop reading
the data, because there is no more.

EIO only says that the was a problem.

I don't know of anything that depends on the rmmod behavior either
way.  But if we can get away with it I would like to use something
that is generally useful instead of something that only makes
sense in the context of proc.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jamie Lokier April 12, 2009, 9:02 p.m. UTC | #5
Eric W. Biederman wrote:
> I just thought about that some more and I am not convinced.
> 
> In general the current return values from proc after an I/O operation
> are suspect.  seek returns -EINVAL instead of -EIO. poll returns
> DEFAULT_POLLMASK (which doesn't set POLLERR).  So I am not convinced
> that the existing proc return values on error are correct, and they
> are recent additions so the historical precedent is not especially
> large.
> 
> EOF does give the impression that you have read all of the data from
> the /proc file, and that is in fact the case.  There is no more
> data coming from that proc file.
> 
> That the data is stale is well know.
> 
> That the data is not atomic, anything that spans more than a single
> read is not atomic.
> 
> So I don't see what returning EIO adds to the equation.  Perhaps
> that your fragile user space string parser may break?
> 
> EOF gives a clear indication the application should stop reading
> the data, because there is no more.
> 
> EIO only says that the was a problem.
> 
> I don't know of anything that depends on the rmmod behavior either
> way.  But if we can get away with it I would like to use something
> that is generally useful instead of something that only makes
> sense in the context of proc.

I'm not thinking of proc, really.  More thinking of applications: EOF
effectively means "whole file read without error - now do the next thing".

If a filesystem file is revoked (umount -f), you definitely want to
stop that Makefile which is copying a file from the unmounted
filesystem to a target file.  Otherwise you get inconsistent states
which can only occur as a result of this umount -f, something
Makefiles should never have to care about.

rmmod behaviour is not something any app should see normally.
Unexpected behaviour when files are oddly truncated (despite never
being written that way) is not "fragile user space".  So whatever it
returns, it should be some error code, imho.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman April 12, 2009, 9:53 p.m. UTC | #6
Jamie Lokier <jamie@shareable.org> writes:

> Eric W. Biederman wrote:
>> >> revoked_file_ops return 0 from reads (aka EOF). Tell poll the file is
>> >> always ready for I/O and return -EIO from all other operations.
>> >
>> > I think read should return -EIO too.  If a program is reading from a
>> > /proc file (say), and the thing it's reading suddenly disappears, EOF
>> > gives the false impression that it's read to the end of formatted data
>> > from that file and it can process the data as if it's complete, which
>> > is wrong.
>> 
>> Good point EIO is the current read return value for a removed proc file.
>> 
>> For closed pipes, and hung up ttys the read return value is 0, and from
>> my reading that is what bsd returns after a sys_revoke.
>
> A few suggestions below.  Feel free to ignore them on account of the
> basic revoking functionality being more important :-)

I think I will.  This seems to be the part of the code that is easily
approachable and it is going to be easy to have different opinions on,
and there is no one right answer.

For now I'm just going to pick my best understanding of what BSD did.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman April 12, 2009, 11:06 p.m. UTC | #7
Jamie Lokier <jamie@shareable.org> writes:

> Eric W. Biederman wrote:
>> I just thought about that some more and I am not convinced.
>> 
>> In general the current return values from proc after an I/O operation
>> are suspect.  seek returns -EINVAL instead of -EIO. poll returns
>> DEFAULT_POLLMASK (which doesn't set POLLERR).  So I am not convinced
>> that the existing proc return values on error are correct, and they
>> are recent additions so the historical precedent is not especially
>> large.
>> 
>> EOF does give the impression that you have read all of the data from
>> the /proc file, and that is in fact the case.  There is no more
>> data coming from that proc file.
>> 
>> That the data is stale is well know.
>> 
>> That the data is not atomic, anything that spans more than a single
>> read is not atomic.
>> 
>> So I don't see what returning EIO adds to the equation.  Perhaps
>> that your fragile user space string parser may break?
>> 
>> EOF gives a clear indication the application should stop reading
>> the data, because there is no more.
>> 
>> EIO only says that the was a problem.
>> 
>> I don't know of anything that depends on the rmmod behavior either
>> way.  But if we can get away with it I would like to use something
>> that is generally useful instead of something that only makes
>> sense in the context of proc.
>
> I'm not thinking of proc, really.  More thinking of applications: EOF
> effectively means "whole file read without error - now do the next thing".
>
> If a filesystem file is revoked (umount -f), you definitely want to
> stop that Makefile which is copying a file from the unmounted
> filesystem to a target file.  Otherwise you get inconsistent states
> which can only occur as a result of this umount -f, something
> Makefiles should never have to care about.
>
> rmmod behaviour is not something any app should see normally.
> Unexpected behaviour when files are oddly truncated (despite never
> being written that way) is not "fragile user space".  So whatever it
> returns, it should be some error code, imho.

Well I just took a look at NetBSD 4.0.1 and it appears they agree with
you.

Plus I'm starting to feel a lot better about the linux manual pages,
as the revoke(2) man pages from the BSDs describe different error
codes than the implementation, and they fail to mention revoke appears
to work on ordinary files as well.

If the file is not a tty EIO is returned from read.

opens return ENXIO
writes return EIO
ioctl returns EBADF
close returns 0

Operations that just lookup the vnode simply return EBADF.

I don't know if that is perfectly correct for the linux case.  EBADF 
usually means the file descriptor specified isn't open.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/Makefile b/fs/Makefile
index af6d047..7787ddd 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -11,7 +11,7 @@  obj-y :=	open.o read_write.o file_table.o super.o \
 		attr.o bad_inode.o file.o filesystems.o namespace.o \
 		seq_file.o xattr.o libfs.o fs-writeback.o \
 		pnode.o drop_caches.o splice.o sync.o utimes.o \
-		stack.o fs_struct.o
+		stack.o fs_struct.o revoked_file.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=	buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o
diff --git a/fs/revoked_file.c b/fs/revoked_file.c
new file mode 100644
index 0000000..9936693
--- /dev/null
+++ b/fs/revoked_file.c
@@ -0,0 +1,181 @@ 
+/*
+ *  linux/fs/revoked_file.c
+ *
+ *  Copyright (C) 1997, Stephen Tweedie
+ *
+ *  Provide stub functions for unreadable inodes
+ *
+ *  Fabian Frederick : August 2003 - All file operations assigned to EIO
+ *
+ *  Eric Biederman : 8 April 2008 - Derivied from bad_inode.c
+ */
+
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/stat.h>
+#include <linux/time.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+
+static loff_t revoked_file_llseek(struct file *file, loff_t offset, int origin)
+{
+	return -EIO;
+}
+
+static ssize_t revoked_file_read(struct file *filp, char __user *buf,
+			size_t size, loff_t *ppos)
+{
+        return 0;
+}
+
+static ssize_t revoked_file_write(struct file *filp, const char __user *buf,
+			size_t siz, loff_t *ppos)
+{
+        return -EIO;
+}
+
+static ssize_t revoked_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
+			unsigned long nr_segs, loff_t pos)
+{
+	return 0;
+}
+
+static ssize_t revoked_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
+			unsigned long nr_segs, loff_t pos)
+{
+	return -EIO;
+}
+
+static int revoked_file_readdir(struct file *filp, void *dirent, filldir_t filldir)
+{
+	return -EIO;
+}
+
+static unsigned int revoked_file_poll(struct file *filp, poll_table *wait)
+{
+	return POLLIN | POLLOUT | POLLERR | POLLRDNORM | POLLWRNORM;
+}
+
+static int revoked_file_ioctl (struct inode *inode, struct file *filp,
+			unsigned int cmd, unsigned long arg)
+{
+	return -EIO;
+}
+
+static long revoked_file_unlocked_ioctl(struct file *file, unsigned cmd,
+			unsigned long arg)
+{
+	return -EIO;
+}
+
+static long revoked_file_compat_ioctl(struct file *file, unsigned int cmd,
+			unsigned long arg)
+{
+	return -EIO;
+}
+
+static int revoked_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	return -EIO;
+}
+
+static int revoked_file_open(struct inode *inode, struct file *filp)
+{
+	return -EIO;
+}
+
+static int revoked_file_flush(struct file *file, fl_owner_t id)
+{
+	return 0;
+}
+
+static int revoked_file_release(struct inode *inode, struct file *filp)
+{
+	return 0;
+}
+
+static int revoked_file_fsync(struct file *file, struct dentry *dentry,
+			int datasync)
+{
+	return -EIO;
+}
+
+static int revoked_file_aio_fsync(struct kiocb *iocb, int datasync)
+{
+	return -EIO;
+}
+
+static int revoked_file_fasync(int fd, struct file *filp, int on)
+{
+	return -EIO;
+}
+
+static int revoked_file_lock(struct file *file, int cmd, struct file_lock *fl)
+{
+	return -EIO;
+}
+
+static ssize_t revoked_file_sendpage(struct file *file, struct page *page,
+			int off, size_t len, loff_t *pos, int more)
+{
+	return -EIO;
+}
+
+static unsigned long revoked_file_get_unmapped_area(struct file *file,
+				unsigned long addr, unsigned long len,
+				unsigned long pgoff, unsigned long flags)
+{
+	return -EIO;
+}
+
+static int revoked_file_check_flags(int flags)
+{
+	return -EIO;
+}
+
+static int revoked_file_flock(struct file *filp, int cmd, struct file_lock *fl)
+{
+	return -EIO;
+}
+
+static ssize_t revoked_file_splice_write(struct pipe_inode_info *pipe,
+			struct file *out, loff_t *ppos, size_t len,
+			unsigned int flags)
+{
+	return -EIO;
+}
+
+static ssize_t revoked_file_splice_read(struct file *in, loff_t *ppos,
+			struct pipe_inode_info *pipe, size_t len,
+			unsigned int flags)
+{
+	return -EIO;
+}
+
+const struct file_operations revoked_file_ops =
+{
+	.llseek		= revoked_file_llseek,
+	.read		= revoked_file_read,
+	.write		= revoked_file_write,
+	.aio_read	= revoked_file_aio_read,
+	.aio_write	= revoked_file_aio_write,
+	.readdir	= revoked_file_readdir,
+	.poll		= revoked_file_poll,
+	.ioctl		= revoked_file_ioctl,
+	.unlocked_ioctl	= revoked_file_unlocked_ioctl,
+	.compat_ioctl	= revoked_file_compat_ioctl,
+	.mmap		= revoked_file_mmap,
+	.open		= revoked_file_open,
+	.flush		= revoked_file_flush,
+	.release	= revoked_file_release,
+	.fsync		= revoked_file_fsync,
+	.aio_fsync	= revoked_file_aio_fsync,
+	.fasync		= revoked_file_fasync,
+	.lock		= revoked_file_lock,
+	.sendpage	= revoked_file_sendpage,
+	.get_unmapped_area = revoked_file_get_unmapped_area,
+	.check_flags	= revoked_file_check_flags,
+	.flock		= revoked_file_flock,
+	.splice_write	= revoked_file_splice_write,
+	.splice_read	= revoked_file_splice_read,
+};
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a82a2ea..2fb0871 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -896,6 +896,8 @@  extern int fops_substitute(struct file *file, const struct file_operations *f_op
 extern void inode_fops_substitute(struct inode *inode,
 	const struct file_operations *f_op, struct vm_operations_struct *vm_ops);
 
+extern const struct file_operations revoked_file_ops;
+
 extern struct mutex files_lock;
 #define file_list_lock() mutex_lock(&files_lock);
 #define file_list_unlock() mutex_unlock(&files_lock);