diff mbox

[1/4] fs: split update_time() into update_time() and write_time()

Message ID 1416599964-21892-2-git-send-email-tytso@mit.edu (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Theodore Ts'o Nov. 21, 2014, 7:59 p.m. UTC
In preparation for adding support for the lazytime mount option, we
need to be able to separate out the update_time() and write_time()
inode operations.  Currently, only btrfs and xfs uses update_time().

We needed to preserve update_time() because btrfs wants to have a
special btrfs_root_readonly() check; otherwise we could drop the
update_time() inode operation entirely.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: xfs@oss.sgi.com
Cc: linux-btrfs@vger.kernel.org
---
 fs/btrfs/inode.c   | 10 ++++++++++
 fs/inode.c         | 29 ++++++++++++++++++-----------
 fs/xfs/xfs_iops.c  | 39 ++++++++++++++++-----------------------
 include/linux/fs.h |  1 +
 4 files changed, 45 insertions(+), 34 deletions(-)

Comments

Chris Mason Nov. 21, 2014, 8:08 p.m. UTC | #1
On Fri, Nov 21, 2014 at 2:59 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> In preparation for adding support for the lazytime mount option, we
> need to be able to separate out the update_time() and write_time()
> inode operations.  Currently, only btrfs and xfs uses update_time().
> 
> We needed to preserve update_time() because btrfs wants to have a
> special btrfs_root_readonly() check; otherwise we could drop the
> update_time() inode operation entirely.

No objections here, I'll give the queue a whirl.  You can add my 
acked-by-or-Ill-fix-whatever-breaks

I look forward to the patch that makes us all lazy by default.

-chris



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Nov. 21, 2014, 9:42 p.m. UTC | #2
Out of curiosity, why does btrfs_update_time() need to call
btrfs_root_readonly()?  Why can't it just depend on the
__mnt_want_write() call in touch_atime()?

Surely if there are times when it's not OK to write into a btrfs file
system and mnt_is_readonly() returns false, the VFS is going to get
very confused abyway.

If the btrfs_update_time() is not necessary, then we could drop
btrfs_update_time() and update_time() from the inode operations
entirely, and depend on the VFS-level code in update_time().

	      	     	    	      	   - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Nov. 24, 2014, 3:21 p.m. UTC | #3
On Fri, Nov 21, 2014 at 02:59:21PM -0500, Theodore Ts'o wrote:
> We needed to preserve update_time() because btrfs wants to have a
> special btrfs_root_readonly() check; otherwise we could drop the
> update_time() inode operation entirely.

Can't btrfs just set the immutable flag on every inode that is read
when the root has the BTRFS_ROOT_SUBVOL_RDONLY flag?  That would
cut down the places that need this check to the ioctl path so that
we prevent users from clearling the immutable flag.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Nov. 24, 2014, 3:56 p.m. UTC | #4
On Mon, Nov 24, 2014 at 07:21:01AM -0800, Christoph Hellwig wrote:
> On Fri, Nov 21, 2014 at 02:59:21PM -0500, Theodore Ts'o wrote:
> > We needed to preserve update_time() because btrfs wants to have a
> > special btrfs_root_readonly() check; otherwise we could drop the
> > update_time() inode operation entirely.
> 
> Can't btrfs just set the immutable flag on every inode that is read
> when the root has the BTRFS_ROOT_SUBVOL_RDONLY flag?  That would
> cut down the places that need this check to the ioctl path so that
> we prevent users from clearling the immutable flag.

Sounds like a good plan to me, although I'm not sure I understand how
BTRFS_ROOT_SUBVOL_RDONLY flag works, since I would have thought there
are all sorts of places in the VFS layer where it is currently
checking MS_RDONLY and MNT_READONLY and _not_ checking
BTRFS_ROOT_SUBVOL_RDONLY isn't causing other problems.

But unless there's something more subtle going on, it would seem to me
that setting the immutable flag on each inode would be a better way to
go in any case.

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Nov. 24, 2014, 4:38 p.m. UTC | #5
On Fri, Nov 21, 2014 at 04:42:45PM -0500, Theodore Ts'o wrote:
> Out of curiosity, why does btrfs_update_time() need to call
> btrfs_root_readonly()?  Why can't it just depend on the
> __mnt_want_write() call in touch_atime()?

mnt_want_write looks only at the mountpoint flags, the readonly
subvolume status is external to that.

> Surely if there are times when it's not OK to write into a btrfs file
> system and mnt_is_readonly() returns false, the VFS is going to get
> very confused abyway.
> 
> If the btrfs_update_time() is not necessary, then we could drop
> btrfs_update_time() and update_time() from the inode operations
> entirely, and depend on the VFS-level code in update_time().

It is necessary and the whole .update_time callback was added
intentionally, see commits

c3b2da314834499f34cba94f7053e55f6d6f92d8
fs: introduce inode operation ->update_time

e41f941a23115e84a8550b3d901a13a14b2edc2f
Btrfs: move over to use ->update_time

2bc5565286121d2a77ccd728eb3484dff2035b58
Btrfs: don't update atime on RO subvolumes
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Nov. 24, 2014, 5:22 p.m. UTC | #6
On Mon, Nov 24, 2014 at 05:38:30PM +0100, David Sterba wrote:
> 
> It is necessary and the whole .update_time callback was added
> intentionally, see commits
> 
> c3b2da314834499f34cba94f7053e55f6d6f92d8
> fs: introduce inode operation ->update_time
> 
> e41f941a23115e84a8550b3d901a13a14b2edc2f
> Btrfs: move over to use ->update_time

Being able to signal an error if the time update fails is still
possible even if we drop update_time(), because the new write_time()
function will return an error.

> 2bc5565286121d2a77ccd728eb3484dff2035b58
> Btrfs: don't update atime on RO subvolumes

Yes, but this doesn't answer my question about other places where the
VFS is only checking MS_RDONLY and MNT_READONLY besides just
update_atime().  Maybe we should be exposing an "is_readonly(inode)"
inode operations function to address this?

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Nov. 24, 2014, 5:34 p.m. UTC | #7
On Mon, Nov 24, 2014 at 07:21:01AM -0800, Christoph Hellwig wrote:
> On Fri, Nov 21, 2014 at 02:59:21PM -0500, Theodore Ts'o wrote:
> > We needed to preserve update_time() because btrfs wants to have a
> > special btrfs_root_readonly() check; otherwise we could drop the
> > update_time() inode operation entirely.
> 
> Can't btrfs just set the immutable flag on every inode that is read
> when the root has the BTRFS_ROOT_SUBVOL_RDONLY flag? 

This would lead to change in return code from EROFS to EPERM/EACCESS for
all syscalls that do not pass through permissions() callback. Also, now
each file from a readonly subvol will show the 'i' attribute and there's
now way to determine if the file had had the 'i' attribute before it was
snapshotted.

> That would
> cut down the places that need this check to the ioctl path so that
> we prevent users from clearling the immutable flag.

This means that even the root or capable user are not able to clear the
flag.

Even though the extra btrfs_root_readonly checks are not all great, the
number of surprises that the immutable bit could bring is IMO not great
either.

These callbacks that now implement the extra check:

- update_time
- setattr
- setflags (ioctl)
- setxattr
- removexattr

The btrfs_root_readonly checks in setxattr and removexattr are
unnecessary because they're done through xattr_permisssion. I'll send a
patch to remove them.

'setflags' is an ioctl and all the checks have to be done there.
The generic permission() callback cannot be used here because it would
fail to clear eg. the immutable flag.

'setattr' does limited permission checks (IMMUTABLE and APPEND), nothing
that calls to the filesystem directly or indirectly.

The remaining one is 'update_time'. I'm not sure if the amount of work
the switch to IMUUTABLE bit would need is justified, compared to this
one instance of extra btrfs_root_readonly test. As the VFS layer has no
notion of subvolume it's not able to determine the RO/RW status without
asking the filesystem anyway.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Nov. 24, 2014, 6:09 p.m. UTC | #8
On Mon, Nov 24, 2014 at 12:22:16PM -0500, Theodore Ts'o wrote:
> On Mon, Nov 24, 2014 at 05:38:30PM +0100, David Sterba wrote:
> > 
> > It is necessary and the whole .update_time callback was added
> > intentionally, see commits
> > 
> > c3b2da314834499f34cba94f7053e55f6d6f92d8
> > fs: introduce inode operation ->update_time
> > 
> > e41f941a23115e84a8550b3d901a13a14b2edc2f
> > Btrfs: move over to use ->update_time
> 
> Being able to signal an error if the time update fails is still
> possible even if we drop update_time(), because the new write_time()
> function will return an error.

Fine, means your change does not break the current status. I was
providing the more complete list of related commits.

> > 2bc5565286121d2a77ccd728eb3484dff2035b58
> > Btrfs: don't update atime on RO subvolumes
> 
> Yes, but this doesn't answer my question about other places where the
> VFS is only checking MS_RDONLY and MNT_READONLY besides just
> update_atime().  Maybe we should be exposing an "is_readonly(inode)"
> inode operations function to address this?

Yes, if this is a lightweight check then it'd would allow to remove the
filesystem-specific checks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Nov. 25, 2014, 3:51 p.m. UTC | #9
On Mon, Nov 24, 2014 at 06:34:30PM +0100, David Sterba wrote:
> The btrfs_root_readonly checks in setxattr and removexattr are
> unnecessary because they're done through xattr_permisssion. I'll send a
> patch to remove them.

Does not work because the security.* and system.* namespaces do not call
the permission() hook, so no patch. However, if the proposed
inode_is_readonly callback is merged, we can replace the btrfs-specific
checks with is_readonly check in xattr_permission().
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Nov. 25, 2014, 5:01 p.m. UTC | #10
On Tue, Nov 25, 2014 at 04:51:41PM +0100, David Sterba wrote:
> Does not work because the security.* and system.* namespaces do not call
> the permission() hook, so no patch. However, if the proposed
> inode_is_readonly callback is merged, we can replace the btrfs-specific
> checks with is_readonly check in xattr_permission().

I think that patch should go first in the series.  But I really need to
find some time to review the whole thing before commenting with profound
half-knowledge..
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d23362f..a5e0d0d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5574,6 +5574,11 @@  static int btrfs_update_time(struct inode *inode, struct timespec *now,
 		inode->i_mtime = *now;
 	if (flags & S_ATIME)
 		inode->i_atime = *now;
+	return 0;
+}
+
+static int btrfs_write_time(struct inode *inode)
+{
 	return btrfs_dirty_inode(inode);
 }
 
@@ -9462,6 +9467,7 @@  static const struct inode_operations btrfs_dir_inode_operations = {
 	.get_acl	= btrfs_get_acl,
 	.set_acl	= btrfs_set_acl,
 	.update_time	= btrfs_update_time,
+	.write_time	= btrfs_write_time,
 	.tmpfile        = btrfs_tmpfile,
 };
 static const struct inode_operations btrfs_dir_ro_inode_operations = {
@@ -9470,6 +9476,7 @@  static const struct inode_operations btrfs_dir_ro_inode_operations = {
 	.get_acl	= btrfs_get_acl,
 	.set_acl	= btrfs_set_acl,
 	.update_time	= btrfs_update_time,
+	.write_time	= btrfs_write_time,
 };
 
 static const struct file_operations btrfs_dir_file_operations = {
@@ -9540,6 +9547,7 @@  static const struct inode_operations btrfs_file_inode_operations = {
 	.get_acl	= btrfs_get_acl,
 	.set_acl	= btrfs_set_acl,
 	.update_time	= btrfs_update_time,
+	.write_time	= btrfs_write_time,
 };
 static const struct inode_operations btrfs_special_inode_operations = {
 	.getattr	= btrfs_getattr,
@@ -9552,6 +9560,7 @@  static const struct inode_operations btrfs_special_inode_operations = {
 	.get_acl	= btrfs_get_acl,
 	.set_acl	= btrfs_set_acl,
 	.update_time	= btrfs_update_time,
+	.write_time	= btrfs_write_time,
 };
 static const struct inode_operations btrfs_symlink_inode_operations = {
 	.readlink	= generic_readlink,
@@ -9565,6 +9574,7 @@  static const struct inode_operations btrfs_symlink_inode_operations = {
 	.listxattr	= btrfs_listxattr,
 	.removexattr	= btrfs_removexattr,
 	.update_time	= btrfs_update_time,
+	.write_time	= btrfs_write_time,
 };
 
 const struct dentry_operations btrfs_dentry_operations = {
diff --git a/fs/inode.c b/fs/inode.c
index 26753ba..8f5c4b5 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1499,17 +1499,24 @@  static int relatime_need_update(struct vfsmount *mnt, struct inode *inode,
  */
 static int update_time(struct inode *inode, struct timespec *time, int flags)
 {
-	if (inode->i_op->update_time)
-		return inode->i_op->update_time(inode, time, flags);
-
-	if (flags & S_ATIME)
-		inode->i_atime = *time;
-	if (flags & S_VERSION)
-		inode_inc_iversion(inode);
-	if (flags & S_CTIME)
-		inode->i_ctime = *time;
-	if (flags & S_MTIME)
-		inode->i_mtime = *time;
+	int ret;
+
+	if (inode->i_op->update_time) {
+		ret = inode->i_op->update_time(inode, time, flags);
+		if (ret)
+			return ret;
+	} else {
+		if (flags & S_ATIME)
+			inode->i_atime = *time;
+		if (flags & S_VERSION)
+			inode_inc_iversion(inode);
+		if (flags & S_CTIME)
+			inode->i_ctime = *time;
+		if (flags & S_MTIME)
+			inode->i_mtime = *time;
+	}
+	if (inode->i_op->write_time)
+		return inode->i_op->write_time(inode);
 	mark_inode_dirty_sync(inode);
 	return 0;
 }
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ec6dcdc..0e9653c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -984,10 +984,8 @@  xfs_vn_setattr(
 }
 
 STATIC int
-xfs_vn_update_time(
-	struct inode		*inode,
-	struct timespec		*now,
-	int			flags)
+xfs_vn_write_time(
+	struct inode		*inode)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -1004,21 +1002,16 @@  xfs_vn_update_time(
 	}
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	if (flags & S_CTIME) {
-		inode->i_ctime = *now;
-		ip->i_d.di_ctime.t_sec = (__int32_t)now->tv_sec;
-		ip->i_d.di_ctime.t_nsec = (__int32_t)now->tv_nsec;
-	}
-	if (flags & S_MTIME) {
-		inode->i_mtime = *now;
-		ip->i_d.di_mtime.t_sec = (__int32_t)now->tv_sec;
-		ip->i_d.di_mtime.t_nsec = (__int32_t)now->tv_nsec;
-	}
-	if (flags & S_ATIME) {
-		inode->i_atime = *now;
-		ip->i_d.di_atime.t_sec = (__int32_t)now->tv_sec;
-		ip->i_d.di_atime.t_nsec = (__int32_t)now->tv_nsec;
-	}
+
+	ip->i_d.di_ctime.t_sec = (__int32_t) inode->i_ctime.tv_sec;
+	ip->i_d.di_ctime.t_nsec = (__int32_t) inode->i_ctime.tv_nsec;
+
+	ip->i_d.di_mtime.t_sec = (__int32_t)inode->i_mtime.tv_sec;
+	ip->i_d.di_mtime.t_nsec = (__int32_t) inode->i_mtime.tv_nsec;
+
+	ip->i_d.di_atime.t_sec = (__int32_t) inode->i_atime.tv_sec;
+	ip->i_d.di_atime.t_nsec = (__int32_t) inode->i_atime.tv_nsec;
+
 	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_TIMESTAMP);
 	return xfs_trans_commit(tp, 0);
@@ -1129,7 +1122,7 @@  static const struct inode_operations xfs_inode_operations = {
 	.removexattr		= generic_removexattr,
 	.listxattr		= xfs_vn_listxattr,
 	.fiemap			= xfs_vn_fiemap,
-	.update_time		= xfs_vn_update_time,
+	.write_time		= xfs_vn_write_time,
 };
 
 static const struct inode_operations xfs_dir_inode_operations = {
@@ -1156,7 +1149,7 @@  static const struct inode_operations xfs_dir_inode_operations = {
 	.getxattr		= generic_getxattr,
 	.removexattr		= generic_removexattr,
 	.listxattr		= xfs_vn_listxattr,
-	.update_time		= xfs_vn_update_time,
+	.write_time		= xfs_vn_write_time,
 	.tmpfile		= xfs_vn_tmpfile,
 };
 
@@ -1184,7 +1177,7 @@  static const struct inode_operations xfs_dir_ci_inode_operations = {
 	.getxattr		= generic_getxattr,
 	.removexattr		= generic_removexattr,
 	.listxattr		= xfs_vn_listxattr,
-	.update_time		= xfs_vn_update_time,
+	.write_time		= xfs_vn_write_time,
 	.tmpfile		= xfs_vn_tmpfile,
 };
 
@@ -1198,7 +1191,7 @@  static const struct inode_operations xfs_symlink_inode_operations = {
 	.getxattr		= generic_getxattr,
 	.removexattr		= generic_removexattr,
 	.listxattr		= xfs_vn_listxattr,
-	.update_time		= xfs_vn_update_time,
+	.write_time		= xfs_vn_write_time,
 };
 
 STATIC void
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ab779e..3633239 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1545,6 +1545,7 @@  struct inode_operations {
 	int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
 		      u64 len);
 	int (*update_time)(struct inode *, struct timespec *, int);
+	int (*write_time)(struct inode *);
 	int (*atomic_open)(struct inode *, struct dentry *,
 			   struct file *, unsigned open_flag,
 			   umode_t create_mode, int *opened);