Message ID | 1416599964-21892-2-git-send-email-tytso@mit.edu (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Fri, Nov 21, 2014 at 2:59 PM, Theodore Ts'o <tytso@mit.edu> wrote: > In preparation for adding support for the lazytime mount option, we > need to be able to separate out the update_time() and write_time() > inode operations. Currently, only btrfs and xfs uses update_time(). > > We needed to preserve update_time() because btrfs wants to have a > special btrfs_root_readonly() check; otherwise we could drop the > update_time() inode operation entirely. No objections here, I'll give the queue a whirl. You can add my acked-by-or-Ill-fix-whatever-breaks I look forward to the patch that makes us all lazy by default. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Out of curiosity, why does btrfs_update_time() need to call btrfs_root_readonly()? Why can't it just depend on the __mnt_want_write() call in touch_atime()? Surely if there are times when it's not OK to write into a btrfs file system and mnt_is_readonly() returns false, the VFS is going to get very confused abyway. If the btrfs_update_time() is not necessary, then we could drop btrfs_update_time() and update_time() from the inode operations entirely, and depend on the VFS-level code in update_time(). - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Nov 21, 2014 at 02:59:21PM -0500, Theodore Ts'o wrote: > We needed to preserve update_time() because btrfs wants to have a > special btrfs_root_readonly() check; otherwise we could drop the > update_time() inode operation entirely. Can't btrfs just set the immutable flag on every inode that is read when the root has the BTRFS_ROOT_SUBVOL_RDONLY flag? That would cut down the places that need this check to the ioctl path so that we prevent users from clearling the immutable flag. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 24, 2014 at 07:21:01AM -0800, Christoph Hellwig wrote: > On Fri, Nov 21, 2014 at 02:59:21PM -0500, Theodore Ts'o wrote: > > We needed to preserve update_time() because btrfs wants to have a > > special btrfs_root_readonly() check; otherwise we could drop the > > update_time() inode operation entirely. > > Can't btrfs just set the immutable flag on every inode that is read > when the root has the BTRFS_ROOT_SUBVOL_RDONLY flag? That would > cut down the places that need this check to the ioctl path so that > we prevent users from clearling the immutable flag. Sounds like a good plan to me, although I'm not sure I understand how BTRFS_ROOT_SUBVOL_RDONLY flag works, since I would have thought there are all sorts of places in the VFS layer where it is currently checking MS_RDONLY and MNT_READONLY and _not_ checking BTRFS_ROOT_SUBVOL_RDONLY isn't causing other problems. But unless there's something more subtle going on, it would seem to me that setting the immutable flag on each inode would be a better way to go in any case. - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Nov 21, 2014 at 04:42:45PM -0500, Theodore Ts'o wrote: > Out of curiosity, why does btrfs_update_time() need to call > btrfs_root_readonly()? Why can't it just depend on the > __mnt_want_write() call in touch_atime()? mnt_want_write looks only at the mountpoint flags, the readonly subvolume status is external to that. > Surely if there are times when it's not OK to write into a btrfs file > system and mnt_is_readonly() returns false, the VFS is going to get > very confused abyway. > > If the btrfs_update_time() is not necessary, then we could drop > btrfs_update_time() and update_time() from the inode operations > entirely, and depend on the VFS-level code in update_time(). It is necessary and the whole .update_time callback was added intentionally, see commits c3b2da314834499f34cba94f7053e55f6d6f92d8 fs: introduce inode operation ->update_time e41f941a23115e84a8550b3d901a13a14b2edc2f Btrfs: move over to use ->update_time 2bc5565286121d2a77ccd728eb3484dff2035b58 Btrfs: don't update atime on RO subvolumes -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 24, 2014 at 05:38:30PM +0100, David Sterba wrote: > > It is necessary and the whole .update_time callback was added > intentionally, see commits > > c3b2da314834499f34cba94f7053e55f6d6f92d8 > fs: introduce inode operation ->update_time > > e41f941a23115e84a8550b3d901a13a14b2edc2f > Btrfs: move over to use ->update_time Being able to signal an error if the time update fails is still possible even if we drop update_time(), because the new write_time() function will return an error. > 2bc5565286121d2a77ccd728eb3484dff2035b58 > Btrfs: don't update atime on RO subvolumes Yes, but this doesn't answer my question about other places where the VFS is only checking MS_RDONLY and MNT_READONLY besides just update_atime(). Maybe we should be exposing an "is_readonly(inode)" inode operations function to address this? - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 24, 2014 at 07:21:01AM -0800, Christoph Hellwig wrote: > On Fri, Nov 21, 2014 at 02:59:21PM -0500, Theodore Ts'o wrote: > > We needed to preserve update_time() because btrfs wants to have a > > special btrfs_root_readonly() check; otherwise we could drop the > > update_time() inode operation entirely. > > Can't btrfs just set the immutable flag on every inode that is read > when the root has the BTRFS_ROOT_SUBVOL_RDONLY flag? This would lead to change in return code from EROFS to EPERM/EACCESS for all syscalls that do not pass through permissions() callback. Also, now each file from a readonly subvol will show the 'i' attribute and there's now way to determine if the file had had the 'i' attribute before it was snapshotted. > That would > cut down the places that need this check to the ioctl path so that > we prevent users from clearling the immutable flag. This means that even the root or capable user are not able to clear the flag. Even though the extra btrfs_root_readonly checks are not all great, the number of surprises that the immutable bit could bring is IMO not great either. These callbacks that now implement the extra check: - update_time - setattr - setflags (ioctl) - setxattr - removexattr The btrfs_root_readonly checks in setxattr and removexattr are unnecessary because they're done through xattr_permisssion. I'll send a patch to remove them. 'setflags' is an ioctl and all the checks have to be done there. The generic permission() callback cannot be used here because it would fail to clear eg. the immutable flag. 'setattr' does limited permission checks (IMMUTABLE and APPEND), nothing that calls to the filesystem directly or indirectly. The remaining one is 'update_time'. I'm not sure if the amount of work the switch to IMUUTABLE bit would need is justified, compared to this one instance of extra btrfs_root_readonly test. As the VFS layer has no notion of subvolume it's not able to determine the RO/RW status without asking the filesystem anyway. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 24, 2014 at 12:22:16PM -0500, Theodore Ts'o wrote: > On Mon, Nov 24, 2014 at 05:38:30PM +0100, David Sterba wrote: > > > > It is necessary and the whole .update_time callback was added > > intentionally, see commits > > > > c3b2da314834499f34cba94f7053e55f6d6f92d8 > > fs: introduce inode operation ->update_time > > > > e41f941a23115e84a8550b3d901a13a14b2edc2f > > Btrfs: move over to use ->update_time > > Being able to signal an error if the time update fails is still > possible even if we drop update_time(), because the new write_time() > function will return an error. Fine, means your change does not break the current status. I was providing the more complete list of related commits. > > 2bc5565286121d2a77ccd728eb3484dff2035b58 > > Btrfs: don't update atime on RO subvolumes > > Yes, but this doesn't answer my question about other places where the > VFS is only checking MS_RDONLY and MNT_READONLY besides just > update_atime(). Maybe we should be exposing an "is_readonly(inode)" > inode operations function to address this? Yes, if this is a lightweight check then it'd would allow to remove the filesystem-specific checks. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 24, 2014 at 06:34:30PM +0100, David Sterba wrote: > The btrfs_root_readonly checks in setxattr and removexattr are > unnecessary because they're done through xattr_permisssion. I'll send a > patch to remove them. Does not work because the security.* and system.* namespaces do not call the permission() hook, so no patch. However, if the proposed inode_is_readonly callback is merged, we can replace the btrfs-specific checks with is_readonly check in xattr_permission(). -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 25, 2014 at 04:51:41PM +0100, David Sterba wrote: > Does not work because the security.* and system.* namespaces do not call > the permission() hook, so no patch. However, if the proposed > inode_is_readonly callback is merged, we can replace the btrfs-specific > checks with is_readonly check in xattr_permission(). I think that patch should go first in the series. But I really need to find some time to review the whole thing before commenting with profound half-knowledge.. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d23362f..a5e0d0d 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5574,6 +5574,11 @@ static int btrfs_update_time(struct inode *inode, struct timespec *now, inode->i_mtime = *now; if (flags & S_ATIME) inode->i_atime = *now; + return 0; +} + +static int btrfs_write_time(struct inode *inode) +{ return btrfs_dirty_inode(inode); } @@ -9462,6 +9467,7 @@ static const struct inode_operations btrfs_dir_inode_operations = { .get_acl = btrfs_get_acl, .set_acl = btrfs_set_acl, .update_time = btrfs_update_time, + .write_time = btrfs_write_time, .tmpfile = btrfs_tmpfile, }; static const struct inode_operations btrfs_dir_ro_inode_operations = { @@ -9470,6 +9476,7 @@ static const struct inode_operations btrfs_dir_ro_inode_operations = { .get_acl = btrfs_get_acl, .set_acl = btrfs_set_acl, .update_time = btrfs_update_time, + .write_time = btrfs_write_time, }; static const struct file_operations btrfs_dir_file_operations = { @@ -9540,6 +9547,7 @@ static const struct inode_operations btrfs_file_inode_operations = { .get_acl = btrfs_get_acl, .set_acl = btrfs_set_acl, .update_time = btrfs_update_time, + .write_time = btrfs_write_time, }; static const struct inode_operations btrfs_special_inode_operations = { .getattr = btrfs_getattr, @@ -9552,6 +9560,7 @@ static const struct inode_operations btrfs_special_inode_operations = { .get_acl = btrfs_get_acl, .set_acl = btrfs_set_acl, .update_time = btrfs_update_time, + .write_time = btrfs_write_time, }; static const struct inode_operations btrfs_symlink_inode_operations = { .readlink = generic_readlink, @@ -9565,6 +9574,7 @@ static const struct inode_operations btrfs_symlink_inode_operations = { .listxattr = btrfs_listxattr, .removexattr = btrfs_removexattr, .update_time = btrfs_update_time, + .write_time = btrfs_write_time, }; const struct dentry_operations btrfs_dentry_operations = { diff --git a/fs/inode.c b/fs/inode.c index 26753ba..8f5c4b5 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -1499,17 +1499,24 @@ static int relatime_need_update(struct vfsmount *mnt, struct inode *inode, */ static int update_time(struct inode *inode, struct timespec *time, int flags) { - if (inode->i_op->update_time) - return inode->i_op->update_time(inode, time, flags); - - if (flags & S_ATIME) - inode->i_atime = *time; - if (flags & S_VERSION) - inode_inc_iversion(inode); - if (flags & S_CTIME) - inode->i_ctime = *time; - if (flags & S_MTIME) - inode->i_mtime = *time; + int ret; + + if (inode->i_op->update_time) { + ret = inode->i_op->update_time(inode, time, flags); + if (ret) + return ret; + } else { + if (flags & S_ATIME) + inode->i_atime = *time; + if (flags & S_VERSION) + inode_inc_iversion(inode); + if (flags & S_CTIME) + inode->i_ctime = *time; + if (flags & S_MTIME) + inode->i_mtime = *time; + } + if (inode->i_op->write_time) + return inode->i_op->write_time(inode); mark_inode_dirty_sync(inode); return 0; } diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index ec6dcdc..0e9653c 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -984,10 +984,8 @@ xfs_vn_setattr( } STATIC int -xfs_vn_update_time( - struct inode *inode, - struct timespec *now, - int flags) +xfs_vn_write_time( + struct inode *inode) { struct xfs_inode *ip = XFS_I(inode); struct xfs_mount *mp = ip->i_mount; @@ -1004,21 +1002,16 @@ xfs_vn_update_time( } xfs_ilock(ip, XFS_ILOCK_EXCL); - if (flags & S_CTIME) { - inode->i_ctime = *now; - ip->i_d.di_ctime.t_sec = (__int32_t)now->tv_sec; - ip->i_d.di_ctime.t_nsec = (__int32_t)now->tv_nsec; - } - if (flags & S_MTIME) { - inode->i_mtime = *now; - ip->i_d.di_mtime.t_sec = (__int32_t)now->tv_sec; - ip->i_d.di_mtime.t_nsec = (__int32_t)now->tv_nsec; - } - if (flags & S_ATIME) { - inode->i_atime = *now; - ip->i_d.di_atime.t_sec = (__int32_t)now->tv_sec; - ip->i_d.di_atime.t_nsec = (__int32_t)now->tv_nsec; - } + + ip->i_d.di_ctime.t_sec = (__int32_t) inode->i_ctime.tv_sec; + ip->i_d.di_ctime.t_nsec = (__int32_t) inode->i_ctime.tv_nsec; + + ip->i_d.di_mtime.t_sec = (__int32_t)inode->i_mtime.tv_sec; + ip->i_d.di_mtime.t_nsec = (__int32_t) inode->i_mtime.tv_nsec; + + ip->i_d.di_atime.t_sec = (__int32_t) inode->i_atime.tv_sec; + ip->i_d.di_atime.t_nsec = (__int32_t) inode->i_atime.tv_nsec; + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); xfs_trans_log_inode(tp, ip, XFS_ILOG_TIMESTAMP); return xfs_trans_commit(tp, 0); @@ -1129,7 +1122,7 @@ static const struct inode_operations xfs_inode_operations = { .removexattr = generic_removexattr, .listxattr = xfs_vn_listxattr, .fiemap = xfs_vn_fiemap, - .update_time = xfs_vn_update_time, + .write_time = xfs_vn_write_time, }; static const struct inode_operations xfs_dir_inode_operations = { @@ -1156,7 +1149,7 @@ static const struct inode_operations xfs_dir_inode_operations = { .getxattr = generic_getxattr, .removexattr = generic_removexattr, .listxattr = xfs_vn_listxattr, - .update_time = xfs_vn_update_time, + .write_time = xfs_vn_write_time, .tmpfile = xfs_vn_tmpfile, }; @@ -1184,7 +1177,7 @@ static const struct inode_operations xfs_dir_ci_inode_operations = { .getxattr = generic_getxattr, .removexattr = generic_removexattr, .listxattr = xfs_vn_listxattr, - .update_time = xfs_vn_update_time, + .write_time = xfs_vn_write_time, .tmpfile = xfs_vn_tmpfile, }; @@ -1198,7 +1191,7 @@ static const struct inode_operations xfs_symlink_inode_operations = { .getxattr = generic_getxattr, .removexattr = generic_removexattr, .listxattr = xfs_vn_listxattr, - .update_time = xfs_vn_update_time, + .write_time = xfs_vn_write_time, }; STATIC void diff --git a/include/linux/fs.h b/include/linux/fs.h index 9ab779e..3633239 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1545,6 +1545,7 @@ struct inode_operations { int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start, u64 len); int (*update_time)(struct inode *, struct timespec *, int); + int (*write_time)(struct inode *); int (*atomic_open)(struct inode *, struct dentry *, struct file *, unsigned open_flag, umode_t create_mode, int *opened);
In preparation for adding support for the lazytime mount option, we need to be able to separate out the update_time() and write_time() inode operations. Currently, only btrfs and xfs uses update_time(). We needed to preserve update_time() because btrfs wants to have a special btrfs_root_readonly() check; otherwise we could drop the update_time() inode operation entirely. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: xfs@oss.sgi.com Cc: linux-btrfs@vger.kernel.org --- fs/btrfs/inode.c | 10 ++++++++++ fs/inode.c | 29 ++++++++++++++++++----------- fs/xfs/xfs_iops.c | 39 ++++++++++++++++----------------------- include/linux/fs.h | 1 + 4 files changed, 45 insertions(+), 34 deletions(-)