Message ID | 20190326104956.10314-1-fdmanana@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | Btrfs: do not allow trimming when a fs is mounted with the nologreplay option |
On 26.03.19 at 12:49, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When a filesystem is mounted with the nologreplay mount option, which
> requires it to be mounted in RO mode as well, we can not allow discard on
> free space inside block groups, because log trees refer to extents that
> are not pinned in a block group's free space cache (pinning the extents is
> precisely the first phase of replaying a log tree).
>
> So do not allow the fitrim ioctl to do anything when the filesystem is
> mounted with the nologreplay option, because later it can be mounted RW
> without that option, which causes log replay to happen and result in
> either a failure to replay the log trees (leading to a mount failure), a
> crash or some silent corruption.
>
> Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Does it make sense to make the check a bit more specific and only return
EROFS when NOLOGREPLAY and the log tree has non-null generation?

In any case:

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/ioctl.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 494f0f10d70e..01808934d21f 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -501,6 +501,16 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>
> +	/*
> +	 * If the fs is mounted with nologreplay, which requires it to be
> +	 * mounted in RO mode as well, we can not allow discard on free space
> +	 * inside block groups, because log trees refer to extents that are not
> +	 * pinned in a block group's free space cache (pinning the extents is
> +	 * precisely the first phase of replaying a log tree).
> +	 */
> +	if (btrfs_test_opt(fs_info, NOLOGREPLAY))
> +		return -EROFS;
> +
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(device, &fs_info->fs_devices->devices,
>  				dev_list) {
>

On Tue, Mar 26, 2019 at 12:17 PM Nikolay Borisov <nborisov@suse.com> wrote:
>
> On 26.03.19 at 12:49, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When a filesystem is mounted with the nologreplay mount option, which
> > requires it to be mounted in RO mode as well, we can not allow discard on
> > free space inside block groups, because log trees refer to extents that
> > are not pinned in a block group's free space cache (pinning the extents is
> > precisely the first phase of replaying a log tree).
> >
> > So do not allow the fitrim ioctl to do anything when the filesystem is
> > mounted with the nologreplay option, because later it can be mounted RW
> > without that option, which causes log replay to happen and result in
> > either a failure to replay the log trees (leading to a mount failure), a
> > crash or some silent corruption.
> >
> > Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Does it make sense to make the check a bit more specific and only return
> EROFS when NOLOGREPLAY and the log tree has non-null generation?

It would make sense to check whether there's actually a log tree as well.
Neither xfs nor ext4 (which is already in Linus' tree) does an
equivalent check, nor does the proposed fstests test case make sure a
journal/log exists.

Not against it, but this isn't a common use case either.

On 26.03.19 at 14:35, Filipe Manana wrote:
> On Tue, Mar 26, 2019 at 12:17 PM Nikolay Borisov <nborisov@suse.com> wrote:
>>
>> On 26.03.19 at 12:49, fdmanana@kernel.org wrote:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> When a filesystem is mounted with the nologreplay mount option, which
>>> requires it to be mounted in RO mode as well, we can not allow discard on
>>> free space inside block groups, because log trees refer to extents that
>>> are not pinned in a block group's free space cache (pinning the extents is
>>> precisely the first phase of replaying a log tree).
>>>
>>> So do not allow the fitrim ioctl to do anything when the filesystem is
>>> mounted with the nologreplay option, because later it can be mounted RW
>>> without that option, which causes log replay to happen and result in
>>> either a failure to replay the log trees (leading to a mount failure), a
>>> crash or some silent corruption.
>>>
>>> Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>
>> Does it make sense to make the check a bit more specific and only return
>> EROFS when NOLOGREPLAY and the log tree has non-null generation?
>
> It would make sense to check whether there's actually a log tree as well.
> Neither xfs nor ext4 (which is already in Linus' tree) does an
> equivalent check, nor does the proposed fstests test case make sure a
> journal/log exists.
>
> Not against it, but this isn't a common use case either.

I think of this as a sort of "optimisation": if we don't have a log tree,
then we can allow trim. Though this is much simpler, so I'm fine with it
as well.

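The two variants weighed in this exchange can be compared in a small user-space model. This is only a sketch under stated assumptions: `struct model_fs`, its fields, and both function names are hypothetical stand-ins, not kernel types or btrfs APIs, and "a log tree exists" is modeled as a non-zero `log_root_bytenr`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the filesystem state the check looks at. */
struct model_fs {
	bool nologreplay;         /* mounted with -o nologreplay (implies RO) */
	uint64_t log_root_bytenr; /* assumption: 0 means no log tree exists */
};

/* Check as posted in the patch: refuse trim whenever nologreplay is set. */
int fitrim_check_simple(const struct model_fs *fs)
{
	assert(fs != NULL);
	return fs->nologreplay ? -EROFS : 0;
}

/* Narrower variant from the review: refuse only if a log tree exists. */
int fitrim_check_narrow(const struct model_fs *fs)
{
	assert(fs != NULL);
	if (fs->nologreplay && fs->log_root_bytenr != 0)
		return -EROFS;
	return 0;
}
```

The only case where the two differ is a nologreplay mount with no log tree: the simple check still returns -EROFS, while the narrow one would let trim proceed.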
On 2019/3/26 8:17 PM, Nikolay Borisov wrote:
>
> On 26.03.19 at 12:49, fdmanana@kernel.org wrote:
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> When a filesystem is mounted with the nologreplay mount option, which
>> requires it to be mounted in RO mode as well, we can not allow discard on
>> free space inside block groups, because log trees refer to extents that
>> are not pinned in a block group's free space cache (pinning the extents is
>> precisely the first phase of replaying a log tree).
>>
>> So do not allow the fitrim ioctl to do anything when the filesystem is
>> mounted with the nologreplay option, because later it can be mounted RW
>> without that option, which causes log replay to happen and result in
>> either a failure to replay the log trees (leading to a mount failure), a
>> crash or some silent corruption.
>>
>> Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Does it make sense to make the check a bit more specific and only return
> EROFS when NOLOGREPLAY and the log tree has non-null generation?

To me fstrim is a WRITE operation; why is it allowed even on an RO mount?

Thanks,
Qu

On Tue, Mar 26, 2019 at 09:40:08PM +0800, Qu Wenruo wrote:
> On 2019/3/26 8:17 PM, Nikolay Borisov wrote:
> >
> > On 26.03.19 at 12:49, fdmanana@kernel.org wrote:
> >> From: Filipe Manana <fdmanana@suse.com>
> >>
> >> When a filesystem is mounted with the nologreplay mount option, which
> >> requires it to be mounted in RO mode as well, we can not allow discard on
> >> free space inside block groups, because log trees refer to extents that
> >> are not pinned in a block group's free space cache (pinning the extents is
> >> precisely the first phase of replaying a log tree).
> >>
> >> So do not allow the fitrim ioctl to do anything when the filesystem is
> >> mounted with the nologreplay option, because later it can be mounted RW
> >> without that option, which causes log replay to happen and result in
> >> either a failure to replay the log trees (leading to a mount failure), a
> >> crash or some silent corruption.
> >>
> >> Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
> >> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >
> > Does it make sense to make the check a bit more specific and only return
> > EROFS when NOLOGREPLAY and the log tree has non-null generation?
>
> To me fstrim is a WRITE operation; why is it allowed even on an RO mount?

It's a write to the block device, not to the filesystem.

On Tue, Mar 26, 2019 at 02:39:45PM +0200, Nikolay Borisov wrote:
> On 26.03.19 at 14:35, Filipe Manana wrote:
> > On Tue, Mar 26, 2019 at 12:17 PM Nikolay Borisov <nborisov@suse.com> wrote:
> >> On 26.03.19 at 12:49, fdmanana@kernel.org wrote:
> >>> From: Filipe Manana <fdmanana@suse.com>
> >>>
> >>> When a filesystem is mounted with the nologreplay mount option, which
> >>> requires it to be mounted in RO mode as well, we can not allow discard on
> >>> free space inside block groups, because log trees refer to extents that
> >>> are not pinned in a block group's free space cache (pinning the extents is
> >>> precisely the first phase of replaying a log tree).
> >>>
> >>> So do not allow the fitrim ioctl to do anything when the filesystem is
> >>> mounted with the nologreplay option, because later it can be mounted RW
> >>> without that option, which causes log replay to happen and result in
> >>> either a failure to replay the log trees (leading to a mount failure), a
> >>> crash or some silent corruption.
> >>>
> >>> Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
> >>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >>
> >> Does it make sense to make the check a bit more specific and only return
> >> EROFS when NOLOGREPLAY and the log tree has non-null generation?
> >
> > It would make sense to check whether there's actually a log tree as well.
> > Neither xfs nor ext4 (which is already in Linus' tree) does an
> > equivalent check, nor does the proposed fstests test case make sure a
> > journal/log exists.
> >
> > Not against it, but this isn't a common use case either.
>
> I think of this as a sort of "optimisation": if we don't have a log tree,
> then we can allow trim. Though this is much simpler, so I'm fine with it
> as well.

Agreed, the simple solution sounds OK to me. Trim is not a critical
operation, so we don't need to try harder to make it work even with the
mount option.

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 494f0f10d70e..01808934d21f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -501,6 +501,16 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	/*
+	 * If the fs is mounted with nologreplay, which requires it to be
+	 * mounted in RO mode as well, we can not allow discard on free space
+	 * inside block groups, because log trees refer to extents that are not
+	 * pinned in a block group's free space cache (pinning the extents is
+	 * precisely the first phase of replaying a log tree).
+	 */
+	if (btrfs_test_opt(fs_info, NOLOGREPLAY))
+		return -EROFS;
+
 	rcu_read_lock();
 	list_for_each_entry_rcu(device, &fs_info->fs_devices->devices,
 				dev_list) {
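
From user space, the effect of this change would show up as the FITRIM ioctl failing with EROFS on a filesystem mounted with nologreplay. The following is a minimal sketch of a caller that distinguishes that case, assuming a Linux system with the kernel UAPI headers installed. The helper names `fitrim_errno_hint` and `trim_mountpoint` are hypothetical and exist only for this illustration; the hints cover just the two error codes visible in the patch context.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FITRIM, struct fstrim_range */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: explain the errno of a failed FITRIM call. */
const char *fitrim_errno_hint(int err)
{
	switch (err) {
	case EROFS:
		return "trim refused (e.g. btrfs mounted with nologreplay)";
	case EPERM:
		return "CAP_SYS_ADMIN is required";
	default:
		return strerror(err);
	}
}

/*
 * Hypothetical helper: ask the filesystem mounted at 'mountpoint' to trim
 * all of its free space. Returns 0 on success or a negative errno value.
 */
int trim_mountpoint(const char *mountpoint)
{
	struct fstrim_range range = {
		.start = 0,
		.len = UINT64_MAX, /* cover the whole filesystem */
		.minlen = 0,
	};
	int fd, ret;

	assert(mountpoint != NULL);

	fd = open(mountpoint, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* Save errno before close() has a chance to overwrite it. */
	ret = ioctl(fd, FITRIM, &range) < 0 ? -errno : 0;
	close(fd);
	return ret;
}
```

A caller could then report `fitrim_errno_hint(-ret)` whenever `trim_mountpoint()` returns a negative value, and treat EROFS as "skip this filesystem" rather than as a hard error.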