Btrfs: do not allow trimming when a fs is mounted with the nologreplay option

Message ID 20190326104956.10314-1-fdmanana@kernel.org
State New

Commit Message

Filipe Manana March 26, 2019, 10:49 a.m. UTC
From: Filipe Manana <fdmanana@suse.com>

When a filesystem is mounted with the nologreplay mount option, which
requires it to be mounted in RO mode as well, we can not allow discard on
free space inside block groups, because log trees refer to extents that
are not pinned in a block group's free space cache (pinning the extents is
precisely the first phase of replaying a log tree).

So do not allow the fitrim ioctl to do anything when the filesystem is
mounted with the nologreplay option, because later it can be mounted RW
without that option, which causes log replay to happen and result in
either a failure to replay the log trees (leading to a mount failure), a
crash or some silent corruption.

Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ioctl.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
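
From userspace, the change is visible as an EROFS error from the FITRIM ioctl. The following is a minimal sketch of a caller that distinguishes that case, assuming a Linux system with <linux/fs.h>; the helper names trim_filesystem() and trim_error_hint() are hypothetical and not part of this patch or of any library.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FITRIM, struct fstrim_range */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue FITRIM on the filesystem that contains `path`.
 * Returns 0 on success (storing the trimmed byte count in *trimmed when it
 * is non-NULL) or a negative errno value on failure.
 */
static int trim_filesystem(const char *path, uint64_t *trimmed)
{
	struct fstrim_range range;
	int fd, ret = 0;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	memset(&range, 0, sizeof(range));
	range.start = 0;
	range.len = UINT64_MAX;	/* ask for the whole filesystem */
	range.minlen = 0;

	if (ioctl(fd, FITRIM, &range) < 0)
		ret = -errno;		/* save errno before close() can change it */
	else if (trimmed)
		*trimmed = range.len;	/* the kernel writes back the bytes trimmed */

	close(fd);
	return ret;
}

/* Hypothetical helper: map a negative errno from trim_filesystem() to a hint. */
static const char *trim_error_hint(int err)
{
	switch (err) {
	case -EROFS:
		return "trim refused; on btrfs with this patch this can mean the fs is mounted with nologreplay";
	case -EPERM:
		return "CAP_SYS_ADMIN is required";
	case -EOPNOTSUPP:
		return "discard is not supported here";
	default:
		return strerror(-err);
	}
}
```

A caller would typically pass the mount point, for example trim_filesystem("/mnt", &bytes), and print trim_error_hint() of a negative return value.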

Comments

Nikolay Borisov March 26, 2019, 12:17 p.m. UTC | #1
On 26.03.19 12:49, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When a filesystem is mounted with the nologreplay mount option, which
> requires it to be mounted in RO mode as well, we can not allow discard on
> free space inside block groups, because log trees refer to extents that
> are not pinned in a block group's free space cache (pinning the extents is
> precisely the first phase of replaying a log tree).
> 
> So do not allow the fitrim ioctl to do anything when the filesystem is
> mounted with the nologreplay option, because later it can be mounted RW
> without that option, which causes log replay to happen and result in
> either a failure to replay the log trees (leading to a mount failure), a
> crash or some silent corruption.
> 
> Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Does it make sense to make the check a bit more specific and only return
EROFS when NOLOGREPLAY and the log tree has non-null generation?

In any case:

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/ioctl.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 494f0f10d70e..01808934d21f 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -501,6 +501,16 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  
> +	/*
> +	 * If the fs is mounted with nologreplay, which requires it to be
> +	 * mounted in RO mode as well, we can not allow discard on free space
> +	 * inside block groups, because log trees refer to extents that are not
> +	 * pinned in a block group's free space cache (pinning the extents is
> +	 * precisely the first phase of replaying a log tree).
> +	 */
> +	if (btrfs_test_opt(fs_info, NOLOGREPLAY))
> +		return -EROFS;
> +
>  	rcu_read_lock();
>  	list_for_each_entry_rcu(device, &fs_info->fs_devices->devices,
>  				dev_list) {
>
Filipe Manana March 26, 2019, 12:35 p.m. UTC | #2
On Tue, Mar 26, 2019 at 12:17 PM Nikolay Borisov <nborisov@suse.com> wrote:
>
>
>
> On 26.03.19 12:49, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When a filesystem is mounted with the nologreplay mount option, which
> > requires it to be mounted in RO mode as well, we can not allow discard on
> > free space inside block groups, because log trees refer to extents that
> > are not pinned in a block group's free space cache (pinning the extents is
> > precisely the first phase of replaying a log tree).
> >
> > So do not allow the fitrim ioctl to do anything when the filesystem is
> > mounted with the nologreplay option, because later it can be mounted RW
> > without that option, which causes log replay to happen and result in
> > either a failure to replay the log trees (leading to a mount failure), a
> > crash or some silent corruption.
> >
> > Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Does it make sense to make the check a bit more specific and only return
> EROFS when NOLOGREPLAY and the log tree has non-null generation?

It would make sense to check whether there's actually a log tree as well.
Neither xfs nor ext4 (which is already in Linus' tree) does an
equivalent check, nor does the proposed fstests test case make sure a
journal/log exists.

Not against it, but this isn't a common use case either.

>
> In any case:
>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>
> > ---
> >  fs/btrfs/ioctl.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 494f0f10d70e..01808934d21f 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -501,6 +501,16 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
> >       if (!capable(CAP_SYS_ADMIN))
> >               return -EPERM;
> >
> > +     /*
> > +      * If the fs is mounted with nologreplay, which requires it to be
> > +      * mounted in RO mode as well, we can not allow discard on free space
> > +      * inside block groups, because log trees refer to extents that are not
> > +      * pinned in a block group's free space cache (pinning the extents is
> > +      * precisely the first phase of replaying a log tree).
> > +      */
> > +     if (btrfs_test_opt(fs_info, NOLOGREPLAY))
> > +             return -EROFS;
> > +
> >       rcu_read_lock();
> >       list_for_each_entry_rcu(device, &fs_info->fs_devices->devices,
> >                               dev_list) {
> >
Nikolay Borisov March 26, 2019, 12:39 p.m. UTC | #3
On 26.03.19 14:35, Filipe Manana wrote:
> On Tue, Mar 26, 2019 at 12:17 PM Nikolay Borisov <nborisov@suse.com> wrote:
>>
>>
>>
>> On 26.03.19 12:49, fdmanana@kernel.org wrote:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> When a filesystem is mounted with the nologreplay mount option, which
>>> requires it to be mounted in RO mode as well, we can not allow discard on
>>> free space inside block groups, because log trees refer to extents that
>>> are not pinned in a block group's free space cache (pinning the extents is
>>> precisely the first phase of replaying a log tree).
>>>
>>> So do not allow the fitrim ioctl to do anything when the filesystem is
>>> mounted with the nologreplay option, because later it can be mounted RW
>>> without that option, which causes log replay to happen and result in
>>> either a failure to replay the log trees (leading to a mount failure), a
>>> crash or some silent corruption.
>>>
>>> Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>
>> Does it make sense to make the check a bit more specific and only return
>> EROFS when NOLOGREPLAY and the log tree has non-null generation?
> 
> It would make sense to check whether there's actually a log tree as well.
> Neither xfs nor ext4 (which is already in Linus' tree) does an
> equivalent check, nor does the proposed fstests test case make sure a
> journal/log exists.
> 
> Not against it, but this isn't a common use case either.

I think of this as a sort of "optimisation" where if we don't have a tree
then we can allow trim. Though this is much simpler so I'm fine with it
as well.


> 
>>
>> In any case:
>>
>> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>>
>>> ---
>>>  fs/btrfs/ioctl.c | 10 ++++++++++
>>>  1 file changed, 10 insertions(+)
>>>
>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>> index 494f0f10d70e..01808934d21f 100644
>>> --- a/fs/btrfs/ioctl.c
>>> +++ b/fs/btrfs/ioctl.c
>>> @@ -501,6 +501,16 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
>>>       if (!capable(CAP_SYS_ADMIN))
>>>               return -EPERM;
>>>
>>> +     /*
>>> +      * If the fs is mounted with nologreplay, which requires it to be
>>> +      * mounted in RO mode as well, we can not allow discard on free space
>>> +      * inside block groups, because log trees refer to extents that are not
>>> +      * pinned in a block group's free space cache (pinning the extents is
>>> +      * precisely the first phase of replaying a log tree).
>>> +      */
>>> +     if (btrfs_test_opt(fs_info, NOLOGREPLAY))
>>> +             return -EROFS;
>>> +
>>>       rcu_read_lock();
>>>       list_for_each_entry_rcu(device, &fs_info->fs_devices->devices,
>>>                               dev_list) {
>>>
>
Qu Wenruo March 26, 2019, 1:40 p.m. UTC | #4
On 2019/3/26 8:17 PM, Nikolay Borisov wrote:
>
>
> On 26.03.19 12:49, fdmanana@kernel.org wrote:
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> When a filesystem is mounted with the nologreplay mount option, which
>> requires it to be mounted in RO mode as well, we can not allow discard on
>> free space inside block groups, because log trees refer to extents that
>> are not pinned in a block group's free space cache (pinning the extents is
>> precisely the first phase of replaying a log tree).
>>
>> So do not allow the fitrim ioctl to do anything when the filesystem is
>> mounted with the nologreplay option, because later it can be mounted RW
>> without that option, which causes log replay to happen and result in
>> either a failure to replay the log trees (leading to a mount failure), a
>> crash or some silent corruption.
>>
>> Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Does it make sense to make the check a bit more specific and only return
> EROFS when NOLOGREPLAY and the log tree has non-null generation?

To me fstrim is a WRITE operation, so why is it allowed even on an RO mount?

Thanks,
Qu

>
> In any case:
>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>
>> ---
>>  fs/btrfs/ioctl.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index 494f0f10d70e..01808934d21f 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -501,6 +501,16 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
>>  	if (!capable(CAP_SYS_ADMIN))
>>  		return -EPERM;
>>
>> +	/*
>> +	 * If the fs is mounted with nologreplay, which requires it to be
>> +	 * mounted in RO mode as well, we can not allow discard on free space
>> +	 * inside block groups, because log trees refer to extents that are not
>> +	 * pinned in a block group's free space cache (pinning the extents is
>> +	 * precisely the first phase of replaying a log tree).
>> +	 */
>> +	if (btrfs_test_opt(fs_info, NOLOGREPLAY))
>> +		return -EROFS;
>> +
>>  	rcu_read_lock();
>>  	list_for_each_entry_rcu(device, &fs_info->fs_devices->devices,
>>  				dev_list) {
>>
David Sterba March 26, 2019, 1:48 p.m. UTC | #5
On Tue, Mar 26, 2019 at 09:40:08PM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/3/26 8:17 PM, Nikolay Borisov wrote:
> >
> >
> > On 26.03.19 12:49, fdmanana@kernel.org wrote:
> >> From: Filipe Manana <fdmanana@suse.com>
> >>
> >> When a filesystem is mounted with the nologreplay mount option, which
> >> requires it to be mounted in RO mode as well, we can not allow discard on
> >> free space inside block groups, because log trees refer to extents that
> >> are not pinned in a block group's free space cache (pinning the extents is
> >> precisely the first phase of replaying a log tree).
> >>
> >> So do not allow the fitrim ioctl to do anything when the filesystem is
> >> mounted with the nologreplay option, because later it can be mounted RW
> >> without that option, which causes log replay to happen and result in
> >> either a failure to replay the log trees (leading to a mount failure), a
> >> crash or some silent corruption.
> >>
> >> Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
> >> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >
> > Does it make sense to make the check a bit more specific and only return
> > EROFS when NOLOGREPLAY and the log tree has non-null generation?
> 
> To me fstrim is a WRITE operation, so why is it allowed even on an RO mount?

It's a write to the block device, not to the filesystem.

David Sterba March 28, 2019, 3:54 p.m. UTC | #6
On Tue, Mar 26, 2019 at 02:39:45PM +0200, Nikolay Borisov wrote:
> On 26.03.19 14:35, Filipe Manana wrote:
> > On Tue, Mar 26, 2019 at 12:17 PM Nikolay Borisov <nborisov@suse.com> wrote:
> >> On 26.03.19 12:49, fdmanana@kernel.org wrote:
> >>> From: Filipe Manana <fdmanana@suse.com>
> >>>
> >>> When a filesystem is mounted with the nologreplay mount option, which
> >>> requires it to be mounted in RO mode as well, we can not allow discard on
> >>> free space inside block groups, because log trees refer to extents that
> >>> are not pinned in a block group's free space cache (pinning the extents is
> >>> precisely the first phase of replaying a log tree).
> >>>
> >>> So do not allow the fitrim ioctl to do anything when the filesystem is
> >>> mounted with the nologreplay option, because later it can be mounted RW
> >>> without that option, which causes log replay to happen and result in
> >>> either a failure to replay the log trees (leading to a mount failure), a
> >>> crash or some silent corruption.
> >>>
> >>> Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
> >>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >>
> >> Does it make sense to make the check a bit more specific and only return
> >> EROFS when NOLOGREPLAY and the log tree has non-null generation?
> > 
> > It would make sense to check whether there's actually a log tree as well.
> > Neither xfs nor ext4 (which is already in Linus' tree) does an
> > equivalent check, nor does the proposed fstests test case make sure a
> > journal/log exists.
> > 
> > Not against it, but this isn't a common use case either.
> 
> I think of this as a sort of "optimisation" where if we don't have a tree
> then we can allow trim. Though this is much simpler so I'm fine with it
> as well.

Agreed, the simple solution sounds ok to me, trim is not a critical
operation so we don't need to try harder to make it work even with the
mount option.

Patch

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 494f0f10d70e..01808934d21f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -501,6 +501,16 @@  static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	/*
+	 * If the fs is mounted with nologreplay, which requires it to be
+	 * mounted in RO mode as well, we can not allow discard on free space
+	 * inside block groups, because log trees refer to extents that are not
+	 * pinned in a block group's free space cache (pinning the extents is
+	 * precisely the first phase of replaying a log tree).
+	 */
+	if (btrfs_test_opt(fs_info, NOLOGREPLAY))
+		return -EROFS;
+
 	rcu_read_lock();
 	list_for_each_entry_rcu(device, &fs_info->fs_devices->devices,
 				dev_list) {
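
The hazard described in the commit message can be illustrated with a toy userspace model. Everything here is hypothetical (struct toy_fs, the toy_* functions, the fixed array of eight extents) and only mirrors the reasoning: extents referenced solely by a log tree look free until the first phase of log replay pins them, so a trim issued before that phase discards data that a later replay needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_EXTENTS 8

/* Hypothetical toy state; real btrfs tracks this per block group. */
struct toy_fs {
	bool allocated[TOY_NR_EXTENTS];	/* committed allocations */
	bool pinned[TOY_NR_EXTENTS];	/* set by the first phase of log replay */
	bool log_refs[TOY_NR_EXTENTS];	/* referenced only by the log tree */
	bool discarded[TOY_NR_EXTENTS];	/* contents thrown away by trim */
};

/* First phase of log replay in this model: pin what the log references. */
static void toy_pin_log_extents(struct toy_fs *fs)
{
	assert(fs != NULL);
	for (size_t i = 0; i < TOY_NR_EXTENTS; i++)
		if (fs->log_refs[i])
			fs->pinned[i] = true;
}

/* Trim discards every extent that looks free: not allocated, not pinned. */
static size_t toy_trim(struct toy_fs *fs)
{
	size_t n = 0;

	assert(fs != NULL);
	for (size_t i = 0; i < TOY_NR_EXTENTS; i++) {
		if (!fs->allocated[i] && !fs->pinned[i]) {
			fs->discarded[i] = true;
			n++;
		}
	}
	return n;
}

/* Count log-referenced extents whose contents were discarded. */
static size_t toy_lost_log_extents(const struct toy_fs *fs)
{
	size_t n = 0;

	assert(fs != NULL);
	for (size_t i = 0; i < TOY_NR_EXTENTS; i++)
		if (fs->log_refs[i] && fs->discarded[i])
			n++;
	return n;
}
```

In this model, calling toy_trim() without toy_pin_log_extents() first (the nologreplay situation) loses every log-referenced extent, while pinning first protects them.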