diff mbox

[v3] btrfs: Introduce new mount option to disable tree log replay

Message ID 1449714846-18174-1-git-send-email-quwenruo@cn.fujitsu.com (mailing list archive)
State Superseded
Headers show

Commit Message

Qu Wenruo Dec. 10, 2015, 2:34 a.m. UTC
Introduce a new mount option "nologreplay" to co-operate with "ro" mount
option to get real readonly mount, like "norecovery" in ext* and xfs.

Since the new parse_options() need to check new flags at remount time,
so add a new parameter for parse_options().

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
---
v2:
  Make RO check mandatory for btrfs_parse_options().
  Add btrfs_show_options() support for nologreplay.

  Document for btrfs-mount(5) will follow after the patch being merged.
v3:
  Apply the document suggestion from  Chandan Rajendra.
---
 Documentation/filesystems/btrfs.txt |  9 +++++++++
 fs/btrfs/ctree.h                    |  4 +++-
 fs/btrfs/disk-io.c                  |  7 ++++---
 fs/btrfs/super.c                    | 29 +++++++++++++++++++++++++----
 4 files changed, 41 insertions(+), 8 deletions(-)

Comments

David Sterba Dec. 14, 2015, 5:32 p.m. UTC | #1
On Thu, Dec 10, 2015 at 10:34:06AM +0800, Qu Wenruo wrote:
> Introduce a new mount option "nologreplay" to co-operate with "ro" mount
> option to get real readonly mount, like "norecovery" in ext* and xfs.
> 
> Since the new parse_options() need to check new flags at remount time,
> so add a new parameter for parse_options().
> 
> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>

I've read the discussions around the change and from the user's POV I'd
suggest to add another mount option that would be just an alias for any
mount options that would implement the 'hard-ro' semantics.

Say it's called 'nowr'. Now it would imply 'nologreplay', but may cover
more options in the future.

 mount -o ro,nowr /dev/sdx /mnt

would work when switching kernels.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Austin S. Hemmelgarn Dec. 14, 2015, 5:50 p.m. UTC | #2
On 2015-12-14 12:32, David Sterba wrote:
> On Thu, Dec 10, 2015 at 10:34:06AM +0800, Qu Wenruo wrote:
>> Introduce a new mount option "nologreplay" to co-operate with "ro" mount
>> option to get real readonly mount, like "norecovery" in ext* and xfs.
>>
>> Since the new parse_options() need to check new flags at remount time,
>> so add a new parameter for parse_options().
>>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
>> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
>
> I've read the discussions around the change and from the user's POV I'd
> suggest to add another mount option that would be just an alias for any
> mount options that would implement the 'hard-ro' semantics.
>
> Say it's called 'nowr'. Now it would imply 'nologreplay', but may cover
> more options in the future.
It should also imply noatime.  I'm not sure how BTRFS handles atime when 
mounted RO, but I know a lot of old UNIX systems updated atime even on 
filesystems mounted RO, and I know that at least at one point Linux did too.
>
>   mount -o ro,nowr /dev/sdx /mnt
>
> would work when switching kernels.

I like this idea, but I think that having a name like true-ro or hard-ro 
and making it imply ro (and noatime) would probably be better (or at 
least, simpler to use from a user perspective).

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Anton Mitterer Dec. 14, 2015, 7:11 p.m. UTC | #3
On Mon, 2015-12-14 at 18:32 +0100, David Sterba wrote:
> I've read the discussions around the change and from the user's POV
> I'd
> suggest to add another mount option that would be just an alias for
> any
> mount options that would implement the 'hard-ro' semantics.
Nice to hear... 


> Say it's called 'nowr'
though I'm deeply saddened, that you don't like my proposed "hard-ro"
which I though about for nearly 1s ;-)

>  mount -o ro,nowr /dev/sdx /mnt
Sounds reasonable... especially I mean that, as long ro's documentation
points to "nowr" and clearly states whether both (ro+nowr) are required
to get the desired behaviour, I have no very strong opinion, whether
both (ro+nowr) should be required, or whether nowr, should imply ro.
Though I think, the later may be better.

Thanks,
Chris.
Christoph Anton Mitterer Dec. 14, 2015, 7:16 p.m. UTC | #4
On Mon, 2015-12-14 at 12:50 -0500, Austin S. Hemmelgarn wrote:
> It should also imply noatime.  I'm not sure how BTRFS handles atime
> when 
> mounted RO, but I know a lot of old UNIX systems updated atime even
> on 
> filesystems mounted RO, and I know that at least at one point Linux
> did too.
I stumbled over that recently myself, and haven't bothered to try it
out, yet.
But Duncan's argument, why at least ro-snapshots (yes I know, this may
not be exactly the same as RO mount option) would need to imply
noatime, is pretty convincing. :)

Anyway, if it "ro" wouldn't imply noatime, I would ask why, because the
atime is definitely something the fs exports normally to userland,...
and that's how I'd basically consider hard-ro vs. (soft-)ro:

soft-ro: data as visible by the mounted fs must not change (unless
         perhaps for necessary repair/replay operations to get the 
         filesystem back in a consistent state)
hard-ro: soft-ro + nothing on the backing devices may change (bitwise)


Cheers,
Chris.
Austin S. Hemmelgarn Dec. 14, 2015, 7:33 p.m. UTC | #5
On 2015-12-14 14:16, Christoph Anton Mitterer wrote:
> On Mon, 2015-12-14 at 12:50 -0500, Austin S. Hemmelgarn wrote:
>> It should also imply noatime.  I'm not sure how BTRFS handles atime
>> when
>> mounted RO, but I know a lot of old UNIX systems updated atime even
>> on
>> filesystems mounted RO, and I know that at least at one point Linux
>> did too.
> I stumbled over that recently myself, and haven't bothered to try it
> out, yet.
> But Duncan's argument, why at least ro-snapshots (yes I know, this may
> not be exactly the same as RO mount option) would need to imply
> noatime, is pretty convincing. :)
The traditional reasoning was that read-only meant that users couldn't 
change anything, not that the actual data on disk wouldn't change. 
That, and there's been some really brain-dead software over the years 
that depended on atimes being right (now, the only remaining software I 
know of that even uses them at all is Mutt).
>
> Anyway, if it "ro" wouldn't imply noatime, I would ask why, because the
> atime is definitely something the fs exports normally to userland,...
> and that's how I'd basically consider hard-ro vs. (soft-)ro:
>
> soft-ro: data as visible by the mounted fs must not change (unless
>           perhaps for necessary repair/replay operations to get the
>           filesystem back in a consistent state)
> hard-ro: soft-ro + nothing on the backing devices may change (bitwise)
This should be 'Nothing on the backing device may change as a result of 
the FS', nitpicking I know, but we should be specific so that we 
hopefully avoid ending up in the same situation again.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Anton Mitterer Dec. 14, 2015, 7:44 p.m. UTC | #6
On Mon, 2015-12-14 at 14:33 -0500, Austin S. Hemmelgarn wrote:
> The traditional reasoning was that read-only meant that users
> couldn't 
> change anything
Where I'd however count the atime changes to.
The atimes wouldn't change magically, but only because the user stared
some program, configured some daemon, etc. ... which reads/writes/etc.
the file.


> , not that the actual data on disk wouldn't change. 
> That, and there's been some really brain-dead software over the years
> that depended on atimes being right (now, the only remaining software
> I 
> know of that even uses them at all is Mutt).
Wasn't tmpwatcher anoterh candidate?


> This should be 'Nothing on the backing device may change as a result
> of 
> the FS', nitpicking I know, but we should be specific so that we 
> hopefully avoid ending up in the same situation again.
Of course, you're right! :-)

(especially when btrfs should ever be formalised in a standards
document, this should read like:
>hard-ro: Nothing on the backing device may change as a result of the
>FS, however, e.g. maleware, may directly destroy the data on the
>blockdevice ;-)


Chris.
Austin S. Hemmelgarn Dec. 14, 2015, 8:20 p.m. UTC | #7
On 2015-12-14 14:44, Christoph Anton Mitterer wrote:
> On Mon, 2015-12-14 at 14:33 -0500, Austin S. Hemmelgarn wrote:
>> The traditional reasoning was that read-only meant that users
>> couldn't
>> change anything
> Where I'd however count the atime changes to.
> The atimes wouldn't change magically, but only because the user stared
> some program, configured some daemon, etc. ... which reads/writes/etc.
> the file.
But reading the file is allowed, which is where this starts to get 
ambiguous.  Reading a file updates the atime (and in fact, this is the 
way that most stuff that uses them cares about them), but even a ro 
mount allows reading the file.  The traditional meaning of ro on UNIX 
was (AFAIUI) that directory structure couldn't change, new files 
couldn't be created, existing files couldn't be deleted, flags on the 
inodes couldn't be changed, and file data couldn't be changed.  TBH, I'm 
not even certain that atime updates on ro filesystems was even an 
intentional thing in the first place, it really sounds to me like the 
type of thing that somebody forgot to put in a permissions check for, 
and then people thought it was a feature.
>
>
>> , not that the actual data on disk wouldn't change.
>> That, and there's been some really brain-dead software over the years
>> that depended on atimes being right (now, the only remaining software
>> I
>> know of that even uses them at all is Mutt).
> Wasn't tmpwatcher anoterh candidate?
Most such software can use it, but doesn't depend on it.  TBH, many 
people these days run /tmp (and even /var/tmp) on an in memory 
filesystem, so atime updates aren't as much of an issue there.  Also, 
even with noatime, I'm pretty sure the VFS updates the atime every time 
the mtime changes (because not doing so would be somewhat stupid, and 
you're writing the inode anyway), which technically means that stuff 
could work around this by opening the file, truncating it to the size it 
already is, and then closing it.
>
>
>> This should be 'Nothing on the backing device may change as a result
>> of
>> the FS', nitpicking I know, but we should be specific so that we
>> hopefully avoid ending up in the same situation again.
> Of course, you're right! :-)
>
> (especially when btrfs should ever be formalised in a standards
> document, this should read like:
>> hard-ro: Nothing on the backing device may change as a result of the
>> FS, however, e.g. maleware, may directly destroy the data on the
>> blockdevice ;-)
>
>
> Chris.
>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Anton Mitterer Dec. 14, 2015, 11:34 p.m. UTC | #8
On Mon, 2015-12-14 at 15:20 -0500, Austin S. Hemmelgarn wrote:
> On 2015-12-14 14:44, Christoph Anton Mitterer wrote:
> > On Mon, 2015-12-14 at 14:33 -0500, Austin S. Hemmelgarn wrote:
> > > The traditional reasoning was that read-only meant that users
> > > couldn't
> > > change anything
> > Where I'd however count the atime changes to.
> > The atimes wouldn't change magically, but only because the user
> > stared
> > some program, configured some daemon, etc. ... which
> > reads/writes/etc.
> > the file.
> But reading the file is allowed, which is where this starts to get 
> ambiguous.
Why?

> Reading a file updates the atime (and in fact, this is the 
> way that most stuff that uses them cares about them), but even a ro 
> mount allows reading the file.
As I just wrote in the other post, at least for btrfs (haven't checked
ext/xfs due to being... well... lazy O:-) ) ro mount option  or  ro
snapshot seems to mean: no atime updates even if mounted with
strictatimes (or maybe I did just something stupid when checking, so
better double check)


> The traditional meaning of ro on UNIX 
> was (AFAIUI) that directory structure couldn't change, new files 
> couldn't be created, existing files couldn't be deleted, flags on the
> inodes couldn't be changed, and file data couldn't be changed.  TBH,
> I'm 
> not even certain that atime updates on ro filesystems was even an 
> intentional thing in the first place, it really sounds to me like the
> type of thing that somebody forgot to put in a permissions check for,
> and then people thought it was a feature.
Well in the end it probably doesn't matter how it came to existence,...
rather what it should be and what it actually is.
As said, I, personally, from the user PoV, would says soft-ro already
includes no dates on files being modifiable (including atime), as I'd
consider these a property of the file.
However anyone else may of course see that differently and at the same
time be smarter than I am.



> Also,
 
> even with noatime, I'm pretty sure the VFS updates the atime every
> time 
> the mtime changes
I've just checked and not it doesn't:
  File: ‘subvol/FILE’
  Size: 8         	Blocks: 16         IO Block: 4096   regular
file
Device: 30h/48d	Inode: 257         Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid:
(    0/    root)
Access: 2015-12-15 00:01:46.452007798 +0100
Modify: 2015-12-15 00:31:26.579511816 +0100
Change: 2015-12-15 00:31:26.579511816 +0100

(rw,noatime mounted,... mtime, is more recent than atime)


>  (because not doing so would be somewhat stupid, and 
> you're writing the inode anyway), which technically means that stuff 
> could work around this by opening the file, truncating it to the size
> it 
> already is, and then closing it.
Hmm I don't have a strong opinion here... it sounds "supid" from the
technical point in that it *could* write the atime and that wouldn't
cost much.
OTOH, that would make things more ambiguous when atimes change and when
not... (they'd only change on writes, never on reads,...)
So I think it's good as it is... and it matches the name, which is
noatime - and not noatime-unless-on-writes ;-)



Cheers,
Chris.
Austin S. Hemmelgarn Dec. 15, 2015, 1:31 p.m. UTC | #9
On 2015-12-14 18:34, Christoph Anton Mitterer wrote:
> On Mon, 2015-12-14 at 15:20 -0500, Austin S. Hemmelgarn wrote:
>> On 2015-12-14 14:44, Christoph Anton Mitterer wrote:
>>> On Mon, 2015-12-14 at 14:33 -0500, Austin S. Hemmelgarn wrote:
>>>> The traditional reasoning was that read-only meant that users
>>>> couldn't
>>>> change anything
>>> Where I'd however count the atime changes to.
>>> The atimes wouldn't change magically, but only because the user
>>> stared
>>> some program, configured some daemon, etc. ... which
>>> reads/writes/etc.
>>> the file.
>> But reading the file is allowed, which is where this starts to get
>> ambiguous.
> Why?
Because according to POSIX, when a file gets read, the atime gets 
updated.  Except that POSIX doesn't specify what happens if the 
filesystem is mounted read-only, but the underlying block device is 
writable.
>
>> Reading a file updates the atime (and in fact, this is the
>> way that most stuff that uses them cares about them), but even a ro
>> mount allows reading the file.
> As I just wrote in the other post, at least for btrfs (haven't checked
> ext/xfs due to being... well... lazy O:-) ) ro mount option  or  ro
> snapshot seems to mean: no atime updates even if mounted with
> strictatimes (or maybe I did just something stupid when checking, so
> better double check)
>
>
>> The traditional meaning of ro on UNIX
>> was (AFAIUI) that directory structure couldn't change, new files
>> couldn't be created, existing files couldn't be deleted, flags on the
>> inodes couldn't be changed, and file data couldn't be changed.  TBH,
>> I'm
>> not even certain that atime updates on ro filesystems was even an
>> intentional thing in the first place, it really sounds to me like the
>> type of thing that somebody forgot to put in a permissions check for,
>> and then people thought it was a feature.
> Well in the end it probably doesn't matter how it came to existence,...
> rather what it should be and what it actually is.
Knowing how you got where you are is pretty important for figuring out 
how to not end up there again :)
> As said, I, personally, from the user PoV, would says soft-ro already
> includes no dates on files being modifiable (including atime), as I'd
> consider these a property of the file.
> However anyone else may of course see that differently and at the same
> time be smarter than I am.
AFAIK, the original versions of UNIX had no touch command or utime() 
syscall, so ctime, mtime, and atime were these things that just got 
magically updated by the system (ctime is still this way), and thus 
wasn't something that was considered user modification to the filesystem.
>
>> Also,
>
>> even with noatime, I'm pretty sure the VFS updates the atime every
>> time
>> the mtime changes
> I've just checked and not it doesn't:
>    File: ‘subvol/FILE’
>    Size: 8         	Blocks: 16         IO Block: 4096   regular
> file
> Device: 30h/48d	Inode: 257         Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid:
> (    0/    root)
> Access: 2015-12-15 00:01:46.452007798 +0100
> Modify: 2015-12-15 00:31:26.579511816 +0100
> Change: 2015-12-15 00:31:26.579511816 +0100
>
> (rw,noatime mounted,... mtime, is more recent than atime)
Hmm, I could have sworn that updating the mtime on a file would force an 
atime update.  \me checks documentation.  OK, I was thinking of 
relatime, which updates the atime if it's older than mtime or ctime.
>
>>   (because not doing so would be somewhat stupid, and
>> you're writing the inode anyway), which technically means that stuff
>> could work around this by opening the file, truncating it to the size
>> it
>> already is, and then closing it.
> Hmm I don't have a strong opinion here... it sounds "supid" from the
> technical point in that it *could* write the atime and that wouldn't
> cost much.
> OTOH, that would make things more ambiguous when atimes change and when
> not... (they'd only change on writes, never on reads,...)
> So I think it's good as it is... and it matches the name, which is
> noatime - and not noatime-unless-on-writes ;-)
Except there are still ways to update the atime even on a filesystem 
mounted noatime.  For example, on _every_ POSIX compliant system out 
there (and Linux itself is mostly POSIX compliant, it's primarily the 
userspace that isn't), you can update the atime using the utime() system 
call, unless the filesystem is read-only.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Dec. 16, 2015, 1:36 a.m. UTC | #10
David Sterba wrote on 2015/12/14 18:32 +0100:
> On Thu, Dec 10, 2015 at 10:34:06AM +0800, Qu Wenruo wrote:
>> Introduce a new mount option "nologreplay" to co-operate with "ro" mount
>> option to get real readonly mount, like "norecovery" in ext* and xfs.
>>
>> Since the new parse_options() need to check new flags at remount time,
>> so add a new parameter for parse_options().
>>
>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
>> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
>
> I've read the discussions around the change and from the user's POV I'd
> suggest to add another mount option that would be just an alias for any
> mount options that would implement the 'hard-ro' semantics.
>
> Say it's called 'nowr'. Now it would imply 'nologreplay', but may cover
> more options in the future.
>
>   mount -o ro,nowr /dev/sdx /mnt
>
> would work when switching kernels.
>
>
That would be nice.

I'd like to forward the idea/discussion to filesystem ml, not only btrfs 
maillist.

Such behavior should better be coordinated between all(at least xfs and 
ext4 and btrfs) filesystems.

One sad example is, we can't use 'norecovery' mount option to disable 
log replay in btrfs, as there is 'recovery' mount option already.

So I hope we can have a unified mount option between mainline filesystems.

Thanks,
Qu


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Anton Mitterer Dec. 16, 2015, 2:13 a.m. UTC | #11
On Wed, 2015-12-16 at 09:36 +0800, Qu Wenruo wrote:
> One sad example is, we can't use 'norecovery' mount option to disable
 
> log replay in btrfs, as there is 'recovery' mount option already.
I think "norecovery" would anyway not really fit... the name should
rather indicated, that from the filesystem side, nothing changes the
underlying device's contents.
"norecovery" would just tell, that no recovery options would be tried,
however, any other changes (optimisations, etc.) could still go on.

David's "nowr" is already, better, though it could be misinterpreted as
no write/read (as wr being rw swapped), so perhaps "nowrites" would be
better... but that again may be considered just another name for "ro".

So perhaps one could do something that includes "dev", like "rodev",
"ro-dev", or "immuatable-dev"... or instead of "dev" "devs" to cover
multi-device cases.
OTOH, the devices aren't really set "ro" (as in blockdev --setro).

Maybe "nodevwrites" or "no-dev-writes" or one of these with "device"
not abbreviated?


Many programs have a "--dry-run" option, but I kinda don't liky
drymount or something like that.


Guess from the above, I'd personally like "nodevwrites" the most.


Oh and Qu's idea of coordinating that with the other filesystems is
surely good.


Cheers,
Chris.
Duncan Dec. 16, 2015, 11:10 a.m. UTC | #12
Qu Wenruo posted on Wed, 16 Dec 2015 09:36:23 +0800 as excerpted:

> David Sterba wrote on 2015/12/14 18:32 +0100:
>> On Thu, Dec 10, 2015 at 10:34:06AM +0800, Qu Wenruo wrote:
>>> Introduce a new mount option "nologreplay" to co-operate with "ro"
>>> mount option to get real readonly mount, like "norecovery" in ext* and
>>> xfs.
>>>
>>> Since the new parse_options() need to check new flags at remount time,
>>> so add a new parameter for parse_options().
>>>
>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
>>> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
>>
>> I've read the discussions around the change and from the user's POV I'd
>> suggest to add another mount option that would be just an alias for any
>> mount options that would implement the 'hard-ro' semantics.
>>
>> Say it's called 'nowr'. Now it would imply 'nologreplay', but may cover
>> more options in the future.
>>
>>   mount -o ro,nowr /dev/sdx /mnt
>>
>> would work when switching kernels.
>>
>>
> That would be nice.
> 
> I'd like to forward the idea/discussion to filesystem ml, not only btrfs
> maillist.
> 
> Such behavior should better be coordinated between all(at least xfs and
> ext4 and btrfs) filesystems.
> 
> One sad example is, we can't use 'norecovery' mount option to disable
> log replay in btrfs, as there is 'recovery' mount option already.
> 
> So I hope we can have a unified mount option between mainline
> filesystems.

FWIW, I was just reading the mount manpage in connection with a reply for 
a different thread, and noticed...

mount (8) (from util-linux 2.27.1) says noload and norecovery are the 
same option, for ext3/4 at least.  It refers to the xfs (5) manpage, from 
xfsprogs, for xfs mount options, and that's not installed here, so I 
can't confirm noload for it, but it's there for ext3/4.

And noload doesn't have the namespace collision problem norecovery does 
on btrfs, so I'd strongly suggest using it, at least as an alias for 
whatever other btrfs-specific name we might choose.
Christoph Anton Mitterer Dec. 16, 2015, 11:45 a.m. UTC | #13
On Wed, 2015-12-16 at 11:10 +0000, Duncan wrote:
> And noload doesn't have the namespace collision problem norecovery
> does 
> on btrfs, so I'd strongly suggest using it, at least as an alias for 
> whatever other btrfs-specific name we might choose.

but noload is, AFAIU, not what's desired here, is it?
Per manpage it's "Don't load the journal on mounting",... not only
wouldn't that fit for btrfs, it's also not what's really desired, i.e.
an option that implies everything necessary to not modify the device.

Cheers,
Chris.
Austin S. Hemmelgarn Dec. 16, 2015, 12:12 p.m. UTC | #14
On 2015-12-16 06:10, Duncan wrote:
> Qu Wenruo posted on Wed, 16 Dec 2015 09:36:23 +0800 as excerpted:
>
>> David Sterba wrote on 2015/12/14 18:32 +0100:
>>> On Thu, Dec 10, 2015 at 10:34:06AM +0800, Qu Wenruo wrote:
>>>> Introduce a new mount option "nologreplay" to co-operate with "ro"
>>>> mount option to get real readonly mount, like "norecovery" in ext* and
>>>> xfs.
>>>>
>>>> Since the new parse_options() need to check new flags at remount time,
>>>> so add a new parameter for parse_options().
>>>>
>>>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
>>>> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
>>>> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
>>>
>>> I've read the discussions around the change and from the user's POV I'd
>>> suggest to add another mount option that would be just an alias for any
>>> mount options that would implement the 'hard-ro' semantics.
>>>
>>> Say it's called 'nowr'. Now it would imply 'nologreplay', but may cover
>>> more options in the future.
>>>
>>>    mount -o ro,nowr /dev/sdx /mnt
>>>
>>> would work when switching kernels.
>>>
>>>
>> That would be nice.
>>
>> I'd like to forward the idea/discussion to filesystem ml, not only btrfs
>> maillist.
>>
>> Such behavior should better be coordinated between all(at least xfs and
>> ext4 and btrfs) filesystems.
>>
>> One sad example is, we can't use 'norecovery' mount option to disable
>> log replay in btrfs, as there is 'recovery' mount option already.
>>
>> So I hope we can have a unified mount option between mainline
>> filesystems.
>
> FWIW, I was just reading the mount manpage in connection with a reply for
> a different thread, and noticed...
>
> mount (8) (from util-linux 2.27.1) says noload and norecovery are the
> same option, for ext3/4 at least.  It refers to the xfs (5) manpage, from
> xfsprogs, for xfs mount options, and that's not installed here, so I
> can't confirm noload for it, but it's there for ext3/4.
Unless it's undocumented, XFS doesn't have it (as much as I hate XFS, I 
have to have xfsprogs installed so that I can do recovery for the few 
systems at work that actually use it if the need arises).
>
> And noload doesn't have the namespace collision problem norecovery does
> on btrfs, so I'd strongly suggest using it, at least as an alias for
> whatever other btrfs-specific name we might choose.
I kind of agree with Christoph here.  I don't think that noload should 
be the what we actually use, although I do think having it as an alias 
for whatever name we end up using would be a good thing.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Anton Mitterer Dec. 16, 2015, 12:34 p.m. UTC | #15
On Wed, 2015-12-16 at 07:12 -0500, Austin S. Hemmelgarn wrote:
> I kind of agree with Christoph here.  I don't think that noload
> should 
> be the what we actually use, although I do think having it as an
> alias 
> for whatever name we end up using would be a good thing.
No, because people would start using it, getting used to it, and in 4
years we could never change it again,... which may be necessary...

noload, seems to mean don't load the journal. Unless btrfs gets a
journal in the sense xfs/ext has one, it simply should either not use
that name at all... or not try to "map" it to something of it's own
which is similar, but in reality not the same.


Chris.
Austin S. Hemmelgarn Dec. 16, 2015, 12:57 p.m. UTC | #16
On 2015-12-16 07:34, Christoph Anton Mitterer wrote:
> On Wed, 2015-12-16 at 07:12 -0500, Austin S. Hemmelgarn wrote:
>> I kind of agree with Christoph here.  I don't think that noload
>> should
>> be the what we actually use, although I do think having it as an
>> alias
>> for whatever name we end up using would be a good thing.
> No, because people would start using it, getting used to it, and in 4
> years we could never change it again,... which may be necessary...
>
No, because we should ease the transition from other filesystems to the 
greatest extent reasonably possible.  It should be clearly documented as 
an alias for compatibility with ext{3,4}, and that it might go away in 
the future.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Anton Mitterer Dec. 16, 2015, 1:01 p.m. UTC | #17
On Wed, 2015-12-16 at 07:57 -0500, Austin S. Hemmelgarn wrote:
> No, because we should ease the transition from other filesystems to
> the 
> greatest extent reasonably possible.  It should be clearly documented
> as 
> an alias for compatibility with ext{3,4}, and that it might go away
> in 
> the future.

Where's the need for an easy migration path? Such option wouldn't be
used by default and even if, people would need to change their
fstab/etc. anyway (ext->btrfs).

The note that things go away never really work out that easy... people
start to use it, rely on it... and sooner or later you have a situation
as with atime, that you basically never can get rid of it.

IMHO an alias here would be just ambiguous.

Cheers,
Chris.
David Sterba Dec. 16, 2015, 1:53 p.m. UTC | #18
On Mon, Dec 14, 2015 at 12:50:37PM -0500, Austin S. Hemmelgarn wrote:
> On 2015-12-14 12:32, David Sterba wrote:
> > On Thu, Dec 10, 2015 at 10:34:06AM +0800, Qu Wenruo wrote:
> >> Introduce a new mount option "nologreplay" to co-operate with "ro" mount
> >> option to get real readonly mount, like "norecovery" in ext* and xfs.
> >>
> >> Since the new parse_options() need to check new flags at remount time,
> >> so add a new parameter for parse_options().
> >>
> >> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
> >> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
> >> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com>
> >
> > I've read the discussions around the change and from the user's POV I'd
> > suggest to add another mount option that would be just an alias for any
> > mount options that would implement the 'hard-ro' semantics.
> >
> > Say it's called 'nowr'. Now it would imply 'nologreplay', but may cover
> > more options in the future.
> It should also imply noatime.  I'm not sure how BTRFS handles atime when 
> mounted RO, but I know a lot of old UNIX systems updated atime even on 
> filesystems mounted RO, and I know that at least at one point Linux did too.

A mount with -o ro will not touch atimes. At one point the read-only
snapshots changed atimes, but this has been fixed since.

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/inode.c#n1602
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/inode.c#n5973

> >   mount -o ro,nowr /dev/sdx /mnt
> >
> > would work when switching kernels.
> 
> I like this idea, but I think that having a name like true-ro or hard-ro 
> and making it imply ro (and noatime) would probably be better (or at 
> least, simpler to use from a user perspective).

Ok, a single option to do the real-ro sounds better than ro,something.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Dec. 16, 2015, 1:58 p.m. UTC | #19
On Wed, Dec 16, 2015 at 09:36:23AM +0800, Qu Wenruo wrote:
> >
> >   mount -o ro,nowr /dev/sdx /mnt
> >
> > would work when switching kernels.
> >
> That would be nice.
> 
> I'd like to forward the idea/discussion to filesystem ml, not only btrfs 
> maillist.

Good idea.

> Such behavior should better be coordinated between all(at least xfs and 
> ext4 and btrfs) filesystems.
> 
> One sad example is, we can't use 'norecovery' mount option to disable 
> log replay in btrfs, as there is 'recovery' mount option already.

I think we should pick a name that's not tied to the implementation how
the potential writes could happen under a RO mount.
Recovery/replay/whatever, the expected use is "avoid any writes".

> So I hope we can have a unified mount option between mainline filesystems.

That would be a good thing indeed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Duncan Dec. 17, 2015, 1:09 a.m. UTC | #20
Christoph Anton Mitterer posted on Wed, 16 Dec 2015 12:45:00 +0100 as
excerpted:

> On Wed, 2015-12-16 at 11:10 +0000, Duncan wrote:
>> And noload doesn't have the namespace collision problem norecovery does
>> on btrfs, so I'd strongly suggest using it, at least as an alias for
>> whatever other btrfs-specific name we might choose.
> 
> but noload is, AFAIU, not what's desired here, is it?
> Per manpage it's "Don't load the journal on mounting",... not only
> wouldn't that fit for btrfs, it's also not what's really desired, i.e.
> an option that implies everything necessary to not modify the device.

Well, "don't load the journal on mounting" is exactly what the option 
would do.  The journal (aka log) of course has a slightly different 
meaning, it's only the fsync log, but loading it is exactly what the 
option would prevent, here.

Of course that isn't to say there shouldn't be another option, call it 
nomodify, for argument, that includes this and perhaps other options that 
would otherwise trigger filesystem level changes on a normal read-only 
mount.

Too bad we can't simply rename the recovery mount option so norecovery 
could be used as well, but I guess that could potentially break too many 
existing deployments. =:^(
Christoph Anton Mitterer Dec. 17, 2015, 1:46 a.m. UTC | #21
On Thu, 2015-12-17 at 01:09 +0000, Duncan wrote:
> Well, "don't load the journal on mounting" is exactly what the option
 
> would do.  The journal (aka log) of course has a slightly different 
> meaning, it's only the fsync log, but loading it is exactly what the 
> option would prevent, here.
That's not the point.
What David asked for was an option, that has the meaning "do what ever
is necessary to mount the fs in such a way, that the device isn't
changed".
At least that's how I've understood him.

*Right now* this is for btrfs the "nologreplay" option, so
"nodevwrites" or whatever you call it, would simply imply
"nologreply"... and in the future any other options that are necessary
to get the above defined semantics.

For ext, "nodevwrites" would imply "noload" (AFAIU).

If you now make "noload" an alias for "nodevwrites" in btrfs you
clearly break semantics here:
"noload" from ext4 hasn't the same meaning as "nodevwrites" from
btrfs.. it has *just now* while ext doesn't need any other possible
future options.

Maybe in 10 years, ext has a dozens new features (because btrfs still
hasn't stabilised yet, as it misses snapshot-aware defrag and checksums
for noCoWed data >:-D O:-D ... sorry, couldn't resist ;) ), one of that
new features needs to be disabled for "hard ro mounts", thus ext's
"nodevwrites" would in addition imply "noSuperNewExtFeature".

Then "noload" from ext4 isn't even *effectively* the same anymore as
"nodevwrites" from btrfs.
Therefore, it shouldn't be an alias.


Cheers,
Chris.
diff mbox

Patch

diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
index c772b47..0f2ed48 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.txt
@@ -168,6 +168,15 @@  Options with (*) are default options and will not show in the mount options.
   notreelog
 	Enable/disable the tree logging used for fsync and O_SYNC writes.
 
+  nologreplay
+	Disable the log tree replay at mount time to prevent filesystem
+	from getting modified.
+	Must be used with 'ro' mount option.
+	A filesystem mounted with the 'nologreplay' option cannot
+	transition to a read-write mount via remount,rw - the filesystem
+	must be unmounted and mounted back again if read-write access is
+	desired.
+
   recovery
 	Enable autorecovery attempts if a bad tree root is found at mount time.
 	Currently this scans a list of several previous tree roots and tries to
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a0165c6..c54ff25 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2184,6 +2184,7 @@  struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_RESCAN_UUID_TREE	(1 << 23)
 #define BTRFS_MOUNT_FRAGMENT_DATA	(1 << 24)
 #define BTRFS_MOUNT_FRAGMENT_METADATA	(1 << 25)
+#define BTRFS_MOUNT_NOLOGREPLAY		(1 << 26)
 
 #define BTRFS_DEFAULT_COMMIT_INTERVAL	(30)
 #define BTRFS_DEFAULT_MAX_INLINE	(8192)
@@ -4070,7 +4071,8 @@  void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info);
 ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size);
 
 /* super.c */
-int btrfs_parse_options(struct btrfs_root *root, char *options);
+int btrfs_parse_options(struct btrfs_root *root, char *options,
+			unsigned long new_flags);
 int btrfs_sync_fs(struct super_block *sb, int wait);
 
 #ifdef CONFIG_PRINTK
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1eb0839..617bf4f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2711,7 +2711,7 @@  int open_ctree(struct super_block *sb,
 	 */
 	fs_info->compress_type = BTRFS_COMPRESS_ZLIB;
 
-	ret = btrfs_parse_options(tree_root, options);
+	ret = btrfs_parse_options(tree_root, options, sb->s_flags);
 	if (ret) {
 		err = ret;
 		goto fail_alloc;
@@ -3009,8 +3009,9 @@  retry_root_backup:
 	if (ret)
 		goto fail_trans_kthread;
 
-	/* do not make disk changes in broken FS */
-	if (btrfs_super_log_root(disk_super) != 0) {
+	/* do not make disk changes in broken FS or nologreplay is given */
+	if (btrfs_super_log_root(disk_super) != 0 &&
+	    !btrfs_test_opt(tree_root, NOLOGREPLAY)) {
 		ret = btrfs_replay_log(fs_info, fs_devices);
 		if (ret) {
 			err = ret;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 24154e4..55989cb 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -302,7 +302,7 @@  enum {
 	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
 	Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
 	Opt_noenospc_debug, Opt_noflushoncommit, Opt_acl, Opt_datacow,
-	Opt_datasum, Opt_treelog, Opt_noinode_cache,
+	Opt_datasum, Opt_treelog, Opt_noinode_cache, Opt_nologreplay,
 #ifdef CONFIG_BTRFS_DEBUG
 	Opt_fragment_data, Opt_fragment_metadata, Opt_fragment_all,
 #endif
@@ -334,6 +334,7 @@  static match_table_t tokens = {
 	{Opt_noacl, "noacl"},
 	{Opt_notreelog, "notreelog"},
 	{Opt_treelog, "treelog"},
+	{Opt_nologreplay, "nologreplay"},
 	{Opt_flushoncommit, "flushoncommit"},
 	{Opt_noflushoncommit, "noflushoncommit"},
 	{Opt_ratio, "metadata_ratio=%d"},
@@ -371,7 +372,8 @@  static match_table_t tokens = {
  * reading in a new superblock is parsed here.
  * XXX JDM: This needs to be cleaned up for remount.
  */
-int btrfs_parse_options(struct btrfs_root *root, char *options)
+int btrfs_parse_options(struct btrfs_root *root, char *options,
+			unsigned long new_flags)
 {
 	struct btrfs_fs_info *info = root->fs_info;
 	substring_t args[MAX_OPT_ARGS];
@@ -386,8 +388,12 @@  int btrfs_parse_options(struct btrfs_root *root, char *options)
 	if (cache_gen)
 		btrfs_set_opt(info->mount_opt, SPACE_CACHE);
 
+	/*
+	 * Even the options are empty, we still need to do extra check
+	 * against new flags
+	 */
 	if (!options)
-		goto out;
+		goto check;
 
 	/*
 	 * strsep changes the string, duplicate it because parse_options
@@ -587,6 +593,10 @@  int btrfs_parse_options(struct btrfs_root *root, char *options)
 			btrfs_clear_and_info(root, NOTREELOG,
 					     "enabling tree log");
 			break;
+		case Opt_nologreplay:
+			btrfs_set_and_info(root, NOLOGREPLAY,
+					   "disabling log replay at mount time");
+			break;
 		case Opt_flushoncommit:
 			btrfs_set_and_info(root, FLUSHONCOMMIT,
 					   "turning on flush-on-commit");
@@ -753,6 +763,15 @@  int btrfs_parse_options(struct btrfs_root *root, char *options)
 			break;
 		}
 	}
+check:
+	/*
+	 * Extra check for current option against current flag
+	 */
+	if (btrfs_test_opt(root, NOLOGREPLAY) && !(new_flags & MS_RDONLY)) {
+		btrfs_err(root->fs_info,
+			  "nologreplay must be used with ro mount option");
+		ret = -EINVAL;
+	}
 out:
 	if (!ret && btrfs_test_opt(root, SPACE_CACHE))
 		btrfs_info(root->fs_info, "disk space caching is enabled");
@@ -1154,6 +1173,8 @@  static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 		seq_puts(seq, ",ssd");
 	if (btrfs_test_opt(root, NOTREELOG))
 		seq_puts(seq, ",notreelog");
+	if (btrfs_test_opt(root, NOLOGREPLAY))
+		seq_puts(seq, ",nologreplay");
 	if (btrfs_test_opt(root, FLUSHONCOMMIT))
 		seq_puts(seq, ",flushoncommit");
 	if (btrfs_test_opt(root, DISCARD))
@@ -1637,7 +1658,7 @@  static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 		}
 	}
 
-	ret = btrfs_parse_options(root, data);
+	ret = btrfs_parse_options(root, data, *flags);
 	if (ret) {
 		ret = -EINVAL;
 		goto restore;